veliovgroup / jazeee-meteor-spiderable

Fork of Meteor Spiderable with longer timeout, caching, better server handling
https://atmospherejs.com/jazeee/spiderable-longer-timeout

Exception in callback of async function: SyntaxError: Unexpected token ^~H #17

Closed andycaramba closed 9 years ago

andycaramba commented 9 years ago

Hi there. I see this error in the logs instead of a PhantomJS error message.

Exception in callback of async function: SyntaxError: Unexpected token ^~H
    at Object.parse (native) 
    at packages/jazeee:spiderable-longer-timeout/lib/server.coffee:169:22
    at packages/jazeee:spiderable-longer-timeout/lib/server.coffee:33:2
    at runWithEnvironment (packages/meteor/dynamics_nodejs.js:108:1)

My phantomjs version is

~$ phantomjs --version
1.9.0

jazeee commented 9 years ago

Out of curiosity, what platform are you on?

andycaramba commented 9 years ago

Ubuntu 14.04.3 LTS 64 bit and Meteor 1.1.0.3

jazeee commented 9 years ago

Looks like an issue with JSON parsing. PhantomJS seems to be a bit odd with what it outputs, and appears to put errors on stdout.

You can try version 1.2.2 of this package which does not parse JSON, and should work, albeit with fewer functions.

I use a very similar platform, and don't see these errors. I think we will need to add some debugging logs on parse errors so that we can find the issue.
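A minimal sketch of the kind of debug logging described above, assuming the PhantomJS output arrives as a single string. The name `parsePhantomOutput` and the log wording are hypothetical and are not the package's actual code:

```javascript
// Hypothetical helper: parse PhantomJS stdout and log the raw text on failure.
function parsePhantomOutput(stdout, log) {
  try {
    return JSON.parse(stdout);
  } catch (err) {
    // Log instead of rethrowing, so the raw output can be inspected later.
    log("Failed to parse PhantomJS output from: " + stdout);
    log("Parse error: " + err.message);
    return null;
  }
}

// Usage: a valid payload returns an object, an invalid one returns null.
const ok = parsePhantomOutput('{"status":200,"content":"<html></html>"}', console.error);
const bad = parsePhantomOutput("not json", console.error);
console.log(ok.status, bad);
```

Returning `null` rather than throwing keeps the failure path explicit for the caller, which can then decide how to respond.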

andycaramba commented 9 years ago

OK, I'll try to log the raw PhantomJS message and post the results here.

jazeee commented 9 years ago

I pushed an update, v1.2.7, which should log it.

Edit: I tested the normal cases, which work. It should catch the exception. (I don't currently get any exceptions myself.)

andycaramba commented 9 years ago

Oh, thanks. I will try to deploy my app with this new version and watch for errors. It would be better to get rid of the exception throwing in this block, IMHO. Just logging to the console is quite enough, I think.

jazeee commented 9 years ago

OK, I rearranged the code a tiny bit to ensure it goes down the failure path.

Edit: v1.2.8

andycaramba commented 9 years ago

I have caught the error.

Failed to parse PhantomJS output from:  {"status":301,"headers":[{"name":"Server","value":"nginx"},{"name":"Date","value":"Tue, 18 Aug 2015 22:42:38 GMT"},{"name":"Content-Type","value":"text/html"},{"name":"Content-Length","value":"178"},{"name":"Connection","value":"keep-alive"},{"name":"Location","value":"URL HERE"}],"content":"RIGHT BUILDED HTML PAGE HERE"}

I think the HTML content may contain some characters that are invalid for JSON parsing. But that's not a PhantomJS error: the HTML content is a correctly built page. Only the 301 status confused me. With the original Meteor package it returns status 200.

jazeee commented 9 years ago

Odd. I took that string and tried it on Chrome console. It is fully valid from what I can tell. I also tested the regular expression, which works as expected. On Chrome console, JSON.parse(...) works as expected, returning an object. I don't understand how it failed to parse or do the regex.

Also, the 301 seems quite odd. Which version of Spiderable did you use? We used to have an issue where the return code was 301, since it was based on all file requests, not just the main route. I wouldn't expect 301 with v1.2.6 or newer.

andycaramba commented 9 years ago

I looked carefully at the HTML content contained in this error and found the problem symbol that breaks JSON parsing: <U+2028>.

I don't remember the exact Spiderable version, but it was the version supplied with Meteor 1.1.0.2 and 1.1.0.3, at least.

jazeee commented 9 years ago

Where is that character in reference to the JSON? Is it part of the content block? If so, can you paste a trimmed version of that content block to help debugging? Ideally, your test block should fail when running something like:

JSON.parse({"content": "text<U+2028>test"})

jazeee commented 9 years ago

Note, that character appears to be a line separator: http://www.fileformat.info/info/unicode/char/2028/index.htm

andycaramba commented 9 years ago

Yes, it's part of the content block. And when I try to run JSON.parse({"content": "text<U+2028>test"}) in the Chrome JS console, I get an error:

Uncaught SyntaxError: Unexpected token o(…)

jazeee commented 9 years ago

Sorry, I missed a quote. It should be:

JSON.parse('{"content": "text<U+2028>test"}')

jazeee commented 9 years ago

I am looking to find out how to insert your bad character in a string.

andycaramba commented 9 years ago

I think this SO answer explains the error: http://stackoverflow.com/a/9168133/585470
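For context, U+2028 and U+2029 are allowed unescaped inside JSON strings, but older JavaScript engines treat them as line terminators inside string literals. A common workaround is to emit them as \u escapes when serializing. A minimal sketch to run in Node, where `stringifyEscaped` is a hypothetical name and not part of this package:

```javascript
// Hypothetical helper: serialize to JSON, then escape the two separators
// so the output is safe both as JSON and as a JavaScript string literal.
function stringifyEscaped(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const json = stringifyEscaped({ content: "text\u2028test" });
console.log(json);
// Parsing the escaped form restores the original character.
console.log(JSON.parse(json).content === "text\u2028test");
```

The escaped output contains only the six ASCII characters `\u2028` in place of the raw separator, so a stricter consumer never sees the raw character.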

andycaramba commented 9 years ago

Yes, JSON.parse('{"content": "text<U+2028>test"}') works. I also overlooked the missing quotes.

andycaramba commented 9 years ago

This code throws an exception: JSON.parse('{"content": "text\u-2028test"}')

jazeee commented 9 years ago

The \u escape format is \u2028. You can test with: test='{"content":"\u2010"}'. The odd thing is that this still works:

JSON.parse('{"content":"a\u2028b"}').content.length

andycaramba commented 9 years ago

Oh, sorry. My mistake. Yes, it works with \u2028

jazeee commented 9 years ago

I believe we can remove the problem characters using something like:

test='\u2028';
test.length; // is 1.
test.replace(/\u2028/, '').length; // Is 0

jazeee commented 9 years ago

It is probably best to handle this on the PhantomJS side, by replacing every \u2028 and \u2029 in the HTML content with a space. Would that work for your content?
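The replacement proposed above can be sketched as follows. This is an illustration to run in Node; `replaceLineSeparators` is a hypothetical name, and whether PhantomJS itself accepts this regex is not verified here:

```javascript
// Hypothetical helper: replace every U+2028 / U+2029 with a plain space.
function replaceLineSeparators(html) {
  // The g flag matters: without it only the first match is replaced.
  return html.replace(/[\u2028\u2029]/g, " ");
}

const cleaned = replaceLineSeparators("a\u2028b\u2029c\u2028d");
console.log(JSON.stringify(cleaned));
```

Replacing with a space rather than an empty string keeps words on either side of the separator from running together.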

andycaramba commented 9 years ago

I think it would. But the question remains why PhantomJS considers it an error. And then there is the confusing 301.

jazeee commented 9 years ago

I believe the error is occurring on the Meteor side (in node). Phantom executes successfully, and returns JSON, as shown in your logs. What fails is that JSON.parse. I am not sure it is caused by \u2028, though, since JSON.parse seems to work. For example, our tests still show that this works: JSON.parse('{"content":"a\u2028b"}').content.length

Perhaps there is a different character in your content that is failing. The original error says "^H", which may be a backspace, like http://www.fileformat.info/info/unicode/char/0008/index.htm

jazeee commented 9 years ago

Ahh. It is not an error on Node, or on Chrome console. It is an error when running on PhantomJS:

$ phantomjs 
phantomjs>  JSON.parse('{"content":"a\u2028b"}')
Parse error
phantomjs>  JSON.parse('{"content":"a"}')
{
   "content": "a"
}

Whereas, on Node:

$ node
>  JSON.parse('{"content":"a\u2028b"}')
{ content: 'a
b' }
>

andycaramba commented 9 years ago

The first error text I copy-pasted here was from the multitail tool, and it seems that is how this tool renders the symbol. I've just found the problem string in the multitail output, and it is SOME LONG-LONG TEXT ^~H\\n\\n

The next output I copy-pasted was from the less tool, which shows this symbol more clearly.

jazeee commented 9 years ago

I also tested the same using the node version that Meteor uses, and had no issues. Only PhantomJS executable seems to have the problem. Here is the issue, however. I cannot create the regex in PhantomJS. This fails:

phantomjs> test.replace(/\u2028/g,' ')
Parse error

jazeee commented 9 years ago

Somehow, PhantomJS cannot handle the special character at all. In your test, do you know which \u code that ^~H character is? How can we verify that the character is 2028?
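One way to check which character is present is to scan the captured text and print the code of anything outside printable ASCII. A minimal sketch to run in Node, assuming the raw content is available as a string; `findSuspectChars` is a hypothetical name and the sample input is made up:

```javascript
// Hypothetical helper: list every control or non-ASCII UTF-16 code unit
// with its position, formatted as U+XXXX.
function findSuspectChars(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code < 0x20 || code > 0x7e) {
      const hex = code.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index: i, code: "U+" + hex });
    }
  }
  return hits;
}

// Made-up sample: ten ASCII characters, then U+2028 and two newlines.
const hits = findSuspectChars("SOME TEXT \u2028\n\n");
console.log(hits);
```

This reports UTF-16 code units, which is enough to tell U+2028 apart from a backspace (U+0008) or an ordinary newline (U+000A).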

andycaramba commented 9 years ago

I'm not exactly sure about this, but I see this Unicode combination with the "less" tool in my server's remote terminal.

andycaramba commented 9 years ago

I will try to fix it on my side by making sure that those characters are no longer present on the pages, and then see what happens.

andycaramba commented 9 years ago

I apologize for taking so long to answer. I got rid of this character in the HTML pages and the problem went away. Thanks for your help and for such a useful plugin.