scrapinghub / splash

Lightweight, scriptable browser as a service with an HTTP API
BSD 3-Clause "New" or "Revised" License

console.log fails on non-ASCII characters #175

Open mehaase opened 9 years ago

mehaase commented 9 years ago

Splash (@de9374) raises UnicodeEncodeError when JavaScript passes a non-ASCII string to console.log(…). Here's an example curl request that triggers the error:

/home/mhaase $ curl -D - -d '{"url":"http://google.com", "lua_source": "function main(splash)\n    splash:autoload(splash.args.js_source)\n    splash:runjs(\"logUtf8()\")\n    return {ok=\"ok\"}\nend\n", "js_source": "function logUtf8() {console.log(\"你好\")}"}' -H 'content-type:application/json' 'http://192.168.31.1:8050/execute'
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Date: Wed, 04 Feb 2015 18:19:10 GMT
Content-Type: application/json
Server: TwistedWeb/14.0.2

{"ok": "ok"}

Note that the request itself completes successfully (HTTP 200), but the call to console.log() fails. The Splash log (using -v 2 verbosity) shows:

2015-02-04 13:19:25.774045 [render] <unicode instance at 0x7fb10d27a060 with str error:
         Traceback (most recent call last):
          File "/usr/local/lib/python2.7/dist-packages/twisted/python/reflect.py", line 391, in _safeFormat
            return formatter(o)
        UnicodeEncodeError: 'ascii' codec can't encode characters in position 15-16: ordinal not in range(128)
        >
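For anyone trying to see the failure in isolation: the traceback suggests the log message is a unicode object that gets converted with the default ASCII codec. The sketch below reproduces the same class of error with an explicit encode. It is written for Python 3 (the log above is from Python 2.7), and `describe_ascii_failure` is a hypothetical helper for illustration, not part of Splash or Twisted.

```python
def describe_ascii_failure(message):
    """Return (encoding, start, end) if message cannot be ASCII-encoded, else None.

    Hypothetical helper: it only demonstrates the error class seen in the log.
    """
    try:
        message.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is the first offending index; exc.end is exclusive.
        return (exc.encoding, exc.start, exc.end)
    return None


print(describe_ascii_failure("你好"))   # non-ASCII text fails to encode
print(describe_ascii_failure("hello"))  # plain ASCII encodes fine
```

The offsets here differ from the "position 15-16" in the log because the real log line presumably has other text before the two Chinese characters.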

I know the log is supposed to be ASCII-safe, but this causes problems when logging information about page content that is outside the program author's control (e.g. the alt text of an image). Maybe we could use \u style encoding like JSON does? E.g. log \u4f60\u597d instead of 你好.
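A minimal sketch of the escaping idea, assuming Python 3 and the standard `backslashreplace` error handler. The name `ascii_safe` is hypothetical, and this is not how Splash actually builds its log messages; it only shows that the message can be made ASCII-safe before it reaches the logger.

```python
def ascii_safe(message):
    """Return message with every non-ASCII character replaced by a backslash escape."""
    # "backslashreplace" substitutes escapes instead of raising UnicodeEncodeError.
    return message.encode("ascii", "backslashreplace").decode("ascii")


print(ascii_safe("你好"))       # escaped form of the two characters
print(ascii_safe("plain log"))  # ASCII text passes through unchanged
```

One caveat: this is close to JSON-style escaping but not identical. Characters in the Latin-1 range come out as \xNN and characters outside the BMP as \UNNNNNNNN, whereas JSON uses \uNNNN throughout (with surrogate pairs). If exact JSON escapes matter, `json.dumps` with its default `ensure_ascii=True` is another option.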

kmike commented 9 years ago

JS console.log() should handle Unicode; this looks like a Splash bug.