juliangruber / browser-run

Run code inside a browser from the command line

Unicode characters are encoded incorrectly and/or cause parse errors – tape-run #4

kamilogorek closed this issue 10 years ago

kamilogorek commented 10 years ago

Some Unicode characters produce false negatives when running with tape-run, although they work correctly with browser-run. There are two different cases.

Using ä (char code: 228) – causes SyntaxError: Parse error and hangs the process.

Using Cyrillic, e.g. тест – is encoded incorrectly, giving тест as a result instead of тест, which causes the given test to fail.

Simple tape test using Backbone:

test("routes with unicode", function (t) {
    location.replace('http://example.com#search/тест');
    Backbone.history.checkUrl();
    t.equal(router.query, "тест");
});

Results:

  ---
    operator: equal
    expected: 'тест'
    actual:   'тест'
  ...

If you write a simple test:

test("unicode", function (t) {
    t.equal("тест", "тест");
});

it will of course pass:

  ---
    operator: equal
    expected: 'тест'
    actual:   'тест'
  ...

juliangruber commented 10 years ago

can you provide an example without any library that still fails? in this example it could as well be expected browser or backbone behavior

kamilogorek commented 10 years ago

I verified this issue using only Node's url parser and it seems that this is expected behavior. I guess I'll have to dig deeper into this now. Still not sure why there's a parse error though.

kamilogorek commented 10 years ago

Well, these are surprising results.

Test case:

var tape = require('tape');
var url = require('url');

tape('unicode routes', function (t) {
    var exampleUrl = 'http://example.com/тест#тест';
    var link = document.createElement('a');
    link.href = exampleUrl;

    console.log('Simple string: тест\n');

    console.log('Browser parsed pathname: ' + link.pathname.substr(1));
    console.log('Node parsed pathname: ' + url.parse(exampleUrl).pathname.substr(1) + '\n');

    console.log('Browser parsed hash: ' + link.hash.substr(1));
    console.log('Node parsed hash: ' + url.parse(exampleUrl).hash.substr(1));
});

Results:

Simple string: тест

Browser parsed pathname: тест
Node parsed pathname: тест

Browser parsed hash: %D1%82%D0%B5%D1%81%D1%82
Node parsed hash: тест

At least we're down to hash parsing only ;)

Or not.

Here are the results straight from the browser console (it works the same if we simply embed the link into the DOM):

Test case:

var foo = document.createElement('a');
foo.href = 'http://example.com/тест#тест';

console.log('Pathname: ' + foo.pathname);
console.log('Hash: ' + foo.hash);

Results:

Pathname: /%D1%82%D0%B5%D1%81%D1%82
Hash: #тест

Completely the other way around :s
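
Since the two environments disagree on which part comes back percent-encoded, one way to make a comparison independent of that is to decode both parts before comparing them. The following is only a minimal sketch under assumptions: `decodedPart` is a hypothetical helper (it is not part of tape, tape-run, or browser-run), and the standard global `URL` constructor stands in for the anchor element used above so the snippet also runs outside a browser.

```javascript
// Hypothetical helper, not part of tape, tape-run, or browser-run:
// decode a URL component so that percent-encoded and raw forms compare equal.
function decodedPart(part) {
    return decodeURIComponent(part);
}

// Assumption: a global URL constructor is available (modern browsers and Node).
var link = new URL('http://example.com/тест#тест');

var pathname = decodedPart(link.pathname).slice(1); // drop the leading "/"
var hash = decodedPart(link.hash).slice(1);         // drop the leading "#"

console.log('Decoded pathname: ' + pathname);
console.log('Decoded hash: ' + hash);
```

Decoding a string that contains no percent-escapes returns it unchanged, so the same helper can be applied whether or not the environment encoded that part.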

juliangruber commented 10 years ago

this is the same if you just run that code in your browser's dev console:

var str = 'тест';
str.substr(1); // "ест"

var link = document.createElement('a');
link.href = 'http://example.com/тест#тест';
link.pathname; // "/%D1%82%D0%B5%D1%81%D1%82"
link.hash; // "#тест"

wraithgar commented 10 years ago

This made the example work, so I think this issue can be closed:

test("routes with unicode", function (t) {
    location.replace('http://example.com#search/' + encodeURIComponent('тест'));
    Backbone.history.checkUrl();
    t.equal(router.query, "тест");
});

The syntax error was due to ä needing to be in quotes if used as an object index.
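
For reference, here is a small sketch of both points that runs without Backbone or a browser. The original code that triggered the parse error is not shown in this thread, so the `charCodes` object below is purely hypothetical and only illustrates what a quoted non-ASCII key looks like.

```javascript
// Encoding the query before putting it into the URL, as in the test above.
var query = 'тест';
var encoded = encodeURIComponent(query);

console.log(encoded);                               // %D1%82%D0%B5%D1%81%D1%82
console.log(decodeURIComponent(encoded) === query); // true

// Hypothetical object with a quoted non-ASCII key; quoting the key keeps
// the source parseable even where a bare ä is not accepted as an identifier.
var charCodes = { 'ä': 228 };

console.log(charCodes['ä']); // 228
```

The encoded string is the same sequence that showed up in the parsed pathname and hash earlier in the thread, so decoding it gives back the original query.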

kamilogorek commented 10 years ago

Great catch @wraithgar! Thanks :)