nim-lang / Nim

Nim is a statically typed compiled systems programming language. It combines successful concepts from mature languages like Python, Ada and Modula. Its design focuses on efficiency, expressiveness, and elegance (in that order of priority).
https://nim-lang.org

"surprising" Javascript UTF-8 cstring len behaviour #10911

Open zevv opened 5 years ago

zevv commented 5 years ago

What are the semantics of len() on Unicode cstrings in JavaScript? Is the following behaviour expected?

let s1 = "♜♞♝♛♚♝♞♜"
const s2 = "♜♞♝♛♚♝♞♜"

echo "lit   string  ", "♜♞♝♛♚♝♞♜".len
echo "lit   cstring ", "♜♞♝♛♚♝♞♜".cstring.len
echo "let   string  ", s1.len
echo "let   cstring ", s1.cstring.len
echo "const string  ", s2.len
echo "const cstring ", s2.cstring.len

native:

lit   string  24
lit   cstring 24
let   string  24
let   cstring 24
const string  24
const cstring 24

Javascript:

lit   string  24
lit   cstring 24
let   string  24
let   cstring 8
const string  24
const cstring 24

Araq commented 5 years ago

cstring is mapped to JS strings and so len should return what JS's length returns.
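For reference, here is a minimal sketch of what JS's length reports for the string from the report, next to its UTF-8 byte count. It uses only the standard `TextEncoder` API; the variable names are illustrative and nothing here comes from Nim's runtime.

```javascript
// The same eight chess glyphs used in the report.
const pieces = "♜♞♝♛♚♝♞♜";

// String.prototype.length counts UTF-16 code units.
// Each of these glyphs is in the Basic Multilingual Plane, so one unit each.
const utf16Units = pieces.length;

// TextEncoder always encodes to UTF-8; each glyph takes three bytes.
const utf8Bytes = new TextEncoder().encode(pieces).length;

console.log(utf16Units, utf8Bytes); // 8 24
```

So a cstring that really is a JS string would be expected to report 8 here, while a Nim string holding the UTF-8 bytes reports 24.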

zevv commented 5 years ago

Fair enough, but the behaviour is different for consts and literals than for variables, which might surprise some users, me included.

krux02 commented 5 years ago

I can confirm, this is a bug. Only the value of the let expression is correct.

var s1_245005 = makeNimstrLit("\xE2\x99\x9C\xE2\x99\x9E\xE2\x99\x9D\xE2\x99\x9B\xE2\x99\x9A\xE2\x99\x9D\xE2\x99\x9E\xE2\x99\x9C");
rawEcho(makeNimstrLit("lit   string  "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("lit   cstring "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("let   string  "), cstrToNimstr(((s1_245005 != null ? s1_245005.length : 0))+""));
rawEcho(makeNimstrLit("let   cstring "), cstrToNimstr(((toJSStr(s1_245005) != null ? toJSStr(s1_245005).length : 0))+""));
rawEcho(makeNimstrLit("const string  "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("const cstring "), makeNimstrLit("24"));

In the other cases Nim thinks it can calculate the length at compile time when in fact it can't. When I tried to resolve this bug, I found that in semMagic the magic used to calculate the length of the cstring is mLengthArray, not mLengthStr as it should be, at least according to the overloads in system.nim.
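To see where the two numbers come from, the escaped literal from the generated code can be inspected in plain JavaScript. This is only a sketch: `decodeUtf8Bytes` is a hypothetical helper written for illustration, not Nim's `toJSStr`, and it assumes the literal holds one UTF-8 byte per character.

```javascript
// Byte-per-character literal, copied from the generated code above.
const escaped =
  "\xE2\x99\x9C\xE2\x99\x9E\xE2\x99\x9D\xE2\x99\x9B" +
  "\xE2\x99\x9A\xE2\x99\x9D\xE2\x99\x9E\xE2\x99\x9C";

// Hypothetical helper (not part of Nim's runtime): treat every
// character of the input as one byte and decode the bytes as UTF-8.
function decodeUtf8Bytes(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const decoded = decodeUtf8Bytes(escaped);

console.log(escaped.length); // 24: one code unit per UTF-8 byte
console.log(decoded.length); // 8: one code unit per glyph
console.log(decoded === "♜♞♝♛♚♝♞♜"); // true
```

The folded constant 24 matches the byte count, while the runtime path measures the decoded JS string and gets 8.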