Open PyryM opened 2 years ago
this is definitely an Emscripten bug (note that file names with colons, which are perfectly legal on linux, would trigger this bug as well!), and if terra is to be tweaked to add a workaround, it should be optional imo. i would suggest either a terralib.saveobj flag/environment variable to use hashes in the generated anonymous names, or a way to customize the format (e.g. you pass a function that takes a terra function object and returns an appropriate name)
Can we at least check with the Emscripten developers to see what their outlook is on this one? Since a workaround is available on our end, I don't think we need to rush the fix.
Yes, there's an existing issue: https://github.com/emscripten-core/emscripten/issues/15325
If you compile the .bc file to an object file (emcc -c hello_world.bc -o hello_world.o), does it still contain those non-standard symbols?
Are these non-standard symbols only ever internal/local symbols (i.e. do they always have lower case tags when output by nm)?
TLDR: Terra gives anonymous functions names like $anon (junk/wasm_helloworld.t:7) containing special characters (specifically the ':') that break the way emscripten expects to parse symbol names.
To reproduce: First (with Terra on llvm10+ and Emscripten installed) compile to wasm32 bitcode:
Now try to link with emscripten:
Why? Emscripten gets symbol names by calling llvm-nm --print-file-names helloworld.bc and parsing each line using colons as delimiters:
But terra has produced this:
Where emscripten incorrectly splits the line helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7) because it finds the colon inside the symbol name.
Workaround: It's possible to avoid the issue by making sure every terra function is named, using func:setname(...) as needed.
Fix?: Arguably this is Emscripten's fault for trying to parse human-readable tool output rather than using actual structured APIs, and for not even robustly parsing that output.
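To illustrate the parsing problem, here is a minimal Python sketch. It is not Emscripten's actual code: parse_naive and parse_robust are hypothetical helpers, and the only input assumed is the llvm-nm line quoted above. The robust version also assumes every line has an address field, which may not hold for undefined symbols.

```python
# Hypothetical helpers for illustration only; not taken from Emscripten.
LINE = "helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7)"


def parse_naive(line):
    # Splits on every colon, so a colon inside the symbol name
    # produces an extra field and truncates the name.
    return [field.strip() for field in line.split(":")]


def parse_robust(line):
    # Split only on the first ": " separator, then peel off the address
    # and type fields. Whatever remains (spaces, colons and all) is the name.
    filename, rest = line.split(": ", 1)
    address, kind, name = rest.split(None, 2)
    return filename, address, kind, name


print(parse_naive(LINE))
print(parse_robust(LINE))
```

The naive split yields three fields instead of two and cuts the symbol name off at the colon, while the limited split keeps the name whole. A file name that itself contains ": " would still confuse the second version, which is why structured output is the safer route.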
It might make sense, though, on the Terra side to give anonymous functions more sanitized names (i.e., without spaces, colons, or parentheses) because there are likely a number of tools that expect symbol names in bitcode to be limited to C/C++ naming rules.
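As a rough sketch of what such sanitizing could look like, the Python below maps an arbitrary name onto C identifier characters and appends a short hash of the original so that different anonymous functions are unlikely to collide. This is not Terra's implementation: sanitize_symbol is a hypothetical helper, and the choice of underscores and an 8-character SHA-1 suffix is an assumption made for illustration.

```python
import hashlib
import re


def sanitize_symbol(name):
    # Hypothetical helper: replace every character outside [A-Za-z0-9_]
    # with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # C identifiers cannot start with a digit (or be empty).
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    # A short hash of the original name makes collisions unlikely when
    # two different names sanitize to the same text.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{cleaned}_{digest}"


print(sanitize_symbol("$anon (junk/wasm_helloworld.t:7)"))
```

The readable part of the name survives (file and line number are still visible), so debugging output stays useful, while the result contains nothing a colon-delimited or C-oriented tool would choke on.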