udem-dlteam / pnut

🥜 A Self-Compiling C Transpiler Targeting Human-Readable POSIX Shell
https://pnut.sh
BSD 2-Clause "Simplified" License
425 stars, 14 forks

Deduplicating strings passed to defstr makes pnut slower #76

Closed: laurenthuberdeau closed this issue 1 month ago

laurenthuberdeau commented 3 months ago

Context

In the shell backend, the string variables passed to defstr are allocated sequentially, so a string literal that appears several times gets a new variable each time.

The laurent/deduplicate_defstr_strings branch implements sharing of string variables among identical strings. This requires interning strings the same way we intern identifiers (in the same table), which slows down the tokenizing of both identifiers and strings: the table holds more entries, so more lookups collide and have to be resolved by linear probing. The result is a slower bootstrap for a minor gain in code quality, and even that gain is debatable, since sharing variables makes it harder to associate strings with their string variables and moves pnut away from being single-pass.

There seem to be a few options:

This is low priority, so I am creating this ticket to record the progress made on this problem.