orangeduck / BuildYourOwnLisp

Learn C and build your own programming language in under 1000 lines of code!
http://www.buildyourownlisp.com/

string escaping/unescaping #47

Closed by radiofreejohn 10 years ago

radiofreejohn commented 10 years ago

I am probably doing something stupid, but I noticed that doing something like:

lispy> "hello\0world"
"helloworld"

Why isn't this reading "hello\0world" like other escaped strings? Poking at the mpc code, it looks like this should be dealt with.

Separately, the string "\" doesn't work, and things like "hello\Qnothing" end up as "hello\Qnothing".

I'm assuming that the intent here is to store and print strings literally as they're entered (subverting things that may make printf format the string). If this is the case, then the above are issues; if not, then I'm just a bit confused :)

orangeduck commented 10 years ago

Hey,

I'll check out the example with "hello\0world" in greater depth. It isn't clear how to deal with this properly, because mpc stores strings as plain C strings. A C string can't contain \0, the null terminator, in the middle, because there is no way to tell whether it marks the end of the string or is just another character. This might not be possible to fix without switching to a proper string handling library. At the moment mpc just strips null characters from the string, which I think is okay.
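To make the null terminator problem concrete, here is a minimal sketch in plain C. The helper `strip_nulls` is a hypothetical name used only for illustration and is not mpc's actual code; it just mimics the "drop embedded null characters" behaviour described above, given an explicit byte count.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not taken from mpc):
 * copy n bytes from src into dst, skipping every '\0' byte,
 * then null-terminate dst. dst must have room for n + 1 bytes.
 * Returns the number of bytes that were kept. */
size_t strip_nulls(char *dst, const char *src, size_t n) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (src[i] != '\0') {
            dst[kept++] = src[i];
        }
    }
    dst[kept] = '\0';
    return kept;
}
```

For the array `const char raw[] = "hello\0world";`, `strlen(raw)` stops at the embedded null and reports 5, even though the array holds 11 characters before the final terminator. Calling `strip_nulls(out, raw, sizeof(raw) - 1)` keeps 10 bytes and leaves "helloworld" in `out`, which matches the REPL output in the original report.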

The string "\" isn't a complete string. It consists of the opening quote mark ", then an escaped quote mark \" but there is no terminating quote mark! So this will probably give a syntax error if you try to use it. The string "\"" is the string consisting only of a single quote mark which is maybe the one you're looking for.
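The reasoning about the missing terminating quote can be sketched as a small scanner. `find_closing_quote` is a hypothetical function written for this explanation, not part of mpc or the book's code; it assumes a backslash always consumes the character that follows it.

```c
#include <stddef.h>

/* Hypothetical helper (illustration only): s must start with an
 * opening '"'. Returns the index of the matching closing quote,
 * or -1 if the string literal is never terminated. */
long find_closing_quote(const char *s) {
    if (s[0] != '"') {
        return -1;
    }
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (s[i] == '\\') {
            if (s[i + 1] == '\0') {
                return -1;  /* backslash at end of input */
            }
            i++;            /* skip the escaped character */
        } else if (s[i] == '"') {
            return (long)i; /* first unescaped quote closes the string */
        }
    }
    return -1;              /* ran out of input before a closing quote */
}
```

Given the three characters `"\"` as input, the scanner treats `\"` as an escaped quote, reaches the end of input, and returns -1. Given the four characters `"\""`, it returns 3, the index of the real closing quote.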

The escape sequences in strings are limited to those which exist in C (see http://en.wikipedia.org/wiki/Escape_sequences_in_C). This is why sequences such as \Q don't actually unescape to anything and just remain the same.
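As a rough illustration of why \Q passes through untouched, here is a simplified in-place unescape routine. It is a sketch under assumptions, not mpc's implementation: the name `unescape_simple` is made up, and it only handles a small subset of the C escapes (no octal or hex forms, so \0 is also left as written).

```c
#include <stddef.h>

/* Hypothetical sketch (not mpc's code): rewrite s in place,
 * turning a few known two-character escapes into single
 * characters. Unknown escapes such as \Q are copied unchanged.
 * Returns the length of the resulting string. */
size_t unescape_simple(char *s) {
    size_t r = 0; /* read position */
    size_t w = 0; /* write position, never ahead of r */
    while (s[r] != '\0') {
        if (s[r] == '\\' && s[r + 1] != '\0') {
            char c;
            switch (s[r + 1]) {
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case 'r':  c = '\r'; break;
                case '\\': c = '\\'; break;
                case '"':  c = '"';  break;
                case '\'': c = '\''; break;
                default:   c = 0;    break; /* not a known escape */
            }
            if (c != 0) {
                s[w++] = c;
                r += 2;
                continue;
            }
        }
        s[w++] = s[r++]; /* ordinary character or unknown escape */
    }
    s[w] = '\0';
    return w;
}
```

With this sketch, the 11 characters `line\nbreak` (backslash followed by n) shrink to 10 characters containing a real newline, while `hello\Qnothing` comes back exactly as it went in.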

String escaping is pretty complicated and messy. I can see why you might be confused! Unfortunately it isn't just a case of showing strings as they are entered, and there is subtle stuff going on below the surface. It took me a long time to get the mpc string escape/unescape functions to the state they are in now. I think mpc is too lightweight to do it properly, but it works in most cases, for example for a project like this.

radiofreejohn commented 10 years ago

Yep, I guess I was just wondering why we bother escaping them at all here. If the goal is to have the repl output include the special characters (\n, etc...) that makes sense.