Closed terlar closed 4 years ago
The issue is that the regexes don't match on Unicode characters, at least not the more specific ones like matchCharacter (https://github.com/talyz/fromElisp/blob/master/default.nix#L53), which should be able to match single Unicode characters but doesn't. Unicode should be fine in places where the match is generic enough that all bytes of a character are matched, though.
I see. I ask because my config does have Unicode characters in a few places, and as I understood it you couldn't use this if any Unicode characters were present. I wonder if I could still use this, but I guess I just have to try it.
Okay, so I did some tests and got it to work. The only issue is when you use a Unicode character together with the character specifier `?`, e.g. `?λ`. It worked when I either used the numeric representation of the character instead or wrapped the character in a string.
But perhaps it is possible to fix the parser to work with `?X` Unicode characters, or would that make things too tricky?
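For anyone applying the same workaround: the numeric representation of a character literal is its Unicode code point, so `?λ` can be written as a plain integer. Below is a minimal Python sketch for looking up that value. Python is used purely for illustration, and the helper name `ascii_safe_forms` is hypothetical, not part of fromElisp or Emacs.

```python
def ascii_safe_forms(ch: str) -> dict:
    """Describe one character in ASCII-only terms (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    code = ord(ch)  # Unicode code point, the integer a ?X literal stands for
    return {
        "decimal": str(code),                  # write this instead of ?X
        "utf8_len": len(ch.encode("utf-8")),   # number of bytes in UTF-8
    }


if __name__ == "__main__":
    print(ascii_safe_forms("λ"))
```

A `utf8_len` greater than 1 marks a character that is likely to trip the character-literal regex and should be written as a number or wrapped in a string.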
I don't think it's possible to make it work, no. The relevant part of the regex matches "any character but `]`, `[`, `\`, `(`, or `)`" and then looks for a delimiter that is not part of the token. Strings and comments work fine since they match everything until they hit a delimiter: `"` for strings and `\n` for comments.
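The behavior described above can be reproduced with a rough sketch, assuming the regex engine treats each byte as one character. The two patterns below are simplified, hypothetical stand-ins rather than the actual regexes from fromElisp, and Python byte patterns are used only to imitate byte-wise matching.

```python
import re

# Hypothetical, simplified patterns; NOT the regexes used by fromElisp.
# They run on bytes, so a character class consumes exactly one byte.
CHAR_LITERAL = re.compile(rb"\?([^\]\[\\()])([\s()\[\]]|$)")
STRING_LITERAL = re.compile(rb'"([^"]*)"')


def matches_char_literal(src: str) -> bool:
    """True if src starts with ?X followed by a delimiter or end of input."""
    return CHAR_LITERAL.match(src.encode("utf-8")) is not None


def string_contents(src: str):
    """Return the text between double quotes, or None if src is not a string."""
    m = STRING_LITERAL.match(src.encode("utf-8"))
    return m.group(1).decode("utf-8") if m else None


if __name__ == "__main__":
    print(matches_char_literal("?a "))   # single-byte character
    print(matches_char_literal("?λ "))   # two-byte character in UTF-8
    print(string_contents('"λ"'))        # string pattern runs to the closing quote
```

In this sketch the class consumes only the first byte of `λ`, so the second byte lands where a delimiter is expected and the match fails. The string pattern never needs to know where one character ends and the next begins, which is why it passes the bytes through intact.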
Okay, makes sense. I guess that is a fair enough trade-off. Thank you for the explanation!
I see that you mention that Nix doesn't have Unicode support, but as I understand it, it could just pass those bytes through to wherever they are needed. What kind of issues would Unicode characters within the parsed files cause?
If this is not the case, perhaps it is a valid use case and it can be raised here: https://github.com/NixOS/nix/issues/770