uiri / toml

Python lib for TOML
MIT License
1.08k stars 190 forks

Backslash roundtrip problem #404

Open jeremysanders opened 2 years ago

jeremysanders commented 2 years ago

I'm having problems with Windows paths stored in TOML files:

In [1]: import toml
In [2]: foo = {'a': 'C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\Lib\\site-packages\\PyQt5\\bindings'}
In [3]: d = toml.dumps(foo)
In [4]: d
Out[4]: 'a = "C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\\\Lib\\\\site-packages\\\\PyQt5\\\\bindings"\n'
In [5]: toml.loads(d)
/usr/lib/python3/dist-packages/toml/decoder.py in loads(s, _dict, decoder)
    512                                         multibackslash)
    513             except ValueError as err:
--> 514                 raise TomlDecodeError(str(err), original, pos)
    515             if ret is not None:
    516                 multikey, multilinestr, multibackslash = ret

TomlDecodeError: Reserved escape sequence used (line 1 column 1 char 0)

It looks like dumps escapes only some of the backslashes properly. I tested this with toml from GitHub.

jeremysanders commented 2 years ago

Ok, I think I've narrowed this down to the presence of \x in the string:

In [24]: toml.dumps({'a': r'\x43'})
Out[24]: 'a = "\\u0043"\n'

https://github.com/uiri/toml/blob/59d83d0d51a976f11a74991fa7d220fc630d8bae/toml/encoder.py#L98 is wrong: it splits on \x but does not skip \\x, where the backslash is itself escaped.
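
To illustrate why splitting on \x is ambiguous, here is a minimal sketch using only the standard library. It assumes, as the linked line suggests, that the encoder works on the repr() text of the string. The helper name repr_body is hypothetical and is not part of toml.

```python
def repr_body(s: str) -> str:
    """Return repr(s) without the surrounding quotes (hypothetical helper)."""
    return repr(s)[1:-1]


literal = r"\x43"   # four characters: backslash, 'x', '4', '3'
control = "\x02"    # one character: U+0002

print(repr_body(literal))   # \\x43
print(repr_body(control))   # \x02

# Both repr texts contain the two-character sequence backslash + 'x',
# so a plain split cannot tell an escaped backslash followed by 'x'
# from a genuine \xNN escape.
print(repr_body(literal).split("\\x"))   # ['\\', '43']
print(repr_body(control).split("\\x"))   # ['', '02']
```

In the first case the piece before the split point ends in a backslash that belongs to the escaped pair \\, which is the situation a correct encoder would have to detect.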

jeremysanders commented 2 years ago

I've created a pull request. However, I notice that strings like '\x02' also don't seem to work, and my pull request doesn't address that.
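
For comparison, here is a rough sketch of an escaper that walks the string character by character instead of post-processing repr() output, so both the literal backslash case and control characters like '\x02' go through the same path. This is not the approach taken in the pull request or in toml itself. The names escape_basic_string and _SHORT_ESCAPES are hypothetical, and the escape table is my reading of the TOML rules for basic strings.

```python
# Short escapes that TOML defines for basic strings.
_SHORT_ESCAPES = {
    "\b": "\\b",
    "\t": "\\t",
    "\n": "\\n",
    "\f": "\\f",
    "\r": "\\r",
    '"': '\\"',
    "\\": "\\\\",
}


def escape_basic_string(s: str) -> str:
    """Return s as a quoted TOML basic string (hypothetical helper)."""
    out = []
    for ch in s:
        if ch in _SHORT_ESCAPES:
            out.append(_SHORT_ESCAPES[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters become \uXXXX escapes.
            out.append("\\u%04x" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


print(escape_basic_string(r"\x43"))             # "\\x43"
print(escape_basic_string("\x02"))              # "\u0002"
print(escape_basic_string("C:\\Python\\x64"))   # "C:\\Python\\x64"
```

Because every backslash in the input is doubled on its own, a following 'x' never needs special handling.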

davidfokkema commented 2 years ago

Got bitten by this just now. I have a user whose name starts with an 'x' and saving their home directory path into a config file breaks my app. Not fun.

davidfokkema commented 2 years ago

I'm switching to tomli (included in the Python 3.11 standard library as tomllib) in combination with tomli_w.

dimakuv commented 1 year ago

We were also bitten by this:

>>> toml.dumps({'A': '\\x2d'})
'A = "\\u002d"\n'

As was already pointed out, this code is at fault: https://github.com/uiri/toml/blob/59d83d0d51a976f11a74991fa7d220fc630d8bae/toml/encoder.py#L99-L113

The code is extremely complicated and must be untangled in order to fix this bug. We didn't attempt it; instead we're planning on switching to tomli.