internetarchive / fatcat

Perpetual Access To The Scholarly Record
https://guide.fatcat.wiki

editgroup diff view fails on some short strings in refs, via TOML #97

Open bnewbold opened 2 years ago

bnewbold commented 2 years ago

Trying to view the "diff" on https://fatcat.wiki/editgroup/wdtofwwopvezbm2hsdt4kr3etm results in a 500 error because TOML generation fails on release_rpoh4fe2lnhprj74yrn4pac7xa.

The root cause seems to be trying to TOML-ify the references, which include stub metadata like {'unstructured': '\xa0'}.
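One possible mitigation on the fatcat side is to drop whitespace-only stub values before handing the refs to the TOML encoder. The following is only a sketch under that assumption: `is_stub` and `strip_stub_strings` are hypothetical helper names, not existing fatcat functions, and the cleaned copy would be meant for display in the diff view only, not for writing back to the catalog.

```python
from typing import Any


def is_stub(value: Any) -> bool:
    """True for strings that are empty or contain only whitespace (including U+00A0)."""
    return isinstance(value, str) and value.strip() == ""


def strip_stub_strings(value: Any) -> Any:
    """Return a copy of a nested dict/list structure without whitespace-only strings.

    Hypothetical helper: dict entries and list items whose value is a stub
    string are dropped; everything else is kept unchanged.
    """
    if isinstance(value, dict):
        return {k: strip_stub_strings(v) for k, v in value.items() if not is_stub(v)}
    if isinstance(value, list):
        return [strip_stub_strings(v) for v in value if not is_stub(v)]
    return value


# Example shaped like the reference from this issue.
ref = {"index": 0, "extra": {"unstructured": "\xa0"}}
cleaned = strip_stub_strings(ref)
```

Note that this also drops genuinely empty strings, which may or may not be wanted in a diff view.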

To reproduce in python3, with the toml library installed:

import toml, json

toml.dumps(json.loads('{"unstructured": "\xa0"}'))

The resulting stack trace looks like:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-24-49ff71ee924f> in <module>
----> 1 toml.dumps(json.loads('{"unstructured": "\xa0"}')
      2 
      3 )

/usr/lib/python3/dist-packages/toml/encoder.py in dumps(o, encoder)
     58     if encoder is None:
     59         encoder = TomlEncoder(o.__class__)
---> 60     addtoretval, sections = encoder.dump_sections(o, "")
     61     retval += addtoretval
     62     outer_objs = [id(o)]

/usr/lib/python3/dist-packages/toml/encoder.py in dump_sections(self, o, sup)
    224                     if o[section] is not None:
    225                         retstr += (qsection + " = " +
--> 226                                    unicode(self.dump_value(o[section])) + '\n')
    227             elif self.preserve and isinstance(o[section], InlineTableDict):
    228                 retstr += (qsection + " = " +

/usr/lib/python3/dist-packages/toml/encoder.py in dump_value(self, v)
    178             dump_fn = self.dump_funcs[list]
    179         # Evaluate function (if it exists) else return v
--> 180         return dump_fn(v) if dump_fn is not None else self.dump_funcs[str](v)
    181 
    182     def dump_sections(self, o, sup):

/usr/lib/python3/dist-packages/toml/encoder.py in _dump_str(v)
    111         else:
    112             joiner = "u00"
--> 113         v = [v[0] + joiner + v[1]] + v[2:]
    114     return unicode('"' + v[0] + '"')
    115 

IndexError: list index out of range

The metadata seems bad, but it also feels like the Python TOML library should not choke on this. The Rust TOML library (used in fatcat-cli) outputs something like:

[[refs]]
index = 0

[refs.extra]
unstructured = " "
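Until the encoder handles such strings, the diff view could avoid the 500 error by falling back to JSON when TOML generation fails. This is a rough sketch, not the actual fatcat code path: `dump_for_diff` is a hypothetical name, and the encoder is passed in as a parameter (in practice that would be `toml.dumps`) so the example stays self-contained.

```python
import json
from typing import Callable


def dump_for_diff(entity: dict, toml_dumps: Callable[[dict], str]) -> str:
    """Serialize an entity for the diff view, tolerating encoder failures.

    Hypothetical helper: tries the supplied TOML encoder first and falls back
    to pretty-printed JSON if it raises the IndexError seen in the trace above.
    """
    try:
        return toml_dumps(entity)
    except IndexError:
        # Assumption: any stable text form is acceptable for a diff, so JSON
        # is used as the fallback representation.
        return json.dumps(entity, indent=2, sort_keys=True, ensure_ascii=False)
```

With the toml library installed, usage would look like `dump_for_diff(release_dict, toml.dumps)`, where `release_dict` stands for the release entity as a plain dict. Mixing TOML and JSON output within one diff is a trade-off; the point is only that the page renders instead of erroring.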