palewire / cummings.ee

A collection of the work of Edward Estlin Cummings, as it enters the public domain.
https://cummings.ee
MIT License

Unescaped quote characters in bulk downloads #307

Closed dwisdom0 closed 1 year ago

dwisdom0 commented 1 year ago

I noticed a small bug in the bulk downloads while idly browsing. I think it might be related to #88.

json.decoder.JSONDecodeError: Expecting ',' delimiter: line 8 column 85 (char 320)

The double-quote characters inside the <a href=" tags are not escaped.

"source": "The materials here come from a first edition scanned at the <a href="https:// [. . .]
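For illustration, here is a minimal Python sketch of how this kind of failure can arise and how a JSON serializer avoids it. The record and URL are hypothetical placeholders, and this is not necessarily how the site builds its downloads. It assumes the broken output came from pasting the field into a string template, which is only a guess.

```python
import json

# Hypothetical record; the text and URL are placeholders, not the site's data.
record = {"source": 'Scanned at the <a href="https://example.com/">library</a>.'}

# Assumed failure mode: building JSON by string formatting leaves the
# inner double quotes unescaped, so the string value ends too early.
broken = '{"source": "%s"}' % record["source"]
try:
    json.loads(broken)
    broken_parses = True
except json.JSONDecodeError:
    broken_parses = False

# json.dumps escapes each inner double quote with a backslash,
# so the encoded text parses back to the original record.
encoded = json.dumps(record)
decoded = json.loads(encoded)
```

Letting the serializer handle the escaping means the HTML in the field can stay as it is.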

This also affects the CSV bulk download.

pandas.errors.ParserError: Error tokenizing data. C error: Expected 8 fields in line 3, saw 10

Some of the fields are wrapped in double quotes to protect a comma inside the field. But the double quote in the <a> tag closes the opening quote early, leaving the comma unescaped, so the parser sees more fields than the header declares.
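A small sketch of the CSV case, using only Python's standard csv module. The row is a hypothetical placeholder, and the "naive" line is an assumption about how the broken file might have been produced, not a confirmed description of the site's code.

```python
import csv
import io

# Hypothetical row: the second field contains both a comma and double quotes.
row = ["1", 'Scanned at the <a href="https://example.com/">library</a>, first edition']

# Assumed failure mode: wrap the field in quotes without doubling
# the quotes already inside it.
naive_line = '%s,"%s"\n' % (row[0], row[1])
naive_fields = next(csv.reader(io.StringIO(naive_line)))

# csv.writer doubles each inner quote ("") when it quotes a field,
# so the comma stays inside one field and the row round-trips.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
safe_fields = next(csv.reader(io.StringIO(buffer.getvalue())))
```

Here the naive line splits into more fields than the row has, while the writer's output reads back as the original two fields.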

Love the site!

palewire commented 1 year ago

Thanks for pointing this out! I'll look to get it addressed.

palewire commented 1 year ago

I think that's fixed. Give it another try.