Open simonw opened 2 years ago
Still a bug against latest Datasette Desktop release.
Here's how that CSV file starts:
And in Datasette the data cuts off here:
Which is right where the first double-newline paragraph break in that CSV file occurs.
This is a datasette-app-support problem, moving the issue there.
I'm suspicious of this code: https://github.com/simonw/datasette-app-support/blob/d130884bee3db2b170c661340ca250d8b95d2cfc/datasette_app_support/utils.py#L69-L71
Maybe that AsyncDictReader(response.aiter_lines()) pattern can't cope with CSV files that include their own double-quoted newlines?
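To illustrate the suspicion, here is a minimal sketch using only the standard library. The sample data is made up, and the per-line parsing below is an assumption about how a line-based reader could go wrong, not a copy of what AsyncDictReader actually does.

```python
import csv
import io

# Hypothetical sample: the first data row has a quoted field containing
# a blank line (a paragraph break).
data = 'name,description\r\nfirst,"para one\n\npara two"\r\nsecond,plain\r\n'

# Parsing the whole stream: csv tracks the open quote across physical
# lines, so the field survives intact.
whole = list(csv.DictReader(io.StringIO(data, newline="")))

# Parsing each physical line on its own: the parser never sees the
# closing quote on the same line, so the row is split apart.
per_line = [next(csv.reader([line]), []) for line in data.splitlines()]

print(len(whole), "rows when parsed as one stream")
print(len(per_line), "fragments when parsed line by line")
```

The stream-based parse yields two data rows, while the line-by-line parse produces five fragments (header included), none of which carries the full description field.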
Here are my notes from when I wrote that AsyncDictReader class: https://github.com/simonw/datasette-app-support/issues/14#issuecomment-917693618
Maybe AsyncDictReader.__anext__() needs to be smart enough to watch out for unbalanced double quotes and consume another line if it spots one?
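A rough sketch of that quote-balancing idea, as a standalone async generator rather than a change to AsyncDictReader itself. The names logical_lines, read_rows and fake_lines are hypothetical. It assumes the default CSV dialect (double quote as quote character, quotes escaped by doubling) and that the source yields lines without their trailing newline.

```python
import asyncio
import csv


async def logical_lines(lines):
    """Group physical lines into complete CSV records.

    A record is complete once it contains an even number of double
    quotes; escaped quotes ("") count as two, so parity still holds.
    """
    buffer, quotes = [], 0
    async for line in lines:
        buffer.append(line)
        quotes += line.count('"')
        if quotes % 2 == 0:
            yield "\n".join(buffer)
            buffer, quotes = [], 0
    if buffer:
        # Unterminated quote at end of input: emit what we have.
        yield "\n".join(buffer)


async def read_rows(lines):
    """Return a list of dicts, using the first record as the header."""
    records = logical_lines(lines)
    header = next(csv.reader([await records.__anext__()]))
    return [
        dict(zip(header, next(csv.reader([record]))))
        async for record in records
        if record  # skip blank records between rows
    ]


async def fake_lines():
    # Stand-in for a streaming response's line iterator.
    for line in ["name,description", 'first,"para one', "", 'para two"', "second,plain"]:
        yield line


rows = asyncio.run(read_rows(fake_lines()))
```

Rejoining with "\n" means a "\r\n" inside a quoted field would be normalised, which is one reason this approach is more fragile than handing the whole stream to csv.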
https://github.com/MKuranowski/aiocsv may be able to handle this for me.
aiocsv is designed to work with an aiofiles object with a .read() coroutine - I'm not sure how best to map that to an httpx streaming response.
I'm beginning to think it would be better for the app to either suck the entire CSV file into memory OR to save it to a temporary file on disk, then read it into a table. Much simpler that way - this problem with newlines has made me very suspicious of importers that don't directly use csv as it was intended to be used.
I'm going to go with the memory option. Datasette Desktop runs on Macs with a decent amount of RAM, and with swap.
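A minimal sketch of what the in-memory approach could look like, assuming the full response body is already available as bytes (for example from httpx's response.content). The function name rows_from_csv_bytes and the utf-8-sig default are illustrative choices, not the code that ended up in datasette-app-support.

```python
import csv
import io


def rows_from_csv_bytes(content, encoding="utf-8-sig"):
    """Decode a fully downloaded CSV body and parse it with csv.DictReader.

    utf-8-sig strips a leading byte order mark if one is present.
    newline="" leaves line endings untouched so csv can handle quoted
    newlines itself.
    """
    text = content.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))


# Hypothetical body with a quoted paragraph break in the first row.
body = b'name,description\r\nfirst,"para one\n\npara two"\r\nsecond,plain\r\n'
rows = rows_from_csv_bytes(body)
```

Because csv sees the whole stream, quoted newlines need no special handling; the cost is holding the entire file in memory at once.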
https://raw.githubusercontent.com/okfn/dataportals.org/master/data/portals.csv
The "Open CSV from URL..." menu option only produced 13 rows - but using
sqlite-utils insert portals.db portals portals.csv --csv
on the command-line got all 596.