qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

unsupported Unicode escape sequence when publishing file #1488

Open chriswhong opened 4 years ago

chriswhong commented 4 years ago

What is your OS and version?

MacOS 10.15.5 (Catalina)

What version of qri are you using (qri version)?

0.9.10-dev

Issue

After creating a new dataset from a CSV (which turned out not to be a CSV at all), publishing to cloud failed with this error: 500 biz.CreateVersion create preview error: pq: unsupported Unicode escape sequence

What did you do?

I downloaded PLUTO 20v5 from NYC's Department of City Planning: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-pluto-mappluto.page, unzipped it, and then ran qri save on the CSV. It saved successfully, but I got the error shown above when publishing.

What happened?

It turns out the CSV is corrupt, and is not even a UTF-8 text file.

What did you expect to happen?

Qri should notice that it's not getting a CSV or JSON file and catch this during the save operation.
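As a rough illustration of the kind of check being asked for, here is a minimal sketch in Go that rejects a body containing NUL bytes or invalid UTF-8 before it is saved. The function name checkTextBody is hypothetical and is not part of qri's API. The NUL check is included on the assumption that the cloud error comes from PostgreSQL refusing a \u0000 escape in JSON text, which is a common cause of this message but is not confirmed for this case.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"unicode/utf8"
)

// checkTextBody is a hypothetical pre-save check (not a qri function).
// It returns an error when data cannot be a UTF-8 text file such as CSV or JSON.
func checkTextBody(data []byte) error {
	// NUL bytes do not occur in normal CSV or JSON text and are a strong
	// hint that the input is binary.
	if bytes.IndexByte(data, 0x00) >= 0 {
		return errors.New("body contains NUL bytes; likely binary data")
	}
	// Reject byte sequences that are not well-formed UTF-8.
	if !utf8.Valid(data) {
		return errors.New("body is not valid UTF-8")
	}
	return nil
}

func main() {
	// A small, well-formed CSV passes the check.
	fmt.Println(checkTextBody([]byte("borough,block,lot\nMN,1,10\n")))
	// Bytes resembling a binary header fail it.
	fmt.Println(checkTextBody([]byte{'P', 'K', 0x03, 0x04, 0x00, 0xff}))
}
```

A check like this only needs the raw bytes, so it could run on a leading chunk of the file before any CSV parsing is attempted.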

dustmop commented 4 years ago

Ah, from the original description I thought this CSV file was some non-UTF-8 encoded text file. But taking a closer look, it appears to be some sort of binary format, possibly a zip, though I can't get it to unzip successfully. My guess is that locally we treat it as a single line (maybe a single element), but cloud tries to parse it as UTF-8 and fails.