gregrahn / join-order-benchmark

Join Order Benchmark (JOB)

Malformed CSVs #10

Open chsalgado opened 3 years ago

chsalgado commented 3 years ago

I downloaded the CSV tarball. Uploading it to SQL Azure with bcp proved really hard because the CSVs are malformed.

Sample CSV row in aka_name.csv:

220222,538021,"\"Borolas\", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

CSV expects quotation marks to be escaped as "", not \":

220222,538021,"""Borolas"", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

Bouncner commented 3 years ago

Hey @chsalgado, we had CSV problems as well (see #11), but the mentioned row looks fine to me:

$ grep '^220222' aka_name.csv
220222,538021,"\"Borolas\", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060

Maybe your terminal does not show the escape character? It's still cumbersome as most software expects quotes to be escaped as "", but the given files should be importable to most systems if you set the escape character correctly.
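To illustrate what "setting the escape character correctly" can look like, here is a minimal sketch using Python's standard `csv` module. It is only an illustration, not something taken from the benchmark repository: the helper name `parse_backslash_escaped` is made up for this example, and the only data used is the sample row quoted above.

```python
import csv
import io

# The sample row from aka_name.csv, exactly as quoted in this thread.
SAMPLE = r'220222,538021,"\"Borolas\", Joaquín García Vargas",,B6425,J2526,B642,6526774f1ce04414f56476409ce59060'


def parse_backslash_escaped(text):
    """Parse CSV text whose embedded quotes are written as \\" rather than ""."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        escapechar="\\",    # a backslash makes the next character literal
        doublequote=False,  # do not treat "" inside a field as an escaped quote
    )
    return list(reader)


rows = parse_backslash_escaped(SAMPLE)
print(len(rows[0]))  # number of fields in the row
print(rows[0][2])    # the name field with the backslashes removed
```

With these two settings the row splits into eight fields, and the third field comes out with plain quotation marks around Borolas. Most importers offer an equivalent option under a different name.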

In case you cannot change the escape symbol, this (rather hacky) command might help you (no guarantees):

$ for csv_file in *.csv; do echo "$csv_file"; sed -i'' -e 's/\\\\\"/MARKER1/g;s/\\\\"/MARKER2/g;s/\\"/""/g;s/MARKER1/\\\\""/g;s/MARKER2/\\\\"/g' "$csv_file"; done
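If a regex-based rewrite feels too fragile, another option is to re-serialize the files with a real CSV parser. The following is a rough sketch with Python's standard `csv` module, under a few assumptions: the files are UTF-8, the backslash is used only as an escape character, and the function name `convert_escaping` and its parameters are hypothetical. Note that it re-quotes fields minimally, so quoting that was not strictly needed in the input is dropped, and any distinction between an empty quoted field and an empty unquoted field is lost.

```python
import csv


def convert_escaping(src_path, dst_path, encoding="utf-8"):
    """Rewrite a backslash-escaped CSV so embedded quotes are doubled instead.

    Assumes UTF-8 input by default; pass another encoding if needed.
    """
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        # Read using the backslash convention found in the tarball.
        reader = csv.reader(src, escapechar="\\", doublequote=False)
        # Write with the csv module defaults, which double embedded quotes.
        writer = csv.writer(dst, lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```

It could be called once per file, for example `convert_escaping("aka_name.csv", "aka_name.fixed.csv")`, writing to a new file so the originals stay untouched until the import has been verified.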