samthor / tiddlycsv

small, streaming CSV parser (~1kb)
Apache License 2.0

problem parsing us zips with embedded json column #1

Closed leeoniya closed 1 month ago

leeoniya commented 1 month ago

hey @samthor!

i wanted to take this thing for a spin, but ran into the same issue as i did with https://github.com/samthor/but-csv.

it looks like this free zipcodes file cannot be properly parsed by either lib.

sample:

"zip","lat","lng","city","state_id","state_name","zcta","parent_zcta","population","density","county_fips","county_name","county_weights","county_names_all","county_fips_all","imprecise","military","timezone"
"00601","18.18027","-66.75266","Adjuntas","PR","Puerto Rico","TRUE","","16834","100.9","72001","Adjuntas","{""72001"": 98.74, ""72141"": 1.26}","Adjuntas|Utuado","72001|72141","FALSE","FALSE","America/Puerto_Rico"
"00602","18.36075","-67.17541","Aguada","PR","Puerto Rico","TRUE","","37642","479.2","72003","Aguada","{""72003"": 100}","Aguada","72003","FALSE","FALSE","America/Puerto_Rico"
"00603","18.45744","-67.12225","Aguadilla","PR","Puerto Rico","TRUE","","49075","551.7","72005","Aguadilla","{""72005"": 99.76, ""72099"": 0.24}","Aguadilla|Moca","72005|72099","FALSE","FALSE","America/Puerto_Rico"
"00606","18.16585","-66.93716","Maricao","PR","Puerto Rico","TRUE","","5590","48.7","72093","Maricao","{""72093"": 82.27, ""72153"": 11.66, ""72121"": 6.06}","Maricao|Yauco|Sabana Grande","72093|72153|72121","FALSE","FALSE","America/Puerto_Rico"
"00610","18.2911","-67.12243","Anasco","PR","Puerto Rico","TRUE","","25542","265.7","72011","Añasco","{""72011"": 96.71, ""72099"": 2.82, ""72083"": 0.37, ""72003"": 0.1}","Añasco|Moca|Las Marías|Aguada","72011|72099|72083|72003","FALSE","FALSE","America/Puerto_Rico"
"00611","18.27698","-66.80688","Angeles","PR","Puerto Rico","TRUE","","1315","47.7","72141","Utuado","{""72141"": 100}","Utuado","72141","FALSE","FALSE","America/Puerto_Rico"
"00612","18.41283","-66.7051","Arecibo","PR","Puerto Rico","TRUE","","63312","321.1","72013","Arecibo","{""72013"": 98.94, ""72065"": 0.94, ""72017"": 0.11}","Arecibo|Hatillo|Barceloneta","72013|72065|72017","FALSE","FALSE","America/Puerto_Rico"
"00616","18.41878","-66.6679","Bajadero","PR","Puerto Rico","TRUE","","9625","341.4","72013","Arecibo","{""72013"": 100}","Arecibo","72013","FALSE","FALSE","America/Puerto_Rico"
"00617","18.44598","-66.56006","Barceloneta","PR","Puerto Rico","TRUE","","22573","474.8","72017","Barceloneta","{""72017"": 99.63, ""72054"": 0.37}","Barceloneta|Florida","72017|72054","FALSE","FALSE","America/Puerto_Rico"
"00622","17.98892","-67.1566","Boqueron","PR","Puerto Rico","TRUE","","7577","93.9","72023","Cabo Rojo","{""72023"": 100}","Cabo Rojo","72023","FALSE","FALSE","America/Puerto_Rico"
"00623","18.08429","-67.15336","Cabo Rojo","PR","Puerto Rico","TRUE","","39406","388.1","72023","Cabo Rojo","{""72023"": 100}","Cabo Rojo","72023","FALSE","FALSE","America/Puerto_Rico"
"00624","18.05905","-66.71932","Penuelas","PR","Puerto Rico","TRUE","","21648","178.0","72111","Peñuelas","{""72111"": 94.94, ""72113"": 5.06}","Peñuelas|Ponce","72111|72113","FALSE","FALSE","America/Puerto_Rico"
"00627","18.41905","-66.86037","Camuy","PR","Puerto Rico","TRUE","","32733","272.6","72027","Camuy","{""72027"": 100}","Camuy","72027","FALSE","FALSE","America/Puerto_Rico"
"00631","18.1852","-66.83169","Castaner","PR","Puerto Rico","TRUE","","1431","176.8","72081","Lares","{""72081"": 50.58, ""72001"": 49.42}","Lares|Adjuntas","72081|72001","FALSE","FALSE","America/Puerto_Rico"
"00637","18.081","-66.94659","Sabana Grande","PR","Puerto Rico","TRUE","","22882","251.2","72121","Sabana Grande","{""72121"": 93.98, ""72125"": 4.22, ""72079"": 1.8}","Sabana Grande|San Germán|Lajas","72121|72125|72079","FALSE","FALSE","America/Puerto_Rico"
"00638","18.28462","-66.5137","Ciales","PR","Puerto Rico","TRUE","","17605","95.2","72039","Ciales","{""72039"": 91.88, ""72107"": 7.48, ""72091"": 0.64}","Ciales|Orocovis|Manatí","72039|72107|72091","FALSE","FALSE","America/Puerto_Rico"
"00641","18.26742","-66.70212","Utuado","PR","Puerto Rico","TRUE","","24299","111.5","72141","Utuado","{""72141"": 100}","Utuado","72141","FALSE","FALSE","America/Puerto_Rico"

samthor commented 1 month ago

Thanks for your help! It looks like this is failing to close a quoted string because I look for "\n" or "," after the quotes conclude, and this file instead has "\r\n".

Will fix in both, thanks.
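
To illustrate the failure mode described above, here is a minimal sketch of a quoted-field CSV parser that accepts both "\n" and "\r\n" as row endings. It is not tiddlycsv's or but-csv's actual code; the function name `parseRows` is hypothetical, and it handles only the basics (quoted fields, doubled quotes, commas, LF and CRLF row endings).

```javascript
// Hypothetical sketch, not the library's implementation.
// Parses CSV text into an array of rows, each an array of string fields.
function parseRows(text) {
  const rows = [];
  let i = 0;
  while (i < text.length) {
    const row = [];
    for (;;) {
      let field = '';
      if (text[i] === '"') {
        i++; // skip the opening quote
        while (i < text.length) {
          if (text[i] === '"') {
            // A doubled quote inside a quoted field is an escaped quote.
            if (text[i + 1] === '"') { field += '"'; i += 2; continue; }
            i++; // closing quote
            break;
          }
          field += text[i++];
        }
      } else {
        // Unquoted field: read up to the next comma or line ending.
        while (i < text.length && !',\r\n'.includes(text[i])) field += text[i++];
      }
      row.push(field);
      if (text[i] !== ',') break; // anything but a comma ends the row
      i++; // skip the comma and read the next field
    }
    // Accept "\r\n" as well as "\n" after the last field. Checking only
    // for "\n" or "," here is the kind of bug described above.
    if (text[i] === '\r' && text[i + 1] === '\n') i += 2;
    else if (i < text.length) i++; // "\n" (a stray character is also skipped)
    rows.push(row);
  }
  return rows;
}
```

Called on a CRLF-terminated sample shaped like the zip code file, this keeps the embedded JSON column as a single field with its quotes unescaped.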

samthor commented 1 month ago

I've made it support CRLF now. I should just use uDSV ;-)

leeoniya commented 1 month ago

> I should just use uDSV ;-)

haha, wait for the benchmarks first, in Bun.js this time! :P

leeoniya commented 1 month ago

pretty good :)

poor PapaParse chokes on quotes :(

Bun.js:

[image: Bun.js benchmark results]

what's really weird is that tiddlycsv doesn't parse correctly using identical code in Node:

[image: Node results]

the code i use is really simple:

module.exports = {
  name: 'tiddlycsv',
  repo: 'https://github.com/samthor/tiddlycsv',
  load: async () => {
    const { parseCSV } = await import('tiddlycsv');

    return (csvStr, path) => new Promise(res => {
      let rows = parseCSV(csvStr);
      res(rows);
    });
  },
};
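
One quick way to spot a mis-parse like the one above is to check that every row has as many fields as the header. The sketch below assumes the parser returns an array of arrays of strings; `findRaggedRows` is a hypothetical helper, not part of tiddlycsv or the benchmark harness.

```javascript
// Hypothetical helper: returns the indices of rows whose field count
// differs from the first (header) row.
function findRaggedRows(rows) {
  if (rows.length === 0) return [];
  const width = rows[0].length;
  const bad = [];
  rows.forEach((row, index) => {
    if (row.length !== width) bad.push(index);
  });
  return bad;
}
```

Running this on the output from both runtimes and comparing the returned indices would show whether rows are being merged or split differently in Node.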