pelias / geonames

Import pipeline for geonames into Pelias
https://pelias.io
MIT License

improved handling of CSV comments #405

missinglink closed 2 years ago

missinglink commented 2 years ago

as mentioned in https://github.com/pelias/geonames/issues/404#issuecomment-1055506176 there seems to be a weird bug with how the geonames metadata files are encoding comments (I think!)

using this custom comment handler stream we're able to work around the issue, although I'm still not clear why the comment option from https://csv.js.org/parse/options/ (and sed '/^#/d') doesn't do the same thing 🤷
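For readers who haven't seen this kind of workaround, here is a minimal sketch of what a comment-stripping stream can look like. It is not the code from this PR: `createCommentStripper` and `isCommentLine` are hypothetical names, and it assumes a comment is any line whose first character is `#` (the same rule as `sed '/^#/d'`). It only uses Node's built-in `stream` and `string_decoder` modules.

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Assumption: a comment is any line whose first character is '#'.
function isCommentLine(line) {
  return line.startsWith('#');
}

// Hypothetical factory (not the implementation from this PR): returns a
// Transform that drops comment lines and passes all other lines through.
function createCommentStripper() {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte chars intact across chunks
  let tail = ''; // incomplete last line carried over between chunks

  // Push the non-comment lines downstream, each terminated by '\n'.
  const emit = (stream, lines) => {
    const kept = lines.filter((line) => !isCommentLine(line));
    if (kept.length > 0) stream.push(kept.join('\n') + '\n');
  };

  return new Transform({
    transform(chunk, _encoding, done) {
      const lines = (tail + decoder.write(chunk)).split('\n');
      tail = lines.pop(); // may be a partial line, so hold it back
      emit(this, lines);
      done();
    },
    flush(done) {
      // Emit whatever is left once the input ends (a newline is appended).
      tail += decoder.end();
      if (tail.length > 0) emit(this, [tail]);
      done();
    }
  });
}
```

Such a stream would sit in front of the parser, e.g. `source.pipe(createCommentStripper()).pipe(parser)`, where `source` and `parser` stand in for whatever streams the importer already uses.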

I've also taken the opportunity to do some simple housekeeping tasks:

The 'actual work' here is:

resolves https://github.com/pelias/geonames/issues/404

missinglink commented 2 years ago

agh woops, so the xsv failure was my fault since I wasn't explicitly telling it the file was TSV instead of CSV:

curl -s http://download.geonames.org/export/dump/countryInfo.txt | sed '/^#/d' | xsv cat -d '\t' rows

I suspect there's just a weird bug in csv-parse
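As a rough illustration of what the shell pipeline above does (drop `#` lines, then split each row on tabs rather than commas), here is a naive Node sketch. `parseTsv` is a hypothetical helper, not part of this repo or of csv-parse, and it assumes fields contain no quoting or embedded tabs, so it is only a sanity check and not a replacement for a real parser.

```javascript
// Hypothetical helper: naive TSV parsing with '#' comment lines removed.
// Assumes no quoted fields and no tabs or newlines embedded in values.
function parseTsv(text) {
  return text
    .split(/\r?\n/)                          // one entry per line
    .filter((line) => line.length > 0)       // skip blank lines
    .filter((line) => !line.startsWith('#')) // same rule as sed '/^#/d'
    .map((line) => line.split('\t'));        // tab-delimited, not comma-delimited
}

// Abbreviated, illustrative rows in the style of countryInfo.txt.
const sample = '# ISO\tISO3\tCountry\nAD\tAND\tAndorra\nAE\tARE\tUnited Arab Emirates\n';
const rows = parseTsv(sample);
console.log(rows.length); // number of data rows after comments are dropped
```

Splitting on `,` instead of `\t` here would leave each row as a single field, which is essentially the mistake described above.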

missinglink commented 2 years ago

opened an issue upstream (https://github.com/adaltas/node-csv/issues/325), hopefully we can remove these commits if a solution can be found natively within that lib.