mapbox / csv2geojson

magically convert csv files to geojson files
http://mapbox.github.io/csv2geojson/
MIT License
362 stars 82 forks source link

lat/lon field regex finder are too broad #46

Closed yohanboniface closed 7 years ago

yohanboniface commented 7 years ago

For example, they will match this field as latitude: Isolation des murs et planchers bas, even if there is a latitude header some column later.

I was about to work on a PR, but I'd prefer have your opinion on the best option before:

  1. change regex to only match the beginning of the header (including spaces) /^( )*(Lat)(itude)?/gi
  2. loop over all headers and only retain the longest match (that would need switch for isLat/isLon to matchLat/matchLon that would return the matched string; given that isLat/isLon are exported from the module, this may cause trouble)
  3. allow to override the regex from the options

My preference would be the option 1, even if it's a breaking change, because I feel matching lat or lon in the middle of the the header name does not really make sense (but I may be missing something). Option 2 is nice, but we should decide what to do for isLat/isLon: keep them even if we don't use them anymore? Or break the API?

Thoughts?

tmcw commented 7 years ago

My preference would be the option 1, even if it's a breaking change, because I feel matching lat or lon in the middle of the the header name does not really make sense (but I may be missing something).

This behavior is mostly because some people have headers like

point latitude,point longitude,title

Which seems kind of ludicrous, but so does storing geographic data in a CSV file :) You can override the names entirely if you'd like to.

I'd be okay with (2) - something like match length / string length might give better results.

yohanboniface commented 7 years ago

I'd be okay with (2) - something like match length / string length might give better results.

Okay. What about isLat/isLon should we keep them even unused?

yohanboniface commented 7 years ago

Wanted to have a look now, but I'm in doubt: I see there is an index.js (which is the one unitt tested, and I guess NodeJS entry point) and a csv2geojson.js (which is the one I use from the browser). package.json seems to say that csv2geojson.js is a browserified version of the first, but I see some commits only in csv2geojson.js (including my latest one, but not only), and if I have a closer look it seems that both are not totally in sync. So I'm unsure what's going on and where I should hack :)

tmcw commented 7 years ago

https://github.com/mapbox/csv2geojson/commit/28b581e19f17c2bf003fa792e71f87b91fb09425 was already fixed in master, I assumed it was the result of npm run build being rebuilt.

I've removed csv2geojson.js from the repo and am going for the npmcdn approach, which will rebuild & CDN-ify a distribution on every new npm version. Also enabled eslint for consistency. So, index.js is the only target.