node-geojson / gtfs2geojson

Convert GTFS data into GeoJSON.
ISC License

large files didn't seem to work so I made some changes... #4

Closed andrewharvey closed 8 years ago

andrewharvey commented 8 years ago

@tmcw Thanks for releasing this. I've made some changes at https://github.com/andrewharvey/gtfs2geojson to get it working better for my specific needs:

It's an architectural change, so I didn't open a pull request. I just thought I would open a ticket to let you know what I've done, in case others have requirements similar to the ones that led to these changes. Feel free to close this.

tmcw commented 8 years ago

@andrewharvey this all looks 👍 for a PR, if you want to make one.

Could you explain a little more about

Make assumptions about line order in CSV and use alternate approach to reading the sequential line data

Given the GTFS spec, are these safe assumptions - they'll work with all valid GTFS data?

andrewharvey commented 8 years ago

Given the GTFS spec, are these safe assumptions - they'll work with all valid GTFS data?

You're right, the GTFS spec makes no guarantees about row order. I've introduced an option to assume ordered input (off by default), because with my test case I observed a 26% reduction in processing time (123s to 91s) and a 61% reduction in memory usage (1.76G to 0.68G) when assuming the rows are ordered.
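To illustrate why assuming order can save memory, here is a minimal sketch (not the code from the fork). It assumes rows from shapes.txt have already been parsed into objects keyed by the standard GTFS column names; `toFeature`, `shapesUnordered` and `shapesOrdered` are hypothetical helper names. The unordered path has to buffer every point before it can sort, while the ordered path can emit a shape as soon as `shape_id` changes.

```javascript
// Build a GeoJSON LineString feature from the points of one shape.
function toFeature(shapeId, points) {
  return {
    type: 'Feature',
    properties: { shape_id: shapeId },
    geometry: {
      type: 'LineString',
      // GeoJSON coordinate order is [longitude, latitude].
      coordinates: points.map(p => [Number(p.shape_pt_lon), Number(p.shape_pt_lat)])
    }
  };
}

// Safe for any valid GTFS feed: buffer all rows, then sort each shape.
function shapesUnordered(rows) {
  const byId = new Map();
  for (const row of rows) {
    if (!byId.has(row.shape_id)) byId.set(row.shape_id, []);
    byId.get(row.shape_id).push(row);
  }
  const features = [];
  for (const [id, points] of byId) {
    points.sort((a, b) => Number(a.shape_pt_sequence) - Number(b.shape_pt_sequence));
    features.push(toFeature(id, points));
  }
  return features;
}

// Assumes rows are grouped by shape_id and already sorted by
// shape_pt_sequence, so only one shape is held in memory at a time.
function shapesOrdered(rows, emit) {
  let currentId = null;
  let points = [];
  for (const row of rows) {
    if (currentId !== null && row.shape_id !== currentId) {
      emit(toFeature(currentId, points));
      points = [];
    }
    currentId = row.shape_id;
    points.push(row);
  }
  if (currentId !== null) emit(toFeature(currentId, points));
}
```

If the input is not actually grouped and sorted, the ordered variant silently produces split or scrambled lines, which is presumably why the option is off by default.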

andrewharvey commented 8 years ago

Okay, I've run into a problem: the tests just hang and I can't work out why. I updated the tests because my changes force results to be passed via a callback.

tmcw commented 8 years ago

Is the GTFS file you're wrangling public? This could be a fun opportunity to experiment with a Rust port for maximum speed.

andrewharvey commented 8 years ago

Is the GTFS file you're wrangling public? This could be a fun opportunity to experiment with a Rust port for maximum speed.

Kind of. It's CC BY 4.0 licensed, but you need to create an account at opendata.transport.nsw.gov.au and get an API key in order to download the data. Alternatively, I've put a static extract of it at https://www.alantgeo.com.au/share/transport.nsw.gov.au_publictransport_timetables.zip (beware of the BOM in the txt files, it caused me a lot of trouble).
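For anyone hitting the same BOM issue, a minimal sketch of the failure and one way around it. The header line is an illustrative example using standard shapes.txt columns, and `stripBom` is a hypothetical helper name, not part of gtfs2geojson.

```javascript
// A UTF-8 byte order mark decodes to the single code point U+FEFF.
const BOM = '\uFEFF';

// Remove a leading BOM, if present, before parsing the CSV header.
function stripBom(text) {
  return text.startsWith(BOM) ? text.slice(1) : text;
}

// Illustrative first line of a shapes.txt file saved with a BOM.
const header = '\uFEFFshape_id,shape_pt_lat,shape_pt_lon,shape_pt_sequence';

// Without stripping, the first column name carries the invisible BOM,
// so lookups like row.shape_id come back undefined.
const withBom = header.split(',');
const withoutBom = stripBom(header).split(',');
```

Here `withBom[0]` looks like `shape_id` when printed but does not equal it, whereas `withoutBom[0]` does.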

It doesn't need to be any faster: it only takes about 90 to 120 seconds, and this is only a weekly (or so) static export (not the realtime GTFS protobufs).

My changes were mainly about working around the maximum size of a string in Node.js.
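The general idea behind avoiding the string size limit is to never hold the whole file as one string. The sketch below is an assumption about the approach, not the fork's code: `linesFromChunks` is a hypothetical generator that takes text chunks and yields complete lines, keeping only the current partial line in memory.

```javascript
// Yield complete lines from an iterable of string chunks.
// Only the unfinished tail of the last chunk is retained between chunks.
function* linesFromChunks(chunks) {
  let tail = '';
  for (const chunk of chunks) {
    const parts = (tail + chunk).split(/\r?\n/);
    // The last piece may be an incomplete line; carry it forward.
    tail = parts.pop();
    yield* parts;
  }
  // Emit a final line that had no trailing newline.
  if (tail !== '') yield tail;
}

// Example: a CSV split at arbitrary chunk boundaries.
const exampleLines = [...linesFromChunks(['shape_id,seq\nA,', '1\r\nA,2\nB', ',1'])];
```

In Node.js the chunks could come from a read stream opened with a `utf8` encoding, which is async-iterable, so a real version would likely be an async generator using `for await`.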

tmcw commented 8 years ago

Apparently several other people have had the same idea as me :) https://github.com/BurntSushi/rust-csv/issues/33

tmcw commented 8 years ago

👍 Cool, checked out your code. Happy to accept a PR if you want to merge the changes upstream. Otherwise, nice work: I've added a "see also" reference to your fork in the readme.