andrewharvey closed this 8 years ago.
@andrewharvey this all looks 👍 for a PR, if you want to make one.
Could you explain a little more about "Make assumptions about line order in CSV and use alternate approach to reading the sequential line data"?
Given the GTFS spec, are these safe assumptions? Will they work with all valid GTFS data?
That's true: the GTFS spec makes no guarantees about row order. So I've introduced an option to assume ordered input (off by default), because with my test case assuming order gave a 26% reduction in processing time (123 s to 91 s) and a 61% reduction in memory usage (1.76 GB to 0.68 GB).
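To illustrate why assuming order can cut memory use, here is a minimal TypeScript sketch. It assumes the rows are already parsed into objects and uses `shapes.txt` as the example file; the `ShapePoint` type and the `groupUnordered`, `groupOrdered` and `toCoordinates` functions are hypothetical, not gtfs2geojson's actual API. With unordered input every group has to be kept until the end of the file; if all rows sharing a `shape_id` are contiguous, each group can be emitted as soon as the key changes.

```typescript
// Hypothetical parsed row from shapes.txt (field names follow the GTFS spec).
interface ShapePoint {
  shape_id: string;
  shape_pt_lat: number;
  shape_pt_lon: number;
  shape_pt_sequence: number;
}

// Unordered input: every group stays in memory until the whole file is read.
function groupUnordered(rows: Iterable<ShapePoint>): Map<string, ShapePoint[]> {
  const groups = new Map<string, ShapePoint[]>();
  for (const row of rows) {
    const group = groups.get(row.shape_id);
    if (group) group.push(row);
    else groups.set(row.shape_id, [row]);
  }
  return groups;
}

// Ordered input (rows with the same shape_id are contiguous): yield each group
// as soon as the key changes, so only one group is held in memory at a time.
function* groupOrdered(rows: Iterable<ShapePoint>): Generator<[string, ShapePoint[]]> {
  let currentId: string | null = null;
  let current: ShapePoint[] = [];
  for (const row of rows) {
    if (row.shape_id !== currentId) {
      if (currentId !== null) yield [currentId, current];
      currentId = row.shape_id;
      current = [];
    }
    current.push(row);
  }
  if (currentId !== null) yield [currentId, current];
}

// Sort one group by shape_pt_sequence and return GeoJSON-style [lon, lat] pairs.
function toCoordinates(points: ShapePoint[]): [number, number][] {
  return points
    .slice()
    .sort((a, b) => a.shape_pt_sequence - b.shape_pt_sequence)
    .map((p): [number, number] => [p.shape_pt_lon, p.shape_pt_lat]);
}
```

If the ordering assumption is violated, `groupOrdered` silently emits the same `shape_id` more than once, which is why an opt-in flag (off by default) is the safer design.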
Okay, I've run into a problem: the tests just hang, and I can't work out why. I updated the tests because my changes now pass results via a callback.
Is the GTFS file you're wrangling public? This could be a fun opportunity to experiment with a Rust port for maximum speed.
Kind of: it's licensed CC BY 4.0, but you need to create an account at opendata.transport.nsw.gov.au and get an API key to download the data. Alternatively, I've put a static extract at https://www.alantgeo.com.au/share/transport.nsw.gov.au_publictransport_timetables.zip (beware of the BOM, i.e. byte order mark, at the start of the .txt files; it caused me a lot of trouble).
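A UTF-8 BOM decodes to the character U+FEFF at the start of the text, and some decoders keep it (for example, Node.js's `fs.readFileSync(path, "utf8")` does), so the first header comes out as `"\uFEFFstop_id"` and lookups by `stop_id` fail. A minimal sketch of guarding against that, assuming a simple header line with no quoted fields; `stripBom` and `parseHeader` are hypothetical helper names:

```typescript
// Remove a leading U+FEFF (the decoded UTF-8 byte order mark), if present.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

// Hypothetical helper: split a simple CSV header line into column names.
// Assumes no quoted fields; also drops a trailing "\r" from CRLF line endings.
function parseHeader(line: string): string[] {
  return stripBom(line).replace(/\r$/, "").split(",");
}
```

A real CSV parser is still needed for the data rows, since GTFS fields may be quoted and contain commas.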
It doesn't need to be any faster: it only takes about 90 to 120 seconds, and this is only a roughly weekly static export (not the realtime GTFS protobufs).
My main aim was working around the maximum string length in Node.js.
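One way around the maximum string length is to never hold the whole file as one string and instead process it line by line as chunks arrive. A minimal sketch of an incremental line splitter, assuming decoded text chunks (for example from `fs.createReadStream(path, { encoding: "utf8" })`) and assuming no field contains an embedded newline, which the CSV format permits inside quotes; the `LineSplitter` class is hypothetical:

```typescript
// Drop a trailing "\r" left over from CRLF line endings.
function stripCr(line: string): string {
  return line.endsWith("\r") ? line.slice(0, -1) : line;
}

// Hypothetical incremental line splitter: feed text chunks in, get complete
// lines out, so only the current partial line is buffered between chunks.
class LineSplitter {
  private partial = "";

  // Returns every complete line ending in this chunk; the incomplete tail is kept.
  push(chunk: string): string[] {
    const parts = (this.partial + chunk).split("\n");
    this.partial = parts.pop() ?? "";
    return parts.map(stripCr);
  }

  // Returns the final line if the input did not end with a newline.
  flush(): string[] {
    const rest = this.partial;
    this.partial = "";
    return rest.length > 0 ? [stripCr(rest)] : [];
  }
}
```

Node.js's built-in `readline` module does the same job for streams; the sketch only shows why memory use then depends on the longest line rather than on the file size.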
Apparently several other people have had the same idea as me :) https://github.com/BurntSushi/rust-csv/issues/33
👍 Cool, I checked out your code. Happy to accept a PR if you want to merge the changes upstream; otherwise, nice work. I've added a "see also" reference to your fork in the readme.
@tmcw Thanks for releasing this. I've made some changes at https://github.com/andrewharvey/gtfs2geojson to get it working better for my specific needs:
It's an architectural change, so I didn't open a pull request. I just thought I'd open a ticket to let you know what I've done, in case others have requirements similar to the ones that led to these changes. Feel free to close this.