Closed elad661 closed 8 years ago
It seems that one of the forks of this library has a fix for this:
nathanhilbert/pygtfs_atx@0c52b07d98aaf9a677dc63e09544d5014e8dc549
@nathanhilbert I think it'd be useful if you could create a pull request with this fix (and your sqlalchemy warning fix) so that other people can make use of them as well.
I'm always happy to contribute a PR when I can. I thought this solution would add too much time to loading: https://github.com/nathanhilbert/pygtfs_atx/commit/0c52b07d98aaf9a677dc63e09544d5014e8dc549#diff-37d894406fb18d5a282b4ac0a09aee96R77. Are there any ideas for making it a little less brute-force?
I thought about it for a bit and found a better solution:
It's possible to look at the header (the first line, with the column names) before reading everything, and only filter if the header contains an unknown column. And since the names of the unknown fields are then already known, there's no need to loop over the entire dict for every row read from the CSV.
I implemented it and it does seem a bit faster. I'll send a pull request.
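To illustrate the header-first idea, here is a minimal sketch. It is not the code from the fork or from the pull request; `read_known_columns` and `KNOWN_ROUTE_FIELDS` are hypothetical names made up for this example, and the set of known fields is deliberately incomplete.

```python
import csv
import io


def read_known_columns(lines, known_fields):
    """Yield one dict per CSV row, keeping only columns named in known_fields.

    Hypothetical helper for illustration; not part of pygtfs.
    """
    reader = csv.reader(lines)
    try:
        header = [name.strip() for name in next(reader)]
    except StopIteration:
        return  # empty file: nothing to yield

    # Inspect the header once to find the positions of the known columns.
    keep = [i for i, name in enumerate(header) if name in known_fields]

    if len(keep) == len(header):
        # Fast path: every column is known, so no filtering is needed.
        for row in reader:
            if row:  # skip blank lines
                yield dict(zip(header, row))
    else:
        # Slow path: pick out only the known positions for each row.
        for row in reader:
            if row:
                yield {header[i]: row[i] for i in keep if i < len(row)}


# Assumed subset of routes.txt fields, for the example only.
KNOWN_ROUTE_FIELDS = {"route_id", "route_short_name", "route_long_name"}

sample = io.StringIO(
    "route_id,route_short_name,route_sort_order\n"
    "1,Red,10\n"
    "2,Blue,20\n"
)
rows = list(read_known_columns(sample, KNOWN_ROUTE_FIELDS))
```

With the sample above, `route_sort_order` is dropped from every row, and a file with only known columns goes through the fast path without any per-row filtering.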
Some transit operators add non-standard columns (which don't exist in the Google Extensions or in the official specs) to their feeds. For example, MBTA (this feed: http://www.mbta.com/uploadedfiles/MBTA_GTFS.zip) adds "route_sort_order" to routes.txt.
This causes pygtfs to fail, because it doesn't know what to do with these values.
It'd be better if these kinds of errors were ignored, so that pygtfs stays usable even with not-exactly-standard feeds.