Closed kylerchin closed 10 months ago
Nice finding. I’ll have to discuss a bit with people from https://github.com/etalab/transport-validator as such a bad file should yield an error on the validator, but still be easy to parse. I’m not sure yet what would be the best way to keep that information (and if I don’t have any suggestion rapidly, then let’s take it as it is: it’s still a nice improvement).
Could you just make the metra gtfs a bit small (maybe trim the stops_times.txt filie) ?
Sure! Let me take a look!
Thank you for the change. We still need the file, just with a few lines
Just a quick question. Would this change be O(n)? It's much simpler to check the separator, then invoke this change on an if statement. This would save about 2 billion steps if I am ingesting the World's GTFS data.
The function read_objs
is called once per csv_file, so a dozen of times per GTFS file. The trimming only occurs for the headers, not for each line. I think this will scale without any trouble.
(and even if we decided to trim each line, what might happen eventually, removing trailing white spaces is still quite cheap compared to parsing numbers)
@Tristramg thoughts so far?
Looks like UK is now working with this now!
Metra https://metra.com/metra-gtfs-api would not load and I was getting complaints about that. The header from stops.txt looks like stop_id, stop_name, stop_desc, stop_lat, stop_lon, zone_id, stop_url, wheelchair_boarding and thus
stop_name
is not eq tostop_name
.This resolves this issue by trimming the header column names for every file.
Metra Rail is added as a test to ensure this is working.
The patch is bumped up a version to
0.39.1
.