rust-transit / gtfs-structure

Read a GTFS file
MIT License
61 stars 32 forks source link

Trim header column names, added Metra test #152

Closed kylerchin closed 10 months ago

kylerchin commented 10 months ago

Metra https://metra.com/metra-gtfs-api would not load and I was getting complaints about that. The header from stops.txt looks like stop_id, stop_name, stop_desc, stop_lat, stop_lon, zone_id, stop_url, wheelchair_boarding and thus stop_name is not eq to stop_name.

This resolves this issue by trimming the header column names for every file.

Metra Rail is added as a test to ensure this is working.

The patch is bumped up a version to 0.39.1.

Tristramg commented 10 months ago

Nice finding. I’ll have to discuss a bit with people from https://github.com/etalab/transport-validator as such a bad file should yield an error on the validator, but still be easy to parse. I’m not sure yet what would be the best way to keep that information (and if I don’t have any suggestion rapidly, then let’s take it as it is: it’s still a nice improvement).

Could you just make the metra gtfs a bit small (maybe trim the stops_times.txt filie) ?

kylerchin commented 10 months ago

Sure! Let me take a look!

Tristramg commented 10 months ago

Thank you for the change. We still need the file, just with a few lines

lolpro11 commented 10 months ago

Just a quick question. Would this change be O(n)? It's much simpler to check the separator, then invoke this change on an if statement. This would save about 2 billion steps if I am ingesting the World's GTFS data.

Tristramg commented 10 months ago

The function read_objs is called once per csv_file, so a dozen of times per GTFS file. The trimming only occurs for the headers, not for each line. I think this will scale without any trouble.

(and even if we decided to trim each line, what might happen eventually, removing trailing white spaces is still quite cheap compared to parsing numbers)

kylerchin commented 10 months ago

@Tristramg thoughts so far?

kylerchin commented 10 months ago

Looks like UK is now working with this now!