transitland / transitland-datastore

Transitland v1 core components. Deprecated and only maintained occasionally. See Transitland v2.
https://transit.land/documentation/datastore/
MIT License

Error parsing GTFS CSV with incorrect quote escaping #511

Closed: cesaregerbino closed this issue 7 years ago

cesaregerbino commented 8 years ago

Hi!

I'm trying to load this feed URL

but I'm getting this error

gtfs-error

It's quite strange, because a few days ago I tried to load the same feed URL and it was processed (unfortunately I didn't complete all 4 steps on that first attempt).

Is there something going wrong in the data?

Do you have any suggestions?

Thank you very much in advance!!

Cesare

irees commented 8 years ago

Hi @cesaregerbino

Thanks for trying out the Feed Registry and finding an issue!

I've looked at the feed, and the problem is a row in stops.txt that uses incorrect quote escaping.

50831016,222,"Fermata 222 - CASA CIRCONDARIALE","CASA CIRC. "LO RUSSO E CUTUGNO"",45.10143,7.61975,2,http://www.5t.torino.it/5t/it/trasporto/arrivi-ricerca.jsp?shortName=222&stoppingPointCtl:action:getTransits,0,

The fourth field contains internal quotes, which according to the GTFS spec should be escaped by doubling each quote character, e.g.:

50831016,222,"Fermata 222 - CASA CIRCONDARIALE","CASA CIRC. ""LO RUSSO E CUTUGNO""",45.10143,7.61975,2,http://www.5t.torino.it/5t/it/trasporto/arrivi-ricerca.jsp?shortName=222&stoppingPointCtl:action:getTransits,0,

We've encountered this kind of issue before. Some CSV parsers (such as the csv module in Python's standard library, used by Google's feed_validator.py) are very tolerant of quoting errors and will process the row. However, the CSV parser we are using (Ruby Rcsv, based on libcsv) is stricter and fails on this row. (Note: we are using the 'nostrict' option, but the parser then interprets the line as a row with 11 fields instead of 10, which doesn't map correctly onto the header row.)
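To illustrate the tolerant behaviour on the Python side, here is a minimal sketch that feeds both versions of the row to the standard library csv module. The helper name parse_row is hypothetical, and the strict flag shown is Python's own dialect option; it is not meant to reproduce what Rcsv or libcsv do internally.

```python
import csv

# The two rows quoted above: the original from stops.txt and the corrected one.
BAD_ROW = (
    '50831016,222,"Fermata 222 - CASA CIRCONDARIALE",'
    '"CASA CIRC. "LO RUSSO E CUTUGNO"",45.10143,7.61975,2,'
    'http://www.5t.torino.it/5t/it/trasporto/arrivi-ricerca.jsp'
    '?shortName=222&stoppingPointCtl:action:getTransits,0,'
)
GOOD_ROW = BAD_ROW.replace(
    '"CASA CIRC. "LO RUSSO E CUTUGNO""',
    '"CASA CIRC. ""LO RUSSO E CUTUGNO"""',
)


def parse_row(line, strict=False):
    """Parse a single CSV line (hypothetical helper for this example)."""
    return next(csv.reader([line], strict=strict))


if __name__ == "__main__":
    # Default (tolerant) mode: the malformed row is accepted.
    tolerant = parse_row(BAD_ROW)
    print(len(tolerant), repr(tolerant[3]))

    # Correctly escaped row: the inner quotes survive as part of the value.
    good = parse_row(GOOD_ROW)
    print(len(good), repr(good[3]))

    # Python's strict mode rejects the malformed row instead of guessing.
    try:
        parse_row(BAD_ROW, strict=True)
    except csv.Error as exc:
        print("strict mode error:", exc)
```

In the default mode Python returns 10 fields for the malformed row, but the fourth value is not what the provider intended: the first inner quote is dropped and the trailing pair is kept literally. So even a tolerant parser only hides the problem rather than recovering the stop description.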

Ideally, the quoting error in the feed could be fixed by the provider.
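Until the provider fixes the feed, one possible workaround is to pre-process the file before importing it. The following is only a heuristic sketch, not something the datastore or the GTFS library does: the helper repair_line and the regular expression are hypothetical, and the approach assumes that a quote which is neither next to a comma nor at the start or end of the line is an unescaped internal quote. It will guess wrong for values that legitimately contain a quote followed by a comma.

```python
import csv
import re

# A quote counts as "internal" when it is not directly after a field boundary
# (line start or comma) and not directly before one (comma or line end).
_INTERNAL_QUOTE = re.compile(r'(?<!^)(?<!,)"(?!,|$)')

BAD_ROW = (
    '50831016,222,"Fermata 222 - CASA CIRCONDARIALE",'
    '"CASA CIRC. "LO RUSSO E CUTUGNO"",45.10143,7.61975,2,'
    'http://www.5t.torino.it/5t/it/trasporto/arrivi-ricerca.jsp'
    '?shortName=222&stoppingPointCtl:action:getTransits,0,'
)
GOOD_ROW = BAD_ROW.replace(
    '"CASA CIRC. "LO RUSSO E CUTUGNO""',
    '"CASA CIRC. ""LO RUSSO E CUTUGNO"""',
)


def repair_line(line):
    """Double unescaped internal quotes, but only on lines that fail a
    strict parse, so already valid lines are returned unchanged."""
    try:
        next(csv.reader([line], strict=True))
        return line
    except csv.Error:
        return _INTERNAL_QUOTE.sub('""', line)


if __name__ == "__main__":
    fixed = repair_line(BAD_ROW)
    print(fixed == GOOD_ROW)
    print(next(csv.reader([fixed], strict=True))[3])
```

Checking each line with a strict parse first matters here: applying the substitution to a row that is already escaped correctly would double its quotes a second time.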

irees commented 8 years ago

Since the underlying issue is in our GTFS library, I've created a ticket there: https://github.com/transitland/gtfs/issues/25

irees commented 7 years ago

Closing: handling improperly quoted CSV files is beyond scope.