itinero / GTFS

.NET implementation of a General Transit Feed Specification (GTFS) feed parser.
http://www.itinero.tech
MIT License

Strings encased in quotes are not properly unescaped/trimmed #79

Open domints opened 1 month ago

domints commented 1 month ago

Example feed: https://otwartedane.metropoliagzm.pl/dataset/rozklady-jazdy-i-lokalizacja-przystankow-gtfs-wersja-rozszerzona/resource/290298ce-944b-4744-8f92-29ab2b786a33

Essentially, the CSV deserializer does not properly handle strings that are enclosed in quotation marks ("). I saw that this was a problem with colors in version 1.7. In this 3.0 beta colors are fine, but now the same problem shows up with the block_id field. Granted, it may not always make sense to have quotation marks within an ID, but the GTFS docs say:

ID - An ID field value is an internal ID, not intended to be shown to riders, and is a sequence of any UTF-8 characters.
Using only printable ASCII characters is recommended.

So an ID can technically contain quotation marks. Also, Busman, a scheduling system widely used in Poland, seems to enclose every string in quotation marks, which breaks this library.

I'd suggest treating any string-like field as a string and, if it is enclosed in quotation marks, unquoting it properly. Doesn't this library reference any well-known, well-tested CSV deserialization library?
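
To make the expected behavior concrete, here is a minimal sketch in Python rather than the library's own .NET code. It assumes the usual CSV convention: a field may be wrapped in quotation marks, and a literal quotation mark inside a quoted field is written as two quotation marks. The helper names `unquote_field` and `parse_row` and the sample row are hypothetical and only illustrate the idea; they are not part of this library.

```python
import csv


def unquote_field(raw: str) -> str:
    """Return the logical value of a single CSV field.

    Assumption: surrounding whitespace is not significant, an enclosing
    pair of quotation marks is only a wrapper, and "" inside a quoted
    field stands for one literal quotation mark.
    """
    value = raw.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Drop the wrapper, then collapse doubled quotes.
        return value[1:-1].replace('""', '"')
    return value


def parse_row(line: str) -> list[str]:
    """Split a whole CSV line using Python's standard csv module,
    which applies the same unquoting rules and also copes with
    commas inside quoted fields."""
    return next(csv.reader([line]))


# Hypothetical row in which every string is quoted,
# including a block_id that itself contains quotation marks.
fields = parse_row('"R1","T1","BLK ""A"" 7"')
print(fields)
```

With this handling a quoted block_id and an unquoted one yield the same value, so feeds that quote every string would parse the same way as feeds that quote only when necessary.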