rust-transit / gtfs-structure

Read a GTFS file
MIT License
61 stars 32 forks source link

Error while reading this gtfs #81

Closed fchabouis closed 3 years ago

fchabouis commented 3 years ago

Hello again :) I have a problem with the attached GTFS, with the stop_times file. I have reduced the file to a single line.

It looks like the problem comes from the fact that the last column is optional, and not given any value.

The error message I get is :

CSVError { file_name: "stop_times.txt", source: Error(Deserialize { pos: Some(Position { byte: 128, line: 1, record: 1 }), err: DeserializeError { field: None, kind: Message("Invalid value ``, expected 0 or 1") } }), line_in_error: Some(LineError { headers: ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence", "stop_headsign", "pickup_type", "drop_off_type", "shape_dist_traveled", "timepoint"], values: ["10000", "08:20:00", "08:20:00", "SETPDAU1", "1", "", "", "", "0", ""] }) }

The stop_times looks like this :

trip_id,arrival_time,departure_time,stop_id,stop_sequence,stop_headsign,pickup_type,drop_off_type,shape_dist_traveled,timepoint
"10000","08:20:00","08:20:00","SETPDAU1","1","","","","0",""

If I replace the last "" by "0", I'm fine.

"10000","08:20:00","08:20:00","SETPDAU1","1","","","","0","0"

I tried to understand how the code knows that timepoint is expecting a 0 or 1, and to see if it was considered as an optional field, but did not succeed unfortunately.

My little test code :

use std::convert::TryFrom;
fn main() {
    let path = "/home/francis/Downloads/GTFS_test.zip";
    let res_raw_gtfs = gtfs_structures::RawGtfs::new(path);

    match res_raw_gtfs {
        Ok(raw_gtfs) => call_try_from(raw_gtfs),
        Err(e) => println!("error"),
    }
}

fn call_try_from(raw_gtfs: gtfs_structures::RawGtfs) {
    match gtfs_structures::Gtfs::try_from(raw_gtfs) {
        Ok(ref gtfs) => {
            println!("ok");
        }
        Err(e) => {
            println!("{:?}", e);
        }
    }
}

Can you help me with this issue ? thanks !

GTFS_test.zip

Tristramg commented 3 years ago

as we can see here, we use the deserialize_bool function: https://github.com/rust-transit/gtfs-structure/blob/master/src/objects.rs#L348-L353 If the column is absent, then it is true.

However, in the example, the column is present, but without a value. We chose to consider that only "0" and "1" are valid values. https://github.com/rust-transit/gtfs-structure/blob/master/src/objects.rs#L868-L881

Yet, the specification (https://gtfs.org/reference/static#stop_timestxt) states:

1 or empty - Times are considered exact.

This means that we are not respecting the specification, and it is an actual bug.

To fix it, I’d be cautious and create a new deserializer for this specific situation: we do not want to accidentaly interpret an empty value as a real value if the spec says nothing or may even say the oposite. I’d be more logical to consider that empty means false, not true!

Do you agree @antoine-de?

fchabouis commented 3 years ago

Should it be an option of bool instead of a bool ?

Tristramg commented 3 years ago

I don’t think so. The specification seems very clear to me: «Times are considered exact.» if nothing then it must be set as such

On Tue, 22 Jun 2021 at 14:38, Francis Chabouis @.***> wrote:

Should it be an option of bool instead of a bool ?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-transit/gtfs-structure/issues/81#issuecomment-865947056, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAC7S5BVFXPNWEELKIZAKLTUB75HANCNFSM47DMKCYQ .

antoine-de commented 3 years ago

yep, seems fine for me!

Tristramg commented 3 years ago

even better, shouldn’t it be an enum? one could imagine that this will evolve to other values (like arrival time can be early), and it will make it more explicite for users. Timepoint=bool is not very nice (but it’s a breaking change in the API)

On Tue, 22 Jun 2021 at 15:00, Antoine D @.***> wrote:

yep, seems fine for me!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-transit/gtfs-structure/issues/81#issuecomment-865961835, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAC7S6UHEXKMABMQ32FHUTTUCCQTANCNFSM47DMKCYQ .

fchabouis commented 3 years ago

ah ok I understood how it works, thanks.

Tristramg commented 3 years ago

fixed with #82