rust-transit / gtfs-structure

Read a GTFS file
MIT License
56 stars 32 forks source link

Use Id<Stop> instead of Arc<Stop>: Id as string #126

Open antoine-de opened 2 years ago

antoine-de commented 2 years ago

Same as https://github.com/rust-transit/gtfs-structure/pull/123 but without generational_indexes.

I think the indexes are really great for performance since:

I don't think we need those performance in this crate though. Having only typed index (a think wrapper over the real gtfs identifier) would be more convenient I think. It would:

both approached would need benchmark/real use case to see if they are worth it.

Note: I quite like this approach.

If we want to go there, there is still work to:

Note: One canvas of the approach is that this would enforce ALL ids to be valid. I think this could be quite a good thing, but this will break the reading of many datasets out there.

antoine-de commented 2 years ago

I found that I could decrease drastically the number of String creation by implementing Borrow<str>, now I find the ergonomics not that bad

antoine-de commented 2 years ago

I think the API is quite nice now. I updated an example to show how we can use the stop_id now:


    // you can access a stop by a &str
    let _ = gtfs.get_stop_by_raw_id("stop1")?;

    let trip = gtfs.trips.get("trip1")?;
    let stop_id: &gtfs_structures::Id<gtfs_structures::Stop> =
        &trip.stop_times.first()?.stop;

    // or with a typed id if you have one

    // if no stops have been removed from the gtfs, you can safely access the stops by it's id
    let s = &gtfs.stops[stop_id];
    println!("stop name: {}", &s.name);

    // if some removal have been done, you can also you those method to get an Option<Stop>
    let s = gtfs.get_stop(stop_id)?;
    println!("stop description: {}", &s.description);

    // or you can access it via `stops.get`
    let s = gtfs.stops.get(stop_id)?;
    println!("stop location type: {:?}", &s.location_type);

    let mut s = gtfs.stops.get_mut(stop_id)?;
    s.code = Some("code".into());
antoine-de commented 1 year ago

Note: One canvas of the approach is that this would enforce ALL ids to be valid. I think this could be quite a good thing, but this will break the reading of many datasets out there.

RawGTFS reading would still be working, but not GTFS anymore.

Do anybody have some thoughts about this?

fchabouis commented 1 year ago

Does it mean that if an ID is invalid, the gtfs validator would output a fatal error ? I would not like it, because all other errors and warnings would be hidden by this fatal error. I prefer an error on Ids to be a simple error.

antoine-de commented 1 year ago

hum, there it's a library to read a GTFS. the change of id would however lead to GTFS not loading (since the ID<Stop> cannot be invalid). It would still be possible to read RawGTFS though, since they would still have raw String Ids.

For transport.data.gouv.fr's validator usecase, you'd keep all the warnings/error on the rawGTFS, but loose the error needing the GTFS (and that's a lot 😱 https://github.com/etalab/transport-validator/blob/9a04e6/src/validate.rs#L65-L78)

I agree it's a pity since it would make a lot of dataset unreadable, but at the same time foreign keys are quite important, and I don't feel good about having Option<Id<T>> (or bake an invalid state into the ID) everywhere to express the fact that they can be invalid.

Any ideas / thoughts on this?

fchabouis commented 1 year ago

In our case, the business requirements are to make the validator "tolerant" (see this comment ) so that we get as many validation information as possible, and not just a single fatal error...