skinkie / reference

Personal repository where I collect working examples to understand inner workings while building PyNeTExConv
GNU Affero General Public License v3.0

GTFS group trips by routes #69

Open skinkie opened 1 month ago

ue71603 commented 1 month ago

@skinkie a bit more description?

skinkie commented 1 month ago

When processing data we know from GTFS that (Agency, Route) > Trips guarantees partitions of data that can be processed in parallel. This also means that there is no overlap between the ServiceJourneyPatterns it could or would generate. Obviously the pattern may be the same, but since the RouteRef in NeTEx must point to a different route for a different Line (if you didn't use Mentz software, that would require a LineRef at ServiceJourney level), it has to be unique.

Hence, given the above, the opportunity is to keep a hash list of created ServiceJourneyPatterns in memory per (Agency, Route), as a shortcut offered by the source standard.
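A minimal sketch of that idea, assuming the GTFS tables are already loaded as lists of dicts keyed by the standard column names (`route_id`, `agency_id`, `trip_id`). The function names, the `stops_by_trip` mapping and the choice of SHA-1 over the joined stop ids are illustrative assumptions, not what PyNeTExConv actually does:

```python
import hashlib
from collections import defaultdict

SEPARATOR = "\x1f"  # assumption: this character never occurs inside a stop_id


def partition_trips(routes, trips):
    """Group trip_ids by (agency_id, route_id); each group can be processed independently."""
    # A missing agency_id is mapped to "" here (assumed single-agency feed).
    agency_of_route = {r["route_id"]: r.get("agency_id", "") for r in routes}
    partitions = defaultdict(list)
    for trip in trips:
        key = (agency_of_route[trip["route_id"]], trip["route_id"])
        partitions[key].append(trip["trip_id"])
    return dict(partitions)


def pattern_key(stop_ids):
    """Hash an ordered stop sequence into a short, stable key."""
    return hashlib.sha1(SEPARATOR.join(stop_ids).encode("utf-8")).hexdigest()


def patterns_for_partition(trip_ids, stops_by_trip):
    """Build the in-memory pattern table for one (agency, route) partition."""
    patterns = {}         # pattern key -> ordered tuple of stop_ids
    pattern_of_trip = {}  # trip_id -> pattern key
    for trip_id in trip_ids:
        stops = tuple(stops_by_trip[trip_id])
        key = pattern_key(stops)
        patterns.setdefault(key, stops)
        pattern_of_trip[trip_id] = key
    return patterns, pattern_of_trip
```

Because each partition has its own `patterns` dict, the same stop sequence on two routes yields two separate patterns, which matches the requirement that each pattern points to a route of its own Line.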

ue71603 commented 3 weeks ago

I am not so sure. We have routes that are served by multiple agencies.

ue71603 commented 3 weeks ago

but they are rare, so it might work.

skinkie commented 3 days ago

> I am not so sure. We have routes that are served by multiple agencies.

From a GTFS standpoint this is not possible: a route in GTFS has a single agency_id. So in GTFS it is modelled either as multiple (identical) routes, one per agency, or by introducing a combined agency.

ue71603 commented 3 days ago

I would not like to introduce new agencies. I think that if in such cases we end up with two patterns instead of one, we have still reduced the number of patterns a lot, and the remaining duplicates are acceptable.

skinkie commented 3 days ago

From GTFS -> NeTEx we can do what we want with ServiceJourneyPatterns, since the concept does not exist in GTFS. My aim is a shortcut from GTFS to NeTEx that can directly infer TimeDemandTypes and ServiceJourneyPatterns, since doing this directly from GTFS would significantly reduce the operational cost. Having said that, it is obviously an extra import step, so we must give the user the option to:

  1. import as-is (Calls)
  2. directly make TimetabledTime + ServiceJourneyPattern (ease the later EPIP processing)
  3. directly add ServiceJourneyPattern + TimeDemandTypes (to significantly reduce storage requirements)
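
To illustrate why option 3 saves storage, here is a rough sketch of how a time demand profile could be derived from `stop_times.txt` rows of one trip. It assumes the rows are dicts sorted by `stop_sequence` with `arrival_time` and `departure_time` filled in. The `ImportMode` enum and the function names are hypothetical, not an existing API:

```python
from enum import Enum


class ImportMode(Enum):
    """Hypothetical names for the three import options listed above."""
    CALLS = 1
    PASSING_TIMES_WITH_PATTERN = 2
    PATTERN_WITH_TIME_DEMAND_TYPES = 3


def gtfs_seconds(value):
    """Convert a GTFS HH:MM:SS time to seconds; hours may exceed 24."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds)


def time_demand_profile(stop_times):
    """Return (arrival, departure) offsets in seconds relative to the first departure."""
    start = gtfs_seconds(stop_times[0]["departure_time"])
    return tuple(
        (gtfs_seconds(st["arrival_time"]) - start,
         gtfs_seconds(st["departure_time"]) - start)
        for st in stop_times
    )
```

Trips with an identical profile could then share one TimeDemandType and only store their own departure time plus a reference to it, in the same way that trips share a ServiceJourneyPattern.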