hove-io / transit_model

Managing transit data with Rust
GNU Affero General Public License v3.0

[gtfs2netexfr] Help needed to pinpoint cause of "unused lines" pruning #824

Closed: thbar closed this issue 2 years ago

thbar commented 2 years ago

Hello!

I am a member of the transport.data.gouv.fr team, and I am reaching out because we are investigating the result of a GTFS-to-NeTEx conversion (https://github.com/etalab/transport-site/issues/1864). I would appreciate a bit of help if possible. If I understand correctly, there is a maintenance contract in place, although I do not know its exact terms.

When running the converter against the 2 GTFS resources available here, the resulting NeTEx contains a single "line", whereas the producer's GTFS includes many.

As documented at the bottom of https://github.com/etalab/transport-site/issues/1864, I have dug into the Rust source (using logs, breakpoint debugging, etc.) and found that the converter optimises its output by removing lines that are not referenced by any route, and those routes are in turn removed if no service/calendar refers to them.
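The cascade described above can be sketched as follows. This is a minimal illustration under stated assumptions, not transit_model's actual code: the `Trip` and `Route` structs and the `prune` function are hypothetical, and a service is assumed to count as "active" when the calendar data gives it at least one date. It shows how a single missing or empty calendar entry can make a whole line disappear.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified records; these are NOT transit_model's types.
#[derive(Debug, Clone)]
struct Trip {
    id: String,
    route_id: String,
    service_id: String,
}

#[derive(Debug, Clone)]
struct Route {
    id: String,
    line_id: String,
}

/// Hypothetical cascading cleanup: trips without an active service are
/// dropped, then routes without trips, then lines without routes.
fn prune(
    lines: &BTreeSet<String>,
    routes: &[Route],
    trips: &[Trip],
    active_services: &BTreeSet<String>,
) -> (BTreeSet<String>, Vec<Route>, Vec<Trip>) {
    // 1. Keep only trips whose service has at least one active date.
    let kept_trips: Vec<Trip> = trips
        .iter()
        .filter(|t| active_services.contains(&t.service_id))
        .cloned()
        .collect();

    // 2. Keep only routes still referenced by a kept trip.
    let used_routes: BTreeSet<&str> = kept_trips.iter().map(|t| t.route_id.as_str()).collect();
    let kept_routes: Vec<Route> = routes
        .iter()
        .filter(|r| used_routes.contains(r.id.as_str()))
        .cloned()
        .collect();

    // 3. Keep only lines still referenced by a kept route.
    let used_lines: BTreeSet<&str> = kept_routes.iter().map(|r| r.line_id.as_str()).collect();
    let kept_lines: BTreeSet<String> = lines
        .iter()
        .filter(|l| used_lines.contains(l.as_str()))
        .cloned()
        .collect();

    (kept_lines, kept_routes, kept_trips)
}
```

With this kind of cascade, a line survives only if at least one of its trips runs on an active service, so the calendar data is a natural first place to look.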

I have not yet been able to fully determine whether the GTFS is faulty or whether the converter prunes too aggressively for some reason (I believe the former is more likely).

Before I try loading the data into a database to cross-check, I wondered whether the converter has tricks to help pinpoint the issue, whether there are known caveats that could explain excessive pruning, or whether you could recommend tools to make the analysis faster and easier.
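Short of loading everything into a database, one way to speed up the analysis is to compute, for each GTFS route, why it would disappear under the pruning described above. The sketch below is hypothetical: the `diagnose_routes` function and the `Diagnosis` categories are made up for illustration, and it assumes the set of active `service_id`s has already been computed from `calendar.txt` and `calendar_dates.txt`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical categories explaining the fate of one GTFS route.
#[derive(Debug, PartialEq)]
enum Diagnosis {
    /// At least one trip runs on an active service: the route survives.
    Kept { active_trips: usize },
    /// No trip in trips.txt references this route_id.
    NoTrips,
    /// Every trip uses a service_id with no active date (or one unknown to the calendars).
    NoActiveService { inactive_services: BTreeSet<String> },
}

/// `trips` holds (route_id, service_id) pairs read from trips.txt.
/// `active_services` is assumed to be precomputed from calendar.txt and calendar_dates.txt.
fn diagnose_routes(
    route_ids: &[&str],
    trips: &[(&str, &str)],
    active_services: &BTreeSet<&str>,
) -> BTreeMap<String, Diagnosis> {
    let mut report = BTreeMap::new();
    for &route_id in route_ids {
        // Service ids of every trip belonging to this route.
        let services: Vec<&str> = trips
            .iter()
            .filter(|(r, _)| *r == route_id)
            .map(|&(_, s)| s)
            .collect();
        let active_trips = services
            .iter()
            .filter(|s| active_services.contains(*s))
            .count();
        let diagnosis = if services.is_empty() {
            Diagnosis::NoTrips
        } else if active_trips > 0 {
            Diagnosis::Kept { active_trips }
        } else {
            // No trip is active, so every referenced service is inactive.
            Diagnosis::NoActiveService {
                inactive_services: services.iter().map(|s| s.to_string()).collect(),
            }
        };
        report.insert(route_id.to_string(), diagnosis);
    }
    report
}
```

Printing such a report for every route in `routes.txt` would quickly show whether the missing routes share a common cause, such as a single expired service.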

Thanks in advance!

woshilapin commented 2 years ago

Hi @thbar,

I don't know of any tool other than the logs to understand what is going on. Even the logs are not exhaustive, though, so you might hit a dead end trying to understand the behaviour from them alone.

There is, however, one thing worth mentioning. You refer to lines in GTFS, but GTFS has no notion of line: it has routes. Even if at first glance this might look like a different word for the same concept, it is not. NeTEx has both a Line and a Route notion.

In transit_model, we use an intermediate representation (IR) called NTFS, which also has the notions of line and route [1]. We therefore first transform GTFS into NTFS, then NTFS into NeTEx. This is an implementation detail, but with it in mind you can look at the specification for converting GTFS into NTFS, which explains the rules for turning a GTFS route into both an NTFS line and an NTFS route (and therefore into a NeTEx Line and Route).
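To illustrate why a GTFS route and an NTFS/NeTEx line are not interchangeable, here is a sketch that assumes, purely for illustration, one line per GTFS route and one route per `direction_id` used by that route's trips. This is a hypothetical simplification (`split_gtfs_routes` is not a transit_model function); the actual rules are in the GTFS-to-NTFS specification.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical simplification (NOT the transit_model rules):
/// one line per GTFS route_id, one route per distinct direction_id
/// used by that route's trips. `None` stands for a missing direction_id.
fn split_gtfs_routes(trips: &[(&str, Option<u8>)]) -> BTreeMap<String, BTreeSet<Option<u8>>> {
    let mut lines: BTreeMap<String, BTreeSet<Option<u8>>> = BTreeMap::new();
    for &(route_id, direction) in trips {
        // Each distinct direction becomes a separate route under the same line.
        lines.entry(route_id.to_string()).or_default().insert(direction);
    }
    lines
}
```

In this simplified model, a GTFS route without any trip produces no line at all, and the number of lines and routes in the output differs from the number of GTFS routes.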

Please come back to us if you need more details on any part of this documentation, or even on some implementation details.

[1]: The notions of line and route may differ slightly between NTFS and NeTEx. For more information, see the specification for converting NTFS into NeTEx.

woshilapin commented 2 years ago

Hi @thbar, have you had a chance to look at it? I'm wondering whether we should keep the issue open.

thbar commented 2 years ago

@woshilapin thanks for the feedback, and sorry for the long delay in responding. I will close this for now, because there is no clear evidence of anything. We need to do more debugging on the GTFS file itself, and then I'll see whether the issue needs to be re-opened. Thanks again for your feedback!