skinkie / reference

Personal repository where I collect working examples to understand inner workings while building PyNeTExConv
GNU Affero General Public License v3.0

SNCF still not running #58

Closed ue71603 closed 2 months ago

ue71603 commented 2 months ago

File:

[Uploading export-intercites-netex-last.zip…]()

Sequence:

del C:/Users/ue71603/MG_Daten/conversion/sncf_netex/*.duckdb
python swiss_to_db.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/export-intercites-netex-last.zip C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb
python epip_db_to_db.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex-import-epip.duckdb
python epip_db_to_xml.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex-import-epip.duckdb C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex.xml
python netex_stats.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex.xml
del C:/Users/ue71603/MG_Daten/conversion/sncf_netex/*.duckdb

fails in swiss_to_db.py with:

CREATE TABLE IF NOT EXISTS StopPlace (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS TemplateServiceJourney (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS TopographicPlace (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS VehicleType (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
Traceback (most recent call last):
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 31, in <module>
    main(args.swiss_zip_file, args.database, args.clean_database)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 15, in main
    with sqlite3.connect(database) as con:
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 21, in main
    insert_database(con, classes, file)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\anyintodbnew.py", line 90, in insert_database
    for event, element in context:
  File "src\\lxml\\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__
  File "src\\lxml\\iterparse.pxi", line 228, in lxml.etree.iterparse._read_more_events
  File "src\\lxml\\parser.pxi", line 1451, in lxml.etree._FeedParser.feed
  File "src\\lxml\\parser.pxi", line 624, in lxml.etree._ParserContext._handleParseResult
  File "src\\lxml\\parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\\lxml\\parser.pxi", line 743, in lxml.etree._handleParseResult
  File "src\\lxml\\parser.pxi", line 672, in lxml.etree._raiseParseError
  File "file:/C:/Users/ue71603/MG_Daten/github/reference/gtfs-netex-test/trf2netex.log", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
skinkie commented 2 months ago

Import works now, but I am curious what kind of profile this is. It fails because it expects an AvailabilityCondition, but it is getting a ValidBetween.

            <ServiceJourney id="FR:ServiceJourney::SN14140FERRE_1385103" responsibilitySetRef="1187" dataSourceRef="2148" changed="2024-04-30T14:01:58.775" version="any" status="active">
              <ValidBetween>
                <FromDate>2024-07-21T00:00:00</FromDate>
                <ToDate>2024-08-20T00:00:00</ToDate>
              </ValidBetween>
              <BrandingRef ref="IC"/>
              <Distance>0</Distance>
              <TransportMode>rail</TransportMode>
              <TransportSubmode>
                <RailSubmode>longDistance</RailSubmode>
              </TransportSubmode>
              <ServiceAlteration>planned</ServiceAlteration>
              <DepartureTime>06:59:00</DepartureTime>
              <dayTypes>
                <DayTypeRef ref="FR:DayType:2:" />
              </dayTypes>

In addition, if we follow the DayTypeRef, we can see that the version attribute is missing as well, so it would not be a valid file in the first place, but we can handle that.

            <DayTypeAssignment id="FR:DayTypeAssignment:2" version="any" order="1">
              <UicOperatingPeriodRef ref="FR:OperatingPeriod:2" />
              <DayTypeRef ref="FR:DayType:2:" />
            </DayTypeAssignment>
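
A minimal sketch of how one could list such references, assuming only Python's standard library. The function name `refs_without_version` is made up for illustration and is not part of the repository; the snippet is the DayTypeAssignment quoted above.

```python
import xml.etree.ElementTree as ET

# The DayTypeAssignment quoted above, without the surrounding frame.
SNIPPET = """
<DayTypeAssignment id="FR:DayTypeAssignment:2" version="any" order="1">
  <UicOperatingPeriodRef ref="FR:OperatingPeriod:2" />
  <DayTypeRef ref="FR:DayType:2:" />
</DayTypeAssignment>
"""

def refs_without_version(root):
    """Return (tag, ref) pairs for *Ref elements that carry no version attribute."""
    missing = []
    for element in root.iter():
        # Drop a namespace prefix such as {http://www.netex.org.uk/netex} if present.
        tag = element.tag.rsplit("}", 1)[-1]
        if tag.endswith("Ref") and "ref" in element.attrib and "version" not in element.attrib:
            missing.append((tag, element.attrib["ref"]))
    return missing

unversioned = refs_without_version(ET.fromstring(SNIPPET))
print(unversioned)
```

Both references in this fragment are reported, which matches the observation that the version attribute is missing.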

So there is data, but in this case we really need to guess right. This guess, for example, would be wrong for the Dutch profile. Since there is no type of frame, that is kind of ... unhappy?

            <UicOperatingPeriod id="FR:OperatingPeriod:2" version="any">
              <FromDate>2024-07-21T00:00:00</FromDate>
              <ToDate>2024-08-20T23:59:59</ToDate>
              <ValidDayBits>0111110011111001111100111010011</ValidDayBits>
            </UicOperatingPeriod>
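
For reference, a small sketch of the guess in question: reading ValidDayBits as one bit per day, counted from FromDate. That interpretation is the assumption here, and `expand_valid_day_bits` is a hypothetical helper, not code from the repository.

```python
from datetime import date, timedelta

def expand_valid_day_bits(from_date: date, valid_day_bits: str) -> list[date]:
    """Map each '1' bit to a calendar day, with bit 0 meaning from_date itself."""
    return [
        from_date + timedelta(days=offset)
        for offset, bit in enumerate(valid_day_bits)
        if bit == "1"
    ]

# Values taken from FR:OperatingPeriod:2 above.
bits = "0111110011111001111100111010011"
days = expand_valid_day_bits(date(2024, 7, 21), bits)
print(len(bits), len(days), days[0], days[-1])
```

The bit string has 31 positions, which matches the 31 days from 2024-07-21 up to and including 2024-08-20.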
ue71603 commented 2 months ago

export-intercites-netex-last.zip

ue71603 commented 2 months ago

We can report back to transport.gouv.fr that the data is not correct, if you have a list of examples...

skinkie commented 2 months ago

The question is not really whether it is correct or not, but more like "why don't you mention which profile you use?"

ue71603 commented 2 months ago

it is EPIP according to https://eunapmonitoring.napcore.imet.gr/ :-)

skinkie commented 2 months ago

In that case, they suck at producing it ;-)

ue71603 commented 2 months ago

Still doesn't work.


(venv) PS C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test> python swiss_to_db.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/export-intercites-netex-last.zip C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb
CREATE TABLE IF NOT EXISTS AvailabilityCondition (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS DestinationDisplay (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Direction (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Line (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Operator (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS PassengerStopAssignment (id varchar(64) NOT NULL, version varchar(64) NOT NULL, ordr integer, object text NOT NULL, PRIMARY KEY (id, version, ordr));
CREATE TABLE IF NOT EXISTS ResponsibilitySet (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS ScheduledStopPoint (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS ServiceJourney (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS StopPlace (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS TemplateServiceJourney (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS TopographicPlace (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS VehicleType (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
Traceback (most recent call last):
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 31, in <module>
    main(args.swiss_zip_file, args.database, args.clean_database)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 15, in main
    with sqlite3.connect(database) as con:
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\swiss_to_db.py", line 21, in main
    insert_database(con, classes, file)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\anyintodbnew.py", line 90, in insert_database
    for event, element in context:
  File "src\\lxml\\iterparse.pxi", line 208, in lxml.etree.iterparse.__next__
  File "src\\lxml\\iterparse.pxi", line 228, in lxml.etree.iterparse._read_more_events
  File "src\\lxml\\parser.pxi", line 1451, in lxml.etree._FeedParser.feed
  File "src\\lxml\\parser.pxi", line 624, in lxml.etree._ParserContext._handleParseResult
  File "src\\lxml\\parser.pxi", line 633, in lxml.etree._ParserContext._handleParseResultDoc
  File "src\\lxml\\parser.pxi", line 743, in lxml.etree._handleParseResult
  File "src\\lxml\\parser.pxi", line 672, in lxml.etree._raiseParseError
  File "file:/C:/Users/ue71603/MG_Daten/github/reference/gtfs-netex-test/trf2netex.log", line 1
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1
ue71603 commented 2 months ago

I didn't find the log file in the configuration.

skinkie commented 2 months ago

Can you try it with netex_to_db.py instead? Swiss does not have crazy .log files.
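
The traceback points at trf2netex.log being fed to the XML parser. A rough sketch of skipping non-XML members of the archive, assuming the standard zipfile module; `xml_member_names` and the member name `offre.xml` are invented for the example, and this is not how netex_to_db.py is necessarily implemented.

```python
import io
import zipfile

def xml_member_names(archive: zipfile.ZipFile) -> list[str]:
    """Keep only archive members whose name ends in .xml (case-insensitive)."""
    return [name for name in archive.namelist() if name.lower().endswith(".xml")]

# Build a tiny in-memory archive that mimics the situation:
# one XML document plus an empty log file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("offre.xml", "<PublicationDelivery/>")  # hypothetical member name
    archive.writestr("trf2netex.log", "")                    # assumed to be empty

with zipfile.ZipFile(buffer) as archive:
    selected = xml_member_names(archive)

print(selected)
```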

ue71603 commented 2 months ago

There were a lot of those:

<JourneyPartCouple xmlns="http://www.netex.org.uk/netex" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:siri="http://www.siri.org.uk/siri" id="FR:JourneyPartCouple:93a10d37-46da-4815-a13b-4787cdf4d9db" order="1" version="any">
  <StartTime>22:06:00</StartTime>
  <EndTime>23:12:00</EndTime>
  <FromStopPointRef ref="FR:ScheduledStopPoint:87547000" version="any"/>
  <ToStopPointRef ref="FR:ScheduledStopPoint:87543017" version="any"/>
  <MainPartRef ref="FR:JourneyPart:70bc5baa-02ca-47ae-9c18-3af3c05e8510" version="any"/>
  <journeyParts>
    <JourneyPartRef ref="FR:JourneyPart:70bc5baa-02ca-47ae-9c18-3af3c05e8510" version="any"/>
    <JourneyPartRef ref="FR:JourneyPart:f3ddd9b2-7519-482e-aa8a-31feb624610f" version="any"/>
  </journeyParts>
  <TrainNumberRef ref="FR:TrainNumber::SN3751FERRE_1600589" version="any"/>
</JourneyPartCouple>

No error but it seems that netex_to_db.py terminated normally just after those outputs.

python epip_db_to_db.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex-import-epip.duckdb got unhappy rather fast:

(venv) PS C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test> python epip_db_to_db.py C:/Users/ue71603/MG_Daten/conversion/sncf_netex/sncf_netex-import.duckdb C:/Users/ue71603/MG_Daten/conversion/sncf_netex/netex-import-epip.duckdb
CREATE TABLE IF NOT EXISTS AvailabilityCondition (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS DestinationDisplay (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Direction (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Line (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS Notice (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS NoticeAssignment (id varchar(64) NOT NULL, version varchar(64) NOT NULL, ordr integer, object text NOT NULL, PRIMARY KEY (id, version, ordr));
CREATE TABLE IF NOT EXISTS Operator (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS PassengerStopAssignment (id varchar(64) NOT NULL, version varchar(64) NOT NULL, ordr integer, object text NOT NULL, PRIMARY KEY (id, version, ordr));
CREATE TABLE IF NOT EXISTS RouteLink (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS RoutePoint (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS ScheduledStopPoint (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS ServiceJourney (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS ServiceJourneyPattern (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS StopPlace (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
CREATE TABLE IF NOT EXISTS VehicleType (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));
epip_line_memory
Line 1108
ScheduledStopPoint 0

epip_scheduled_stop_point_memory
epip_site_frame_memory
PassengerStopAssignment 138
StopPlace 138
epip_service_journey_generator
Traceback (most recent call last):
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\epip_db_to_db.py", line 53, in <module>
    main(args.source, args.target)
    epip_service_journey_generator(source_database_file, target_database_file, generator_defaults, None)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\transformers\epip.py", line 415, in epip_service_journey_generator
    with sqlite3.connect(write_database) as write_con:
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\transformers\epip.py", line 421, in epip_service_journey_generator
    write_generator(write_con, ServiceJourney, query(read_con), True)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\netexio\dbaccess.py", line 276, in write_generator
    for a in _prepare3(generator, objectname):
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\netexio\dbaccess.py", line 252, in _prepare3
    for obj in generator3:
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\transformers\epip.py", line 408, in query
    yield process(sj, read_database, write_database, generator_defaults)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\transformers\epip.py", line 392, in process
    service_journey_ac_to_day_type(sj, availability_conditions, day_types, uic_operating_periods, day_type_assignments)
  File "C:\Users\ue71603\MG_Daten\github\reference\gtfs-netex-test\transformers\epip.py", line 277, in service_journey_ac_to_day_type
    for a in ac.choice:
             ^^^^^^^^^
AttributeError: 'ValidBetween' object has no attribute 'choice'
skinkie commented 2 months ago

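<StartTime>22:06:00</StartTime><EndTime>23:12:00</EndTime><FromStopPointRef ref="FR:ScheduledStopPoint:87547000" version="any"/><ToStopPointRef ref="FR:ScheduledStopPoint:87543017" version="any"/>

Need to check if this is due to the element not being available, or something else.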

for a in ac.choice:
         ^^^^^^^^^

AttributeError: 'ValidBetween' object has no attribute 'choice'

Yes, this is what I mentioned. We can check for AvailabilityConditions vs ValidBetween, but then the big question is: does their ValidBetween limit the ServiceJourney beyond the DayType (UicOperatingPeriod)? Because if that occurs, that is something that Christoph should have prevented in his country.

ue71603 commented 2 months ago

Sorry, I didn't notice that. What SNCF did (for some time) was to use ValidBetween to restrict the time range and then the UIC bits to fill it.

E.g. the train runs on January 1 and 4 => ValidBetween 2024-01-01 and 2024-01-04, and the UIC bits would be "1001".

But I don't know if they really did it that way.

skinkie commented 2 months ago

Sorry, I didn't notice that. What SNCF did (for some time) was to use ValidBetween to restrict the time range and then the UIC bits to fill it.

E.g. the train runs on January 1 and 4 => ValidBetween 2024-01-01 and 2024-01-04, and the UIC bits would be "1001".

But I don't know if they really did it that way.

I am currently more worried about:

2024-01-01, 2024-01-04, 10011 and then saying ValidBetween 2024-01-01, 2024-01-04, so restricting the ServiceJourney beyond the calendar. In the Netherlands we have dubbed that double validity, and everyone was very much against it.

I am not against ValidBetween as some sort of 'attribute' stating that this trip only operates between then and then. But it should not affect the calendar. The Dutch approach is the opposite: we have an AvailabilityCondition on a ServiceJourney and don't use a calendar.
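
To make the double validity worry concrete, a sketch that flags calendar days which a ValidBetween window would cut off. It assumes bits are counted from the calendar start and compares at date granularity only; both helper names are hypothetical.

```python
from datetime import date, timedelta

def operating_days(from_date: date, valid_day_bits: str) -> list[date]:
    """Calendar days on which the bit string says the service runs."""
    return [
        from_date + timedelta(days=offset)
        for offset, bit in enumerate(valid_day_bits)
        if bit == "1"
    ]

def days_cut_by_window(days: list[date], valid_from: date, valid_to: date) -> list[date]:
    """Operating days that fall outside the inclusive [valid_from, valid_to] window."""
    return [day for day in days if not (valid_from <= day <= valid_to)]

# Calendar says 10011 from 2024-01-01, ValidBetween says 2024-01-01 .. 2024-01-04.
calendar_days = operating_days(date(2024, 1, 1), "10011")
cut = days_cut_by_window(calendar_days, date(2024, 1, 1), date(2024, 1, 4))
print(cut)
```

A non-empty result means the ValidBetween restricts the ServiceJourney beyond what the calendar states; an empty result means it is only a redundant 'attribute'.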

skinkie commented 2 months ago

@ue71603 I just committed an update that basically ignores the ValidBetween: if it does not find AvailabilityConditions but does find a DayType, it just continues as is.
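
Roughly, the decision looks like the sketch below. This is an illustration with stand-in dataclasses, not the committed code; `select_calendar_source` and the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class ValidBetween:              # hypothetical stand-in for the generated class
    from_date: str
    to_date: str

@dataclass
class AvailabilityCondition:     # hypothetical stand-in for the generated class
    choice: list = field(default_factory=list)

def select_calendar_source(validity_conditions, day_type_refs):
    """Prefer AvailabilityConditions; otherwise fall back to DayTypes.

    ValidBetween entries are skipped on purpose, so nothing ever touches
    a .choice attribute that ValidBetween does not have.
    """
    conditions = [c for c in validity_conditions if isinstance(c, AvailabilityCondition)]
    if conditions:
        return ("availability_conditions", conditions)
    if day_type_refs:
        return ("day_types", list(day_type_refs))
    return ("none", [])

source = select_calendar_source(
    [ValidBetween("2024-07-21T00:00:00", "2024-08-20T00:00:00")],
    ["FR:DayType:2:"],
)
print(source)
```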

ue71603 commented 2 months ago

I am not so sure that this is right.

Let's assume the train runs January 8 and January 11

Then ValidBetween 2024-01-08 and 2024-01-11 and UIC: 1001 (again)

ue71603 commented 2 months ago

and not 00000001001

ue71603 commented 2 months ago

At least they did it for some time. I would need to check their data. Usually it is simple to spot: their UIC bit fields do not all have the same length.
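
A quick way to run that check, as a sketch: compare the bit string length with the number of days in the period. `bits_match_period` is a hypothetical helper, and it assumes both dates are inclusive.

```python
from datetime import date

def bits_match_period(from_date: date, to_date: date, bits: str) -> bool:
    """True when there is exactly one bit per day in the inclusive period."""
    return len(bits) == (to_date - from_date).days + 1

# Train runs January 8 and January 11.
short_form = bits_match_period(date(2024, 1, 8), date(2024, 1, 11), "1001")
long_form = bits_match_period(date(2024, 1, 1), date(2024, 1, 11), "00000001001")
mismatch = bits_match_period(date(2024, 1, 8), date(2024, 1, 11), "00000001001")
print(short_form, long_form, mismatch)
```

Bit strings of varying length across one delivery would hint that each one is anchored to its own window rather than to a shared calendar.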

ue71603 commented 2 months ago

StopPlace

ServiceJourney

DayTypes are defined, but only as: <DayType id="FR:DayType:109:" version="any"/>

UicOperatingPeriod is defined and used with DayTypeAssignment:
            <UicOperatingPeriod id="FR:OperatingPeriod:1" version="any">
              <FromDate>2024-07-21T00:00:00</FromDate>
              <ToDate>2024-08-20T23:59:59</ToDate>
              <ValidDayBits>1111111111111111111111111111111</ValidDayBits>
            </UicOperatingPeriod>

data for one month all the same length

Check for a given service journey:


            <ServiceJourney id="FR:ServiceJourney::SN3789FERRE_1555233" responsibilitySetRef="1187" dataSourceRef="2148" changed="2023-11-30T10:04:23.109" version="any" status="active">
              <ValidBetween>
                <FromDate>2024-07-21T00:00:00</FromDate>
                <ToDate>2024-08-20T00:00:00</ToDate>
              </ValidBetween>
              ---
              <dayTypes>
                <DayTypeRef ref="FR:DayType:52:" />
              </dayTypes>
            <DayType id="FR:DayType:52:" version="any"/>
            <DayTypeAssignment id="FR:DayTypeAssignment:52" version="any" order="1">
              <UicOperatingPeriodRef ref="FR:OperatingPeriod:52" />
              <DayTypeRef ref="FR:DayType:52:" />
            </DayTypeAssignment>

So I agree with you that you can remove ValidBetween in this case.

skinkie commented 2 months ago

There were a lot of those:

<JourneyPartCouple xmlns="http://www.netex.org.uk/netex" xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:siri="http://www.siri.org.uk/siri" id="FR:JourneyPartCouple:93a10d37-46da-4815-a13b-4787cdf4d9db" order="1" version="any">
  <StartTime>22:06:00</StartTime>
  <EndTime>23:12:00</EndTime>
  <FromStopPointRef ref="FR:ScheduledStopPoint:87547000" version="any"/>
  <ToStopPointRef ref="FR:ScheduledStopPoint:87543017" version="any"/>
  <MainPartRef ref="FR:JourneyPart:70bc5baa-02ca-47ae-9c18-3af3c05e8510" version="any"/>
  <journeyParts>
    <JourneyPartRef ref="FR:JourneyPart:70bc5baa-02ca-47ae-9c18-3af3c05e8510" version="any"/>
    <JourneyPartRef ref="FR:JourneyPart:f3ddd9b2-7519-482e-aa8a-31feb624610f" version="any"/>
  </journeyParts>
  <TrainNumberRef ref="FR:TrainNumber::SN3751FERRE_1600589" version="any"/>
</JourneyPartCouple>

This is an interesting one. The JourneyPartCouple does not have order. You should report a bug upstream :-) Now why does it go wrong with our code? Because our code evaluates the class with the schema creation, but evaluates the object at insert. I think we can agree that this is a bug too.

And to be fair, this only goes wrong with our (high-speed) loader, because that assumes that the input is correct.
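
One possible way to avoid the mismatch, sketched with sqlite3 and a stand-in dataclass: let the class decide about the order column both when creating the table and when inserting, so an unexpected order attribute on the object cannot change the statement. All names below are illustrative, not the loader's actual code.

```python
import sqlite3
from dataclasses import dataclass, fields

@dataclass
class JourneyPartCouple:  # hypothetical stand-in: the class itself has no order field
    id: str
    version: str

def has_order(clazz) -> bool:
    """Decide once, from the class, whether an ordr column is used."""
    return any(f.name == "order" for f in fields(clazz))

def create_table(con, clazz):
    name = clazz.__name__
    if has_order(clazz):
        con.execute(f"CREATE TABLE IF NOT EXISTS {name} (id varchar(64) NOT NULL, version varchar(64) NOT NULL, ordr integer, object text NOT NULL, PRIMARY KEY (id, version, ordr));")
    else:
        con.execute(f"CREATE TABLE IF NOT EXISTS {name} (id varchar(64) NOT NULL, version varchar(64) NOT NULL, object text NOT NULL, PRIMARY KEY (id, version));")

def insert_object(con, clazz, attrib, xml):
    name = clazz.__name__
    # Same class-based decision as in create_table, regardless of attrib contents.
    if has_order(clazz):
        con.execute(f"INSERT INTO {name} (id, version, ordr, object) VALUES (?, ?, ?, ?)",
                    (attrib["id"], attrib["version"], attrib.get("order"), xml))
    else:
        con.execute(f"INSERT INTO {name} (id, version, object) VALUES (?, ?, ?)",
                    (attrib["id"], attrib["version"], xml))

con = sqlite3.connect(":memory:")
create_table(con, JourneyPartCouple)
# The object carries order="1" even though the class does not know about it.
insert_object(con, JourneyPartCouple,
              {"id": "FR:JourneyPartCouple:example", "version": "any", "order": "1"},
              "<JourneyPartCouple/>")
rows = con.execute("SELECT id, version FROM JourneyPartCouple").fetchall()
print(rows)
```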

skinkie commented 2 months ago

In addition there is another bug. So JourneyPart has an optional order in NeTEx.

https://github.com/NeTEx-CEN/NeTEx/issues/761

skinkie commented 2 months ago

This is obviously also wrong (the TimetabledPassingTime).

              <passingTimes>
                <TimetabledPassingTime>
                  <PointInJourneyPatternRef ref="FR:StopPointInServiceJourneyPattern::87673004_SN14140FERRE_1385103" />
                  <DepartureTime>06:59:00</DepartureTime>
                </TimetabledPassingTime>
                <TimetabledPassingTime>
                  <PointInJourneyPatternRef ref="FR:StopPointInServiceJourneyPattern::87672253_SN14140FERRE_1385103" />
                  <ArrivalTime>07:44:00</ArrivalTime>
                  <DepartureTime>07:46:00</DepartureTime>
                </TimetabledPassingTime>
                <TimetabledPassingTime>

We can obviously write a transformation that fixes this as well. But what I think we should first establish is how wrong the source is. I understand it is from France... so wrong is not the right dimension.

ue71603 commented 2 months ago

@skinkie can I retry with your fix or do we need to have a different file that works?

skinkie commented 2 months ago

My export now results in a valid file, if you ignore the dataSourceRef stuff.

ue71603 commented 2 months ago

currently no errors found