hove-io / navitia

The open source software to build cool stuff with locomotion
https://www.navitia.io/
GNU Affero General Public License v3.0

GTFS Importing fails #1823

Closed vg-github closed 7 years ago

vg-github commented 7 years ago

I am trying to import the UK GTFS data from the Navitia OpenDataSoft:

https://navitia.opendatasoft.com/explore/dataset/uk/table/?sort=type_file

I get many errors, and they differ depending on the dataset I'm importing; each dataset seems to have its own problems. Is there a way to run these datasets through a pre-validation stage first? I'm not sure what Navitia expects from each dataset, but it seems they might need some enrichment or standardisation first.

For example, for the UK gtfs.zip file, the error I'm getting from ED is:

WARN - Impossible to read /srv/ed/data/gtfs/feed_info.txt
INFO - Unable to find production date in add_feed_info.
INFO - Process terminated by signal: 15

For ntfs.zip I get:

WARN - Impossible to read /srv/ed/data/ntfs/feed_info.txt
INFO - Unable to find production date in add_feed_info.
INFO - date de production: 2016-Sep-03 - 2017-Jan-03
WARN - Impossible to read /srv/ed/data/ntfs/shapes.txt
FATAL - Impossible to read /srv/ed/data/ntfs/agency.txt
FATAL - We received signal: 6, so it's time to die!! version: v2.18.0
ERROR - /usr/bin/gtfs2ed(navitia::get_backtrace()+0x28) [0x465ea8]
/usr/bin/gtfs2ed(navitia::print_backtrace()+0x7d) [0x46cfdd]
/usr/bin/gtfs2ed() [0x46497f]
/lib/x86_64-linux-gnu/libc.so.6(+0x350e0) [0x7f7889f440e0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f7889f44067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f7889f45448]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f788a831b3d]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ebb6) [0x7f788a82fbb6]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ec01) [0x7f788a82fc01]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x5ee19) [0x7f788a82fe19]
/usr/bin/gtfs2ed(ed::connectors::FileParser<ed::connectors::AgencyGtfsHandler>::fill(ed::Data&)+0x3be) [0x4b738e]
/usr/bin/gtfs2ed(bool ed::connectors::GenericGtfsParser::parse<ed::connectors::AgencyGtfsHandler>(ed::Data&, std::string, bool)+0x128) [0x4b78f8]
/usr/bin/gtfs2ed(ed::connectors::GtfsParser::parse_files(ed::Data&, std::string const&)+0xcb) [0x4a5fbb]
/usr/bin/gtfs2ed(ed::connectors::GenericGtfsParser::fill(ed::Data&, std::string const&)+0x9) [0x497339]
/usr/bin/gtfs2ed(main+0x692) [0x461cc2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7889f30b45]
/usr/bin/gtfs2ed() [0x46457f]

For the last zip file, uk_ouk_uk_national.zip:

WARN - Impossible to find the Gtfs service 1237 referenced by trip 390515
WARN - Impossible to find the Gtfs service 1236 referenced by trip 390516
WARN - Impossible to find the Gtfs service 1236 referenced by trip 390517
INFO - reading stop times
INFO - sorting stoptimes of vehicle_journeys
INFO - Nb stop times: 3653764
WARN - Impossible to read /srv/ed/data/uk_ouk_uk_national/frequencies.txt
INFO - We excluded 0 connections because they were too long
INFO - We excluded 0 connections because they had no duration time
INFO - build_shape_from_prev took 271 ms
INFO - 0 vehicle journeys have been matched to at least one calendar
WARN - no calendar found for 301970 vehicle journey
INFO - 8 connections added
INFO - 0 stop point connections deleted because of duplicate connections
INFO - line: 6055
INFO - route: 6055
INFO - stoparea: 2584
INFO - stoppoint: 2633
INFO - vehiclejourney: 345567
INFO - stop: 3653764
INFO - connection: 2783
INFO - modes: 6
INFO - validity pattern : 1217
FATAL - We received signal: 6, so it's time to die!! version: v2.18.0
ERROR - /usr/bin/gtfs2ed(navitia::get_backtrace()+0x28) [0x465ea8]

Any feedback?
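On the pre-validation question: a minimal sketch of a layout check that could be run before handing a feed to the importer. It only looks at which files are present, using the files the GTFS reference has traditionally listed as required; it is not Navitia's own validation, and the function names are hypothetical. A missing agency.txt, as in the ntfs.zip log above, would show up here before the importer aborts.

```python
import zipfile
from pathlib import Path

# Files the GTFS reference has traditionally listed as required in every feed.
REQUIRED_GTFS_FILES = {"agency.txt", "stops.txt", "routes.txt", "trips.txt", "stop_times.txt"}
# At least one of these is needed to define the days a service runs.
CALENDAR_FILES = {"calendar.txt", "calendar_dates.txt"}


def list_feed_files(feed_path):
    """Return the set of file names in a feed given as a .zip or a directory."""
    path = Path(feed_path)
    if path.is_dir():
        return {p.name for p in path.iterdir() if p.is_file()}
    with zipfile.ZipFile(path) as archive:
        # Keep only the base name so feeds zipped inside a folder still match.
        return {Path(name).name for name in archive.namelist() if not name.endswith("/")}


def check_gtfs_files(file_names):
    """Return a list of problems; an empty list means the basic layout looks fine."""
    present = set(file_names)
    problems = [f"missing required file: {name}" for name in sorted(REQUIRED_GTFS_FILES - present)]
    if not CALENDAR_FILES & present:
        problems.append("missing calendar.txt and calendar_dates.txt (one is needed)")
    return problems
```

For example, check_gtfs_files(list_feed_files("/srv/ed/data/gtfs")) would list the missing files for the directory used in the logs above. Optional files such as feed_info.txt or shapes.txt are deliberately not checked, since the importer only warns about them.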

antoine-de commented 7 years ago

Hi @vladimirghetau, it seems you're running gtfs2ed to read an NTFS feed, but gtfs2ed is meant to read GTFS. You should use fusio2ed to read the NTFS.

Hope this helps :smile:

vg-github commented 7 years ago

So, my new understanding is this:

The problem now is, I did try to import the NTFS using fusio2ed and I get this:

INFO - add_feed_info, Key :ntfs_version Value :0.3 not imported.
INFO - add_feed_info, Key :fusio_url Value :http://vip-fusio-ihm.UK.prod.canaltp.fr/ not imported.
INFO - add_feed_info, Key :fusio_version Value :1.10.85.204 not imported.
INFO - date de production: 2016-Sep-03 - 2017-Jan-03
INFO - Reading geometries
INFO - Nb shapes: 0
INFO - default agency tz Europe/London -> GMT
WARN - Impossible to parse the co2_emission for CheckIn Boarding
WARN - Impossible to parse the co2_emission for CheckOut Landing
WARN - Impossible to read /srv/ed/data/uk_ntfs/line_groups.txt
WARN - Impossible to read /srv/ed/data/uk_ntfs/line_group_links.txt
WARN - Impossible to read /srv/ed/data/uk_ntfs/odt_conditions.txt
INFO - reading stop times
Killed

Also, when importing a GTFS file, I get something similar:

/usr/bin/gtfs2ed -i "/srv/ed/data/uk_gtfs" --connection-string="host=localhost user=navitia dbname=navitia password=navitia"
WARN - Impossible to read /srv/ed/data/uk_gtfs/feed_info.txt
INFO - Unable to find production date in add_feed_info.
INFO - date de production: 2016-Sep-03 - 2017-Jan-03
WARN - Impossible to read /srv/ed/data/uk_gtfs/shapes.txt
INFO - default agency tz Europe/London -> GMT
INFO - reading stop times
Killed

What could be the problem? I couldn't find any log either to explain why the execution was killed.
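A bare "Killed" usually means the process received SIGKILL, so the importer gets no chance to log anything itself. On Linux, one place worth checking is the kernel log, where an out-of-memory kill is reported. The sketch below scans kernel log text for such reports; the function name is hypothetical and the pattern is an assumption based on the usual wording of these kernel messages, which can vary between kernel versions.

```python
import re

# Matches the "Killed process <pid> (<name>)" part of the kernel's report,
# whether or not it is prefixed with "Out of memory:".
OOM_KILL_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(kernel_log_text):
    """Return (pid, process_name) pairs for every OOM kill reported in the text."""
    return [(int(pid), name) for pid, name in OOM_KILL_PATTERN.findall(kernel_log_text)]
```

The text to scan could come from the output of dmesg or journalctl -k, saved to a file or captured with subprocess. An entry naming gtfs2ed or fusio2ed would point to memory exhaustion rather than a problem in the feed.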

antoine-de commented 7 years ago

Hmm, the "Killed" is strange. Do you have enough memory?

kinnou02 commented 7 years ago

Do you have enough RAM? It looks like the OOM killer. In our production environment, the kraken for the UK dataset uses 14 GB of RAM, most of it for the stop times.
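A quick way to answer the RAM question before starting an import is to compare the memory the kernel reports as available with a target figure. This is a minimal sketch assuming a Linux host with a MemAvailable field in /proc/meminfo; the function names are hypothetical, and the 14 GB figure from this thread describes kraken in production, so it is only a rough reference point for the importers.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> reported number."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Most fields look like "MemAvailable:    8388608 kB"; keep the number.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def has_enough_memory(meminfo_text, needed_gib):
    """Return True if MemAvailable (reported in kB) is at least needed_gib GiB."""
    available_kib = parse_meminfo(meminfo_text).get("MemAvailable", 0)
    return available_kib >= needed_gib * 1024 * 1024
```

On the machine running the import, has_enough_memory(open("/proc/meminfo").read(), 14) gives a yes/no answer. If the field is missing, the check conservatively returns False.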

You are right: osm2ed is used for importing OSM data from a pbf file. After importing all your data, you will need to run ed2nav to generate a data.nav.lz4 file that kraken will load.

pbougue commented 7 years ago

For most of your usage, I guess you can get some help in https://github.com/CanalTP/navitia/blob/9bd3f700197234d752874a39b23fd7d876e9504b/documentation/vagrant/vagrant.md

pbougue commented 7 years ago

And loading the GTFS itself using gtfs2ed might be a little cheaper in RAM, but you might also lose some data quality (transfers, for example).

vg-github commented 7 years ago

@pbougue - is there an alternative to gtfs2ed that loads data without losing data quality?

pbougue commented 7 years ago

Hmm, I don't know precisely what data adjustments we make on top of the UK GTFS. I'd say we at least add transfers and create stop_areas from nearby stop_points... And I just checked that we have them in the GTFS provided on the OpenDataSoft platform. So it should be pretty much OK compared to the NTFS.

But you will still need a lot of RAM when loading the street network and transportation data into kraken for the actual journey planning.