grote / osm2gtfs

Turn OpenStreetMap data and schedule information into GTFS
GNU General Public License v3.0
98 stars 31 forks source link

don't discard route without ref #150

Closed nlehuby closed 4 years ago

nlehuby commented 4 years ago

Our osm connector heavily rely on route/route_master ref tag.

Even if this tag is recommended, it can sometimes be missing in OSM without it being an error. For instance, in Abidjan (Ivory Coast), we have wôrô-wôrô and gbakas that are artisanal transports operated in minibuses or repainted cars, which do not have any number or reference.

The OSM ref tag is usually used to set the GTFS route_short_name, which is an optional field, so we should not discard all route that do not have a ref.

This PR allows to use route/route_master without ref tag to create the route.txt file. As it was used as a key to store the routes, I've updated the sorting of the routes in most creators so that the ids remain identical.

I will need your help to fix the last tests :wink:

nlehuby commented 4 years ago

I've investigated the Br_Florianopolis failing test: The data have one route without ref (osm id = 7878890, line 47020 of the overpass mock file). This route is now present in the cache file (wasn't before) but is discarded in the creator. I think this behaviour is ok. But I don't know what is the best fix: override the test data to remove this route vs changing the expected number of lines in the test. What do you think ?

pantierra commented 4 years ago

If the behaviour of dropping is desired, I would just lower the number of expected lines.

nlehuby commented 4 years ago

Tests to fix:

nlehuby commented 4 years ago

It seems that the remaining failing tests are due to a change in the order of creation of the trips, that results in a difference in the trip_id. But, there is an issue here: for now the trip are created in an impredictable order because we do not sort the dict (https://github.com/grote/osm2gtfs/blob/master/osm2gtfs/creators/trips_creator.py#L35) For instance, for Esteli, we should add trips to feed in this order to make the tests pass: ['1', '3', '2', '5', '4'].

I think the good fix here is to sort the dict to have a predictable order and update the tests accordingly. Would this be ok for you @xamanu or do you need to keep trip_id as they are at the present moment ?

pantierra commented 4 years ago

I think the good fix here is to sort the dict to have a predictable order and update the tests accordingly. Would this be ok for you @xamanu or do you need to keep trip_id as they are at the present moment ?

:+1: This would be an improvement for readability and therefor better control over the data.

nlehuby commented 4 years ago

Thanks @xamanu. This PR is now ready for review & merge.

pantierra commented 4 years ago

Looks, generally good. But I would like to have the conversation about dropping or not the checks for duplicated 'ref' in a separate issue/PR instead of mixing it into this one here. Can you please bring the checks back (eventually in the standard creators)?

This allows us to proceed here quickly and get this in, while having a more detailed exchange of arguments around the checks, which I think is really necessary.

nlehuby commented 4 years ago

ok, done.

The conversation about the duplicated ref is to be continued in https://github.com/grote/osm2gtfs/issues/152

nlehuby commented 4 years ago

Thanks @xamanu