grote / osm2gtfs

Turn OpenStreetMap data and schedule information into GTFS
GNU General Public License v3.0

Simplify region configurations #107

Open AltNico opened 6 years ago

AltNico commented 6 years ago

The configuration of regions is already quite simple, but I think we can simplify it even more. My intention is to make it really easy for new regions to use osm2gtfs without having to configure a lot of things that could simply use sane defaults.

The result of this process should be a revision of the existing wiki article about the configuration, stating whether each configuration field is mandatory and, if not, which default is used.

For example, this is the current configuration of fenix:

{
    "query": {
        "bbox": {
            "n": "-27.2155",
            "s": "-27.9410",
            "e": "-48.2711",
            "w": "-49.0155"
        },
        "tags": {
            "route": "bus"
        }
    },
    "stops": {
        "name_without": "Ponto sem nome",
        "name_auto": "yes"
    },
    "agency": {
        "agency_id": "BR-Floripa",
        "agency_name": "Consórcio Fênix",
        "agency_url": "http://www.consorciofenix.com.br/",
        "agency_timezone": "America/Sao_Paulo",
        "agency_lang": "pt",
        "agency_phone": "+55 (48) 3025-6868",
        "agency_fare_url": ""
    },
    "feed_info": {
        "publisher_name": "Torsten Grote",
        "publisher_url": "https://transportr.grobox.de",
        "version":  "0.1"
    },
    "schedule_source": "http://www.consorciofenix.com.br/api2/linhas.json",
    "output_file": "data/br-floripa.zip",
    "selector": "fenix"
}

Here are some questions:

For example, it would be really cool if we could use configurations like this:

{
    "query": {
        "tags": {
            "network": "NI-Estelí"
        }
    },
    "schedule_source": "https://github.com/mapanica/pt-data-esteli/blob/master/timetable.json"
}

Sure, the ability to configure everything is cool, but in my opinion we should not overwhelm users with it.

pantierra commented 6 years ago

Hey, cool idea, I like it a lot. Will try to answer some of your questions:

Do we need to specify a bbox if a given network name only exists once in the world? How would the performance be?

If there were a clear way in OSM to unite networks and to avoid name clashes (so, in an ideal world), no problem. In reality, we can't expect the data to be formally correct. Going for a bbox plus freely applicable tags was the most flexible approach we thought we could take. And generally, to cover more cases, we probably want to keep offering this combination. For this issue, I would like to make the bbox optional. This should be possible, and then performance issues are also the problem of whoever uses the config file to create their query.

If a bbox is given, do we need to specify the route type or are the defaults enough?

Here too, we cannot expect a completely consistent schema on the OSM side. I think (optionally) using a bbox and combining it with tags is really the best way to allow this script to be used flexibly. But yes, we should make this optional and stick to simple defaults. public_transport:version=2 has been the tag used in the past to select generally based on a bbox.

Does every region need to define its own stops_without_name name or could we use an (internationalized) default name?
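
To make the "optional bbox, optional tags" idea concrete, here is a minimal sketch of how a query string could be assembled from the "query" part of a config. The function name build_overpass_query and the exact query shape are assumptions for illustration, not the code osm2gtfs actually uses; the fallback to public_transport:version=2 follows the remark above.

```python
def build_overpass_query(query_config):
    """Build an Overpass QL query from an optional bbox and optional tags.

    Hypothetical helper for illustration; not part of osm2gtfs.
    """
    # Assumed default: select on public_transport:version=2 when no tags are given.
    tags = query_config.get("tags") or {"public_transport:version": "2"}
    tag_filter = "".join(
        '["{}"="{}"]'.format(key, value) for key, value in sorted(tags.items())
    )

    # The bbox is optional; Overpass QL expects (south, west, north, east).
    bbox = query_config.get("bbox")
    bbox_filter = ""
    if bbox:
        bbox_filter = "({s},{w},{n},{e})".format(**bbox)

    return "[out:json];relation{}{};out body;".format(tag_filter, bbox_filter)
```

With the Estelí example, which has no bbox, this would produce a worldwide query on the network tag, so any performance cost is indeed left to whoever writes the config.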

We could use a default name and surely make this optional!

What is the default of stops->name_auto?

It's a nice piece of logic that comes from the first city this script was made for. It queries OSM for relevant places close to a stop that has no name set and, if one is found, assigns its name to the stop.

The default behaviour is not to execute it unless you opt in:

if self.auto_stop_names:
    self._get_names_for_unnamed_stops()

Do we need to define an agency or is it enough for testing purposes to just use some osm2gtfs default agency information?

The GTFS specification should guide us here, and it seems we have to provide some required data. We probably cannot make this optional; it needs to be entered by a human.

Do we need to give information about the publisher or is osm2gtfs and the link to the repo enough, at least for testing purposes?

In the GTFS specs there are also required values for publisher name and url, etc. If it is required in GTFS I think we should not fill it in with dummy content.

Do we need to specify a version or could this be automatically generated?

No idea. Very good question.

Do we need to specify the path of the output file or could some automatically generated default be used?

I think we should provide a default of data/<SELECTOR>.zip and make this field optional.

Do we need to specify a selector or could it be generated from the file name of the configuration?

This is a tricky question. By default we support a file named config.json living in the osm2gtfs root; since it specifies no selector, it can only use the standard creators. If we say that all creators should live in the creators directory and follow the naming convention (#83), then we could derive the selector from that. But then we would also have to get rid of config.json, which would be a pity, because it is a very immediate entry point for using the script.
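
The defaults discussed in this comment could be applied in one place before the config is used. The following is only a sketch: apply_defaults, the default stop name "Unnamed stop" and deriving the selector from the config file name are assumptions for illustration (the last one is exactly the open question above), not existing osm2gtfs behaviour.

```python
import copy
import os

# Assumed defaults; name_auto stays off unless a region opts in.
DEFAULT_STOPS = {"name_without": "Unnamed stop", "name_auto": "no"}


def apply_defaults(config, config_path):
    """Return a copy of config with optional fields filled in."""
    result = copy.deepcopy(config)

    stops = result.setdefault("stops", {})
    for key, value in DEFAULT_STOPS.items():
        stops.setdefault(key, value)

    if "selector" not in result:
        # Hypothetical rule: use the config file name without its extension.
        result["selector"] = os.path.splitext(os.path.basename(config_path))[0]

    # Default output path as proposed above: data/<SELECTOR>.zip
    result.setdefault("output_file", "data/{}.zip".format(result["selector"]))
    return result
```

Note that for a plain config.json in the root this rule would yield the selector "config", which shows why the file-name approach needs a decision on that entry point.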

pantierra commented 6 years ago

In the wiki, I added a general overview of the GTFS values and where they are coming from, and how they may be overridden. Maybe this list is also useful for the thoughts in this issue to optimize it a bit.

nlehuby commented 6 years ago

GTFS agency matches pretty well with what OSM calls operator. We may consider using the operator tag on route_master as the default instead of providing the agency in the config file. (The network tag could be a realistic fallback too.) The main difficulty will be with the agency URL (which is a required field), as there is none in OSM.

prhod commented 6 years ago

I also think the default behavior should be to use OSM data. But I prefer the network over the operator :p For the URL, we could use the osm2gtfs GitHub URL (as the source of the feed, even if that's not what is expected). And for the time zone (also required), there may be a way to find the local one?
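
The two preferences voiced so far (operator first, or network first) differ only in lookup order, so a default could take the order as a parameter. This is a minimal sketch on a plain dict of route_master tags; agency_name_from_tags is a hypothetical helper, not something osm2gtfs provides.

```python
def agency_name_from_tags(route_master_tags, prefer=("operator", "network")):
    """Pick an agency name from OSM tags, trying the keys in the given order.

    Returns None when no key has a non-empty value, so the caller can
    fall back to the agency section of the config file.
    """
    for key in prefer:
        value = route_master_tags.get(key, "").strip()
        if value:
            return value
    return None
```

Passing prefer=("network", "operator") gives the network-first behaviour instead.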

ialokim commented 6 years ago

And for the TimeZone (also required), there may be a way to find the local one ?

Or even better, the timezone inside the bbox?

nlehuby commented 6 years ago

Here is an open API to find the right timezone : https://timezones-api.now.sh/timezones-4fbc08f/by_point.json?longitude=-0.1406632&latitude=50.8246776 The source data for timezones is derived from OSM.
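
Combining this with the bbox idea, one option is to look up the time zone at the centre of the configured bbox. The sketch below only computes the centre and builds the request URL for the endpoint linked above; it does not perform the request, and it makes no assumption about the response format or about the service still being available. The names bbox_centre and timezone_lookup_url are hypothetical.

```python
from urllib.parse import urlencode

# Endpoint taken from the link above.
TIMEZONE_API = "https://timezones-api.now.sh/timezones-4fbc08f/by_point.json"


def bbox_centre(bbox):
    """Return (latitude, longitude) of the centre of a config bbox."""
    latitude = (float(bbox["n"]) + float(bbox["s"])) / 2
    longitude = (float(bbox["e"]) + float(bbox["w"])) / 2
    return latitude, longitude


def timezone_lookup_url(bbox):
    """Build the by_point lookup URL for the centre of the bbox."""
    latitude, longitude = bbox_centre(bbox)
    params = urlencode({"longitude": longitude, "latitude": latitude})
    return "{}?{}".format(TIMEZONE_API, params)
```

The centre point is a simplification: a bbox that spans several time zones would still get a single answer.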

Skippern commented 6 years ago

Time zone is relatively easy for local routes, but long-distance routes can be in multiple time zones (I noticed my town Guarapari/ES is served by a bus route from Acre, close to the Bolivian border, to Ilhéus/BA; to my knowledge, it crosses 3 different time zones). Are there any good ways to handle such cases?

prhod commented 6 years ago

I looked closely at the GTFS specification on this. The reference time zone is the agency's location. Stops may have a different time zone, but stop_times are specified in the agency time zone. (Be careful: if the feed contains several agencies, they should all have the same time zone.) Looking at @nlehuby's API, there is a SpatiaLite database in the source with the shapes of the time zones. I can think of these methods: Use the API :