grote / osm2gtfs

Turn OpenStreetMap data and schedule information into GTFS
GNU General Public License v3.0

Extend handling of `start_date`/`end_date` to support schedule as a source #111

Open AltNico opened 6 years ago

AltNico commented 6 years ago

As of #99, only the start_date/end_date from the configuration are used, never the ones from the schedule. As the ones from the schedule are likely more up to date than the ones in the configuration, they should be used if available. If they are not available, the ones from the configuration should be used, as is done at the moment. Only if neither the configuration nor the schedule contains these dates should osm2gtfs generate them.
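The fallback order proposed above (schedule first, then configuration, then a generated value) could be sketched roughly as follows. This is only an illustration, not osm2gtfs code: the `pick_date` helper, the flat dictionaries, and the `YYYYMMDD` string format are assumptions made for the example.

```python
from datetime import date, datetime, timedelta


def pick_date(key, schedule, config, fallback):
    """Return the first value found for `key`: schedule, then config, then fallback.

    `schedule` and `config` are hypothetical plain dicts holding optional
    "start_date"/"end_date" strings in YYYYMMDD form (an assumption).
    """
    for source in (schedule, config):
        value = source.get(key)
        if value:
            return datetime.strptime(value, "%Y%m%d").date()
    return fallback


# Example inputs: the schedule only knows a start date.
schedule = {"start_date": "20180301"}
config = {"start_date": "20180101", "end_date": "20181231"}
today = date(2018, 2, 1)

start = pick_date("start_date", schedule, config, today)
end = pick_date("end_date", schedule, config, today + timedelta(days=365))
```

Here `start` comes from the schedule and `end` falls back to the configuration; the generated value is only used when both sources lack the key.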

pantierra commented 6 years ago

I can also see (even more) scenarios where the start and end date from the configuration file should be used. I guess both are valid approaches. Can we find a flexible way to define which source takes precedence over the other? See more on this discussion here.

Generally, I think that the configuration file can be controlled by the executor of osm2gtfs. However, the schedule source file might be taken from a place where the executor cannot control it (for example, Fenix). As the executor of the program I want to have at least the last word on this question. So, to me it seems most logical to give priority to the config file, because if the dates should be taken directly from the source, the information in the config file can simply be omitted.


And why is this marked as a bug? And not an enhancement? It works as it worked before.

ialokim commented 6 years ago

> As the executor of the program I want to have at least the last word on this question. So, to me it seems most logical to give priority to the config file, because if the dates should be taken directly from the source, the information in the config file can simply be omitted.

Sounds very reasonable, I agree with you!

ialokim commented 6 years ago

I was thinking about another approach, which could be selecting the minimal intersection between the given start and end dates. @AltNico already said this would be too complicated and confusing in his opinion, but I wanted to mention the idea here, too.

But I'm convinced we should at least output some warning message if there are two different start or end dates, instead of just ignoring one. See #95 and #98.

pantierra commented 6 years ago

Not sure if I understand well. But in case I did, I just wanted to mention that we are already doing it in one case: if no dates or only the start_date is given, the time span is one year by default.
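The one-year default mentioned above might look roughly like this. It is a minimal sketch, not the actual osm2gtfs implementation: the `default_span` helper is hypothetical, and treating "one year" as 365 days is an assumption.

```python
from datetime import date, timedelta


def default_span(start=None, end=None, today=None):
    """Fill in missing dates so the feed is valid for about one year.

    Hypothetical helper: a missing start defaults to `today` (or the
    current date), a missing end defaults to start + 365 days.
    """
    if start is None:
        start = today or date.today()
    if end is None:
        end = start + timedelta(days=365)
    return start, end


# No dates given at all.
no_dates = default_span(today=date(2018, 1, 1))
# Only the start_date given.
only_start = default_span(start=date(2018, 6, 1))
```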

ialokim commented 6 years ago

> Not sure if I understand well.

Okay, I'll try to explain with some examples. I've made two suggestions:

  1. Instead of completely dropping the start_date and end_date information from the schedule.json (if it is present in config.json), we could use the _later_ start_date and the _earlier_ end_date of the two sources (a sort of minimal intersection). This makes sure the validity of both sources is respected: if we already know the schedule information will become invalid after a certain end_date, we don't want the GTFS to be valid after that end_date.
  2. When we notice that the two sources have different start_dates or end_dates, we should output a warning message in #95 and/or #98, so that the user knows the ones from the schedule.json are ignored.
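To make the two suggestions concrete, here is a rough sketch that combines them: it warns when the sources disagree and then takes the intersection of both validity periods. The function name `intersect_validity`, the tuple arguments, and the choice to raise an error on non-overlapping periods are assumptions for illustration only, not existing osm2gtfs behaviour.

```python
import logging
from datetime import date


def intersect_validity(config_range, schedule_range):
    """Return the overlap of two (start, end) date ranges.

    Hypothetical helper: logs a warning if the ranges differ and raises
    ValueError if they do not overlap at all (an assumed policy).
    """
    (c_start, c_end), (s_start, s_end) = config_range, schedule_range
    if (c_start, c_end) != (s_start, s_end):
        logging.warning(
            "config dates %s to %s differ from schedule dates %s to %s",
            c_start, c_end, s_start, s_end,
        )
    start = max(c_start, s_start)  # the later start_date
    end = min(c_end, s_end)        # the earlier end_date
    if start > end:
        raise ValueError("validity periods do not overlap")
    return start, end


overlap = intersect_validity(
    (date(2018, 1, 1), date(2018, 12, 31)),   # from config.json
    (date(2018, 3, 1), date(2019, 2, 28)),    # from schedule.json
)
```

With these inputs the result runs from the schedule's start_date to the configuration's end_date, so the GTFS is never valid outside either source's period.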