opentripplanner / OpenTripPlanner

An open source multi-modal trip planner
http://www.opentripplanner.org

How To Handle Time Zones? (General Discussion) #2602

Open robludwig opened 5 years ago

robludwig commented 5 years ago

This is a collection of comments related to handling timezones within OTP. I submitted PR #2595 which spawned some discussion.

@t2gran noted that this approach was perhaps not ideal:

> Time zone decoration is a candidate for a pluggable module - there are a lot of different strategies which might be suitable in different scenarios. I am a bit skeptical of using a client-based parameter. I think the right way forward is to create an issue for this - so we can discuss what would be the best solution.

And suggested adding time zones to router-config.json.

@abyrd provided some additional context for the issue.

> OTP was originally built without much concern for time zones as it was used at first in single cities. Of course, as time went on some people started making networks that crossed time zone boundaries. But even in single time zone graphs, the client (web browser), the server, and the graph itself can all be in different time zones, and we have never gone through a design process where we spelled out the expected behavior in all these cases. Although it's tempting to just ask the client to specify an exact time zone, in other cases the client just wants to use the time zone of the location visible in the map. We should probably never default to / use the time zone of the server, but in Java it's easy for that to happen by mistake.

And (if I'm reading correctly) added additional support for at least specifying a timezone in router-config.json

I also started a discussion outside of GitHub here.

The two proposals I see are thus

  1. Allow API clients to include a timezone parameter
  2. Force a timezone in router-config.json

More can be added as they are proposed

abyrd commented 5 years ago

See also #1984

t2gran commented 5 years ago

Andrew briefly mentioned a 3rd alternative in the PR discussion:

  3. Use the time zone of the geographical location at which the event is located. This is how several flight planning systems present their itineraries.

The clients should be able to convert and present the times in a different time zone if needed - as long as the server is working in a predictable way. Converting times to the geographical location of the event, however, is more difficult and belongs on the server side.

All of these proposals could be implemented and work in parallel, but I would prefer that we implement alternative 2 first - it is simple and should be easy to do. Then, if someone needs alternative 3 and provides the resources (money or people) to do it, we could implement that in addition to alternative 2.

I don't like alternative 1, because the client should be able to convert the times if it needs to for a given use case - we should avoid putting too much presentation logic into the server.

abyrd commented 5 years ago

Thanks, @t2gran restated this more clearly than I did. I agree that the first thing to do is use a single time zone per router, and allow forcing that time zone in router-config.json.

There are essentially two kinds of times: absolute epoch timestamps, and times rendered as hh:mm:ss in a specific time zone. Of course timestamps don't need to be qualified with time zones, but I'm thinking that any time rendered as hh:mm:ss should always be accompanied by a time zone to assist in interpretation and final rendering (e.g. in a JavaScript client). As in airline reservation systems, it would make sense to always render hh:mm:ss times in the time zone where the event happens, with some indicator when the text includes more than one time zone (like the +1 day on airline tickets).

Query side

We could allow supplying times in three different ways, which are easy to distinguish from one another: timestamps, with timezone, and without timezone. The first is just an integer, and the last two should ideally only be accepted in a single unambiguous ISO format. Only when we accept hh:mm:ss times without time zones do we have the problem of guessing the time zone. We can start out with it being the single default timezone for the graph, but should eventually move on to interpreting it as the timezone where the event occurs (i.e. the timezone at the departure point).
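To make the three input forms concrete, here is a minimal sketch of how they could be told apart using `java.time`. The class and method names are hypothetical, not part of OTP, and it assumes that a bare integer means epoch seconds (the discussion above only says "an integer").

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not an OTP class.
final class QueryTimeParser {

    /** Resolves a query time to an absolute instant. */
    static Instant parse(String value, ZoneId defaultZone) {
        // Form 1: a bare integer, assumed here to be epoch seconds.
        if (value.matches("-?\\d+")) {
            return Instant.ofEpochSecond(Long.parseLong(value));
        }
        try {
            // Form 2: ISO 8601 with an explicit offset; no guessing needed.
            return OffsetDateTime.parse(value).toInstant();
        } catch (DateTimeParseException e) {
            // Form 3: ISO 8601 without a zone; this is the only case where
            // a zone has to be chosen (here: the graph-wide default).
            return LocalDateTime.parse(value).atZone(defaultZone).toInstant();
        }
    }
}
```

Only the last branch depends on `defaultZone`, so moving later from "the graph's zone" to "the zone at the departure point" would only change which zone the caller passes in.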

Response side

OTP currently responds with only epoch timestamps, not rendered times in hh:mm:ss. In a client, timestamps will be rendered for human consumption as hh:mm:ss. Times should probably always be rendered in the timezone at the location where the event occurs, with a time zone visible if the trip happens in more than one time zone. So it might be helpful for OTP to provide these time zones in its response so that code doesn't need to be in the client; on the other hand, JS libraries are available to perform the lookup, including one maintained by Conveyal employee @evansiroky (https://github.com/evansiroky/node-geo-tz)

mvanlaar commented 5 years ago

The problem I was facing in my setup, with airlines crossing time zones, is that the duration calculation is not correct. Is the duration calculation something that the client or the server has to fix?

abyrd commented 5 years ago

It is very likely that in multi-timezone deployments, many things will not work correctly because OTP is currently not designed to work properly with more than one time zone. Some durations are calculated on the server, but many times are handled internally as absolute time-zone-less values so durations can easily be made accurate. The challenge is not the implementation (which is probably not too complicated), it's coming up with a coherent plan for handling time zones before implementation.

drewda commented 5 years ago

In case it's a useful comparison point: We had similar discussions about how to handle timezones for the Valhalla routing engine. This is even more of a concern for Valhalla, which can serve a worldwide routing graph from a single server. Ultimately, we decided to use times local to each stop location (Andrew and Thomas's option 3), using http://efele.net/maps/tz/world/. API responses include datetimes formatted in ISO 8601, including offset from UTC; for example: 2018-08-20T08:29-07:00.
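As an illustration of that response format, the sketch below renders an absolute timestamp in the zone of the event with its UTC offset, using `java.time`. The class name is hypothetical, and looking up the zone from the stop's coordinates is assumed to have happened elsewhere.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; the zone is assumed to be known for the stop already.
final class LocalizedTimeRenderer {

    /** Formats an instant as ISO 8601 local time plus offset in the given zone. */
    static String render(Instant instant, ZoneId zoneAtEvent) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(instant.atZone(zoneAtEvent));
    }
}
```

Note that this formatter always writes the seconds field, so the example above would come out as `2018-08-20T08:29:00-07:00`.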

abyrd commented 5 years ago

Another major problem with having a single time zone per graph: the horror that is daylight saving time (AKA summer time). All times passed into the API are interpreted in the time zone of the graph, which is the time zone of one or all of the GTFS agencies included in the graph. If the agency's time zone is set to summer time (meaning the times in the GTFS are expressed in summer time) then I believe all times passed into the OTP API will be interpreted as summer time, whether or not summer time is actually in effect on the day the search is performed, or on the day the planned travel will occur.

t2gran commented 5 years ago

The way OTP handles date and time is at best confusing. When reviewing PR #2541 I looked into how OTP parses date/time, and I will create a new issue with my findings - it turns out that there are some bugs. Using the graph's time zone (including summer time) should not be a problem, since the date and time zone together tell whether the time should be interpreted as summer time or not. The only case where this fails is when the clock is moved back one hour (the next time this will happen in Norway is from 03:00 to 02:00 on 28 October). To avoid these kinds of errors, time zones should be passed along inside the request/response.

In our GraphQL API all date/times are passed as the GraphQL data type DateTime using ISO 8601 with time zone.
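The daylight saving point above can be checked with `java.time`, which reports how many UTC offsets are valid for a given local time in a zone. This is only a sketch: the class name is hypothetical, and the year 2018 used in the comments is an assumption, since the comment above gives only the day and month.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.List;

// Hypothetical helper for inspecting DST gaps and overlaps.
final class DstAmbiguity {

    /**
     * Returns the offsets that are valid for a local date-time in a zone:
     * one in the normal case, two during the autumn overlap, and none
     * during the spring gap.
     */
    static List<ZoneOffset> validOffsets(LocalDateTime local, ZoneId zone) {
        return zone.getRules().getValidOffsets(local);
    }
}
```

For Europe/Oslo, a local time of 02:30 on 28 October 2018 has two valid offsets (+02:00 and +01:00), which is exactly the case where a local time without an offset cannot be resolved. A time such as 12:00 on 1 July has a single valid offset, so date plus zone is enough.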

Nate-Wessel commented 5 years ago

I'm just going to add a case here where OTP is not performing as I would expect it to. Hopefully it adds something to the conversation.

I'm doing some analysis (Jython API) on a GTFS dataset that is in the America/Los_Angeles time zone, though the computer I'm working from is in America/Toronto, three hours away. There is nowhere that I am able to specify the timezone of the requests I'm making, so my default assumption would be that when I say "6:00AM, etc" with no timezone specified, this should be treated like 6:00AM on the west coast. After running a lot of calculations however, I've realized that this is not the case at all. All of the results I've calculated so far appear to be offset by about three hours. I did some analysis on a cloud computer with the timezone set to UTC, and the results from that were even weirder, presumably due to the greater offset. Anyway, I've just discovered that if I change the timezone in the GTFS to America/Toronto, even though it's a GTFS file from SF Muni, I get results that correspond exactly to previous results with the departure time set 3 hours earlier but with the correct time zone in the GTFS.

Without having specified a time zone in the request, I would not expect the time zone of the platform to affect the results, especially since there is only one time zone in the GTFS data.

A quick fix would be to put a warning about this behaviour somewhere in the docs.
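The size of the offset described above can be reproduced in isolation. The following sketch (hypothetical class, not OTP code, with an arbitrary example date) interprets the same wall-clock time in two zones and measures the difference.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical demonstration of what happens when a zone-less time is
// interpreted in the wrong zone (e.g. the JVM default instead of the feed's).
final class DefaultZonePitfall {

    static Instant interpret(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant();
    }

    /** How far the actual interpretation is ahead of the intended one. */
    static Duration skew(LocalDateTime wallClock, ZoneId intended, ZoneId actual) {
        return Duration.between(interpret(wallClock, actual), interpret(wallClock, intended));
    }
}
```

With 06:00 on 20 August 2018, an intended zone of America/Los_Angeles and an actual zone of America/Toronto, the skew is three hours; with UTC as the actual zone it is seven hours, which would fit the "even weirder" results on the cloud machine.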

Nate-Wessel commented 5 years ago

I guess another question is whether the above is also a problem for daylight savings time. That is, when you make a request across a DST transition (forward or backward in time), does the same time offset happen?

t2gran commented 5 years ago

@Nate-Wessel Thank you, I think the solution to your problem is to configure the default time zone in which OTP should parse all requests. We will keep this in mind when fixing this issue.

abyrd commented 4 years ago

Hi, I'm including the text of my recent post to the mailing list here:

Indeed, the scripting method @Nate-Wessel is probably calling to set the time and date uses the default timezone of the JVM. See comment here: https://github.com/opentripplanner/OpenTripPlanner/blob/v1.4.0/src/main/java/org/opentripplanner/scripting/api/OtpsRoutingRequest.java#L38

The scripting system is creating RoutingRequest objects directly, outside the context of a specific router (Graph), whereas in the trip planning API they are always created from a RoutingResource, which looks up the router/Graph and looks up its time zone before setting the date and time. See: https://github.com/opentripplanner/OpenTripPlanner/blob/v1.4.0/src/main/java/org/opentripplanner/api/common/RoutingResource.java#L449

I see in your script that you have access to the router as a Python object (where you call spt = router.plan(req)). To me it would make more sense to refactor OTP scripting such that you'd do this:

    router = otp.getRouter(routerId)
    request = router.createRequest()
    request.setDateTime(year, month, day, h, m, 00)

Then the OtpsRoutingRequest object would have enough context to hold onto its parent Router object, which would supply a specific Graph's time zone when setting the time. However that's a significant change. A less invasive way to get the job done would be to change OtpsRoutingRequest#setDateTime() as follows:

    public void setDateTime(int year, int month, int day, int hour, int min, int sec, String timeZone) {
        Calendar cal = Calendar.getInstance();
        cal.setTimeZone(TimeZone.getTimeZone(timeZone));
        ...
    }

Then you could set the time zone directly on each request.
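For anyone who wants to try the idea outside OTP, here is a self-contained sketch of the same conversion. The class and method are hypothetical stand-ins, not the real `OtpsRoutingRequest`, and the sketch assumes the month argument is 1-based.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical stand-alone version of the proposed change; not OTP code.
final class ExplicitZoneDateTime {

    /** Converts a wall-clock date and time in the named zone to epoch milliseconds. */
    static long toEpochMillis(int year, int month, int day,
                              int hour, int min, int sec, String timeZone) {
        // Caution: TimeZone.getTimeZone silently falls back to GMT for unknown IDs.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(timeZone));
        cal.clear();                                    // reset all fields, including milliseconds
        cal.set(year, month - 1, day, hour, min, sec);  // Calendar months are zero-based
        return cal.getTimeInMillis();
    }
}
```

Because the zone is passed explicitly, the result no longer depends on the default time zone of the JVM that runs the script.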

I recently did a full audit of where and how time zone information is used in OTP (via the trip planning API, not scripting), and I'll also include those conclusions here.

Although OTP generally does not have well thought-out support for multiple time zones, the behavior of the trip planning API should be sane if input GTFS is all in the same time zone, and as long as you specify a time in your query parameters.

The main idea is that each OTP graph has a single time zone, which is the time zone of the first agency encountered in its GTFS feeds. It essentially assumes there will only ever be one time zone present in all input files. All time parameters received by the API will be interpreted in that graph-wide time zone. Internal calculations and JSON API responses then use absolute "epoch" times free of time zones. GTFS feed stop_times are however interpreted in the time zone of their agency and this may be sufficient to properly handle multiple agencies in different time zones - the strangeness comes from the fact that all API interaction is happening in only one of those time zones, and it's somewhat arbitrary which zone is chosen if there is more than one.

There is a special case when no time parameter is provided to the API, only a date. In this case it will use the current time, but in the default time zone of the server itself (not the agency or the end user), like what is happening in the scripting environment. Please avoid this practice entirely, I consider it a bug and it will have unpredictable effects. Always supply a time when using the routing API.

answerquest commented 4 years ago

Proposal : Standardize everything to UTC / Epoch from the beginning - even the GTFS feeds, since they do carry the agency timezone information within them, which we can presume was baked in for such situations. OTP's response does give arrival/departure times in epoch format anyways. If you're standardizing the product then might as well standardize the factory.

For per-day timings in GTFS, an example and proposal:

At no point in this entire process was the prevalent time zone of the OTP instance's server consulted; nor was any particular time zone of gtfs schedule, origin location or destination location assumed as "the main timezone". Everything was converted down to UTC and then operated upon. The server could be running on Mars Standard Time for all we know and nobody cares (as long as a second there is as long as a second here i.e.). Exactly as per standard best practices followed in most programs out there that have to deal with multiple time zones.

Apologies: I'm not a code contributor here as of now, but did not see this proposal posted yet so felt like sharing it. In my line of work it's a rule of thumb to always convert to UTC / epoch before doing anything and reconvert to local time etc only at the very end when we have to display.

leonardehrenfried commented 4 years ago

What makes a difficult topic even harder is the use of java.util.Date to represent, well, a bunch of things, which are related but distinct:

Fortunately, Java 8 comes with the java.time package, which is a vast improvement over java.util.Date, most of whose methods have been deprecated since Java 1.1. (Yes, 1.1, not 11!)

As a first step towards multi-TZ graphs, I suggest using java.time instances everywhere and figuring out what kind of date and time information we are actually dealing with.

For example, GTFS data always contains local times: a train always leaves at 4 o'clock and depending on the daylight savings that can mean a different offset to UTC. This could for example be represented as a LocalTime which makes it easy to convert to an Instant or OffsetDateTime (similar to timestamps) if you know the day that the train is running.
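A minimal sketch of that conversion follows; the class is hypothetical and not how OTP does it internally. One deviation from the suggestion above: GTFS stop times may exceed 24:00:00 and are defined relative to "noon minus 12h" of the service day, so the sketch keeps the time as a seconds offset rather than a `LocalTime`.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper, not an OTP class.
final class GtfsTimes {

    /** Resolves a GTFS "HH:MM:SS" stop time on a service date to an absolute instant. */
    static Instant toInstant(LocalDate serviceDate, String gtfsTime, ZoneId agencyZone) {
        String[] parts = gtfsTime.split(":");
        long seconds = Long.parseLong(parts[0]) * 3600
                + Long.parseLong(parts[1]) * 60
                + Long.parseLong(parts[2]);
        // GTFS measures times from "noon minus 12h", which differs from
        // local midnight on days with a daylight saving transition.
        ZonedDateTime serviceDayStart =
                ZonedDateTime.of(serviceDate, LocalTime.NOON, agencyZone).minusHours(12);
        return serviceDayStart.plusSeconds(seconds).toInstant();
    }
}
```

For example, `25:30:00` on service date 2018-08-20 in America/New_York resolves to 01:30 local time on 21 August.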

The concept of time is very tricky and java.time is a complex library, but it makes the concepts visible and understandable.

leonardehrenfried commented 4 years ago

On a more general note, I agree with @answerquest 's proposal: when calculating durations you should always use a "point in time" (OffsetDateTime/Instant) rather than a local time.
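A small sketch of why this matters for trips that cross time zones (hypothetical class and example times, chosen only for illustration):

```java
import java.time.Duration;
import java.time.ZonedDateTime;

// Hypothetical comparison of two ways to compute a trip duration.
final class TripDuration {

    /** Elapsed time between two points in time, independent of their zones. */
    static Duration elapsed(ZonedDateTime departure, ZonedDateTime arrival) {
        return Duration.between(departure.toInstant(), arrival.toInstant());
    }

    /** Naive difference of the wall-clock readings, ignoring zones (incorrect across zones). */
    static Duration naiveWallClock(ZonedDateTime departure, ZonedDateTime arrival) {
        return Duration.between(departure.toLocalDateTime(), arrival.toLocalDateTime());
    }
}
```

A departure at 08:00 in America/New_York and an arrival at 11:00 in America/Los_Angeles on the same day are six hours apart, while the wall-clock difference suggests three.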

t2gran commented 4 years ago

@answerquest and @leonardehrenfried due to performance optimization we have our own internal time representations in OTP - these work correctly, there is no problem with them. What is needed is to agree on how time should be treated on the interfaces - data moving in/out of OTP. The discussion above highlights the problems and outlines some solutions, so I think we have a good starting point.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days

ethanpooley commented 1 year ago

My organization runs OTP instances covering the whole of the US and Canada. The single-timezone-per-instance limitation is one we have had to work around so far, and we've visited this page many times to remind ourselves of the specifics of it—also in hopes of seeing a solution on the horizon!

One good example of the problem in the US is the Amtrak GTFS feed. Amtrak runs nationwide, but in accordance with the GTFS spec the feed specifies a single timezone (America/New_York) in agency.txt. That timezone applies to all timestamps in stop_times.txt, even if they pertain to stops that are located in other timezones. The problem arises if we want to add the Amtrak GTFS feed to, say, our instance covering the southern California region. All of the other feeds for that region are specified in the America/Los_Angeles timezone. During the build process, the OTP instance would assume that Amtrak is also specified that way and thus all of its times would be wrong. Although Amtrak is practically unique in the US (there is also Greyhound/Flixbus, which specifies its feed in UTC and thus currently always requires conversion before we can pair it with feeds in any region), we worry about having to deal with this a lot more often if we extend our coverage across Europe.

We have two workarounds for this, as far as we can see. The first is to create modified versions of the Amtrak feed, converting all of its timestamps into the time zone expected by each regional instance. The second is to create a single modified version of all feeds, converting all of their timestamps into UTC. In the latter case we have to touch every feed rather than just a few, but on the other hand we only need one modified version of each, instead of up to four in the case of Amtrak. Although this conversion process would be tricky to implement, the onebusaway-gtfs-transformer-cli could help make it tolerable. We would only have to modify the timestamps in stop_times.txt; the transformer would make any necessary modifications to the service calendar.

Even if that works well and has no accuracy leaks (I worry about the various usage patterns for calendar_dates.txt), it is a heavy-handed approach. It would be much nicer to have OTP happily read in feeds using a variety of timezones, handle conversions internally as necessary, and continue responding with UNIX timestamps. We are interested to know the current thinking of the project architects and whether anything could speed such a feature along.
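To illustrate the per-timestamp part of the first workaround, here is a rough sketch of re-expressing one stop time in another zone while keeping the same service date. The class is hypothetical and has nothing to do with how onebusaway-gtfs-transformer-cli works. It assumes GTFS semantics, where times are offsets from "noon minus 12h" of the service day.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Locale;

// Hypothetical helper; a sketch of the idea, not a complete feed converter.
final class StopTimeShifter {

    /** Re-expresses a GTFS "HH:MM:SS" time from feedZone in targetZone. */
    static String shift(LocalDate serviceDate, String gtfsTime, ZoneId feedZone, ZoneId targetZone) {
        String[] p = gtfsTime.split(":");
        long seconds = Long.parseLong(p[0]) * 3600 + Long.parseLong(p[1]) * 60 + Long.parseLong(p[2]);
        Instant instant = serviceDayStart(serviceDate, feedZone).plusSeconds(seconds);
        long shifted = Duration.between(serviceDayStart(serviceDate, targetZone), instant).getSeconds();
        if (shifted < 0) {
            // The time falls before the start of the same service day in the
            // target zone, so the trip would have to move to the previous
            // service date. That calendar change is not handled here.
            throw new IllegalArgumentException("Time falls on the previous service day: " + gtfsTime);
        }
        return String.format(Locale.ROOT, "%02d:%02d:%02d",
                shifted / 3600, (shifted % 3600) / 60, shifted % 60);
    }

    private static Instant serviceDayStart(LocalDate date, ZoneId zone) {
        return ZonedDateTime.of(date, LocalTime.NOON, zone).minusHours(12).toInstant();
    }
}
```

The exception branch is where the accuracy worry about the service calendar shows up: an early-morning Eastern departure belongs to the previous day in Pacific time, so shifting timestamps alone is not enough for such trips.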

We have taken care to split our regions along timezone boundaries (with buffers), so as to minimize the problem as much as possible. Despite that, here are the numbers of feeds that are specified in non-standard timezones relative to our regions:

leonardehrenfried commented 1 year ago

Have you tried the very latest version? I think https://github.com/opentripplanner/OpenTripPlanner/pull/4281 implements the normalisation that you talk about.

By the way, if you're running deployments of the complexity you're talking about then you'd benefit from being a more active member of the community. We have a chat room (https://app.gitter.im/#/room/#opentripplanner_opentripplanner:gitter.im) and run video meetings twice a week.

michaelkirk commented 1 year ago

> Have you tried the very latest version? I think https://github.com/opentripplanner/OpenTripPlanner/pull/4281 implements the normalisation that you talk about.

I hit a similar-but-different scenario on 2.2.0, which includes #4281. My issue was resolved when updating to a more recent build (2.3.0_2023-03-29T12-20). I just wanted to clarify for others following along that they might need something more recent than #4281.

Details if you're interested:

I had agency.txt files with different time zones: "America/Los_Angeles", "America/Tijuana", and another which did not specify a time zone at all. I was able to build a graph after specifying a build-config.json[^1], and found routes from the "America/Los_Angeles" feeds and the feeds without a time zone, but I did not find routes from the "America/Tijuana" feed included in the graph.

FWIW "America/Los_Angeles" and "America/Tijuana" actually have the exact same UTC offset at all times (including, as I understand it, when they switch to DST), but maybe that's immaterial to the implementation.

When I manually edited the "America/Tijuana" timezone to "America/Los_Angeles" things worked with 2.2.0.

After upgrading to (2.3.0_2023-03-29T12-20), things worked as I expected with the original input - a mix of "America/Tijuana" and "America/Los_Angeles" timezones — I saw routes from all feeds.

[^1] build-config.json

```
{
  "transitModelTimeZone": "America/Los_Angeles",
  "osmDefaults": {
    "timeZone": "America/Los_Angeles"
  }
}
```

t2gran commented 6 months ago

@michaelkirk FYI: Support for "America/Tijuana" depends on the Java version used - we only support the time zones that are part of the Java library. I have not tested or looked into how you can extend Java with support for other time zones, but Java is very flexible so you should be able to do so.

leonardehrenfried commented 6 months ago

Java actually does support alias zone ids, but we are probably not checking correctly whether those two are equivalent.

t2gran commented 6 months ago

The support for "America/Tijuana" is a separate issue. So please create a new issue for it and we can discuss it there.

t2gran commented 6 months ago

Proposal for setting the internal OTP time-zone

The internal time-zone in OTP is first of all used for storing an optimized representation of time in memory. Time representation in data-sources, RT APIs and GraphQL Query APIs should set a time-zone specific to the use-case, and should be independent of the internal model. The internal-model time-zone is used for logging and debugging.

The time-zone used for the internal model in OTP should use:

  1. The graph time-zone, if a graph exists. A graph can be built on top of another graph (street-graph -> transit-graph -> serve) and/or served. The time-zone must be set when building the street graph, and the same time-zone should be used in all 3 phases.
  2. The build-config time-zone, if a config exists and no graph exists. If a street-graph exists, then the build should abort if the time-zone does not match the graph time-zone.
  3. If neither a graph nor a build-config exists, the time-zone can be set on the command line. If a graph or build-config exists, then the build should fail if the command-line parameter is set.

There is a small inconsistency between points 2 and 3 above in the error handling. In 2 we allow the server to continue if the time-zone is set and is equal to the zone in the graph, while we abort in 3. This is because we should allow for the same config to be used for both building the street graph and the transit graph. We do not do this for point 3 because it is more difficult to implement and does not provide any value.
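The precedence rules above could be sketched roughly as follows. This is one reading of the proposal, not existing OTP code: the class name is hypothetical, each `Optional` stands for "this source exists and specifies a zone", and failing when no source provides a zone is an added assumption.

```java
import java.time.ZoneId;
import java.util.Optional;

// Hypothetical resolver illustrating the proposed precedence.
final class InternalTimeZoneResolver {

    static ZoneId resolve(Optional<ZoneId> graphZone,
                          Optional<ZoneId> buildConfigZone,
                          Optional<ZoneId> commandLineZone) {
        // Point 3: the command-line parameter is only allowed when nothing else exists.
        if (commandLineZone.isPresent() && (graphZone.isPresent() || buildConfigZone.isPresent())) {
            throw new IllegalStateException("Time zone set on the command line, but a graph or build-config exists");
        }
        // Point 1: an existing graph wins; point 2: a config zone must match it.
        if (graphZone.isPresent()) {
            if (buildConfigZone.isPresent() && !buildConfigZone.get().equals(graphZone.get())) {
                throw new IllegalStateException("build-config time zone does not match the graph time zone");
            }
            return graphZone.get();
        }
        // Point 2: no graph yet, so the build-config decides.
        if (buildConfigZone.isPresent()) {
            return buildConfigZone.get();
        }
        // Point 3: fall back to the command line (assumption: fail if nothing is set).
        return commandLineZone.orElseThrow(() -> new IllegalStateException("No time zone configured"));
    }
}
```

The asymmetry mentioned above is visible in the code: a matching build-config zone is tolerated next to a graph, while a command-line zone is rejected even when it matches.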

markstos commented 3 weeks ago

@t2gran Does your proposal for setting the internal OTP time-zone solve the "Amtrak Problem" described above?

To recap:

As of OTP 2.4, it seems that OTP is returning Amtrak's results as appearing 3 hours later than they should be.

As @ethanpooley suggested above, a workaround when loading Amtrak outside of the America/New_York time zone is to pre-filter their GTFS feed so that all times are shifted into the graph's time zone before it's loaded.