Open robludwig opened 6 years ago
See also #1984
Andrew briefly mentioned a 3rd alternative in the PR discussion:
The clients should be able to convert and present the times in a different time zone if needed, as long as the server is working in a predictable way. Converting times to the geographical location of the event, however, is more difficult and belongs on the server side.
All of these proposals could be implemented and work in parallel, but I would prefer that we implement alternative 2 first: it is simple and should be easy to do. Then, if someone needs alternative 3 and provides the resources (money or people) to do it, we could implement that in addition to alternative 2.
I don't like alternative 1, because the client should be able to convert the times if it needs to for a given use case; we should avoid putting too much presentation logic into the server.
Thanks, @t2gran restated this more clearly than I did. I agree that the first thing to do is use a single time zone per router, and allow forcing that time zone in router-config.json.
There are essentially two kinds of times: absolute epoch timestamps, and times rendered as hh:mm:ss in a specific time zone. Of course timestamps don't need to be qualified with time zones, but I'm thinking that any time rendered as hh:mm:ss should always be accompanied by a time zone to assist in interpretation and final rendering (e.g. in a JavaScript client). As in airline reservation systems, it would make sense to always render hh:mm:ss times in the time zone where the event happens, with some indicator when the text includes more than one time zone (like the +1 day on airline tickets).
We could allow supplying times in three different ways, which are easy to distinguish from one another: timestamps, with timezone, and without timezone. The first is just an integer, and the last two should ideally only be accepted in a single unambiguous ISO format. Only when we accept hh:mm:ss times without time zones do we have the problem of guessing the time zone. We can start out with it being the single default timezone for the graph, but should eventually move on to interpreting it as the timezone where the event occurs (i.e. the timezone at the departure point).
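To make the three input forms concrete, here is a minimal sketch of how they could be told apart and resolved to an absolute instant. The class name TimeInputParser, the method parse and the fallbackZone parameter are hypothetical names used only for illustration; this is not OTP's actual parsing code.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of OTP.
final class TimeInputParser {

    /**
     * Resolves a time parameter to an absolute instant.
     * fallbackZone is only consulted when the input carries no offset
     * (for example the graph-wide default zone).
     */
    static Instant parse(String input, ZoneId fallbackZone) {
        // Form 1: a bare integer is treated as epoch seconds; no zone needed.
        if (input.matches("-?\\d+")) {
            return Instant.ofEpochSecond(Long.parseLong(input));
        }
        // Form 2: ISO 8601 ending in 'Z' or an explicit offset such as -07:00.
        if (input.matches(".*(Z|[+-]\\d{2}:\\d{2})$")) {
            return OffsetDateTime
                .parse(input, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
                .toInstant();
        }
        // Form 3: ISO 8601 local date-time; the zone has to be guessed.
        return LocalDateTime
            .parse(input, DateTimeFormatter.ISO_LOCAL_DATE_TIME)
            .atZone(fallbackZone)
            .toInstant();
    }
}
```

Only the third branch needs a guess about the zone, which is the point made above: the ambiguity is confined to inputs without an offset.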
OTP currently responds with only epoch timestamps, not rendered times in hh:mm:ss. In a client, timestamps will be rendered for human consumption as hh:mm:ss. Times should probably always be rendered in the timezone at the location where the event occurs, with a time zone visible if the trip happens in more than one time zone. So it might be helpful for OTP to provide these time zones in its response so that code doesn't need to be in the client; on the other hand, JS libraries are available to perform the lookup, including one maintained by Conveyal employee @evansiroky (https://github.com/evansiroky/node-geo-tz)
The problem I was facing in my setup, with airlines crossing time zones, is that the duration calculation is not correct. Is the duration calculation something that the client or the server has to fix?
It is very likely that in multi-timezone deployments, many things will not work correctly because OTP is currently not designed to work properly with more than one time zone. Some durations are calculated on the server, but many times are handled internally as absolute time-zone-less values so durations can easily be made accurate. The challenge is not the implementation (which is probably not too complicated), it's coming up with a coherent plan for handling time zones before implementation.
In case it's a useful comparison point: We had similar discussions about how to handle timezones for the Valhalla routing engine. This is even more of a concern for Valhalla, which can serve a worldwide routing graph from a single server. Ultimately, we decided to use times local to each stop location (Andrew and Thomas's option 3), using http://efele.net/maps/tz/world/. API responses include datetimes formatted in ISO 8601, including offset from UTC; for example: 2018-08-20T08:29-07:00.
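As an illustration of that response format, the following sketch renders an absolute timestamp in the zone of the stop where the event happens. StopLocalTime and render are hypothetical names, and looking up the zone for a stop's coordinates is assumed to happen elsewhere. Note that Java's built-in formatter always writes the seconds field, so its output is slightly longer than the Valhalla example.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, shown only to illustrate the idea.
final class StopLocalTime {

    /** Renders an absolute instant as ISO 8601 with the stop's UTC offset. */
    static String render(Instant instant, ZoneId stopZone) {
        return instant
            .atZone(stopZone)        // apply the zone rules valid at that instant
            .toOffsetDateTime()      // keep only the numeric offset
            .format(DateTimeFormatter.ISO_OFFSET_DATE_TIME);
    }
}
```

For the instant 2018-08-20T15:29:00Z this yields 2018-08-20T08:29:00-07:00 for a stop in America/Los_Angeles and 2018-08-20T11:29:00-04:00 for a stop in America/New_York.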
Another major problem with having a single time zone per graph: the horror that is daylight saving time (AKA summer time). All times passed into the API are interpreted in the time zone of the graph, which is the time zone of one or all of the GTFS agencies included in the graph. If the agency's time zone is set to summer time (meaning the times in the GTFS are expressed in summer time) then I believe all times passed into the OTP API will be interpreted as summer time, whether or not summer time is actually in effect on the day the search is performed, or on the day the planned travel will occur.
The way OTP handles date and time is at best confusing. When reviewing PR #2541 I looked into how OTP parses date/time, and I will create a new issue with my findings; it turns out that there are some bugs. Using the graph's time zone (including summer time) should not be a problem, since the date and time zone together tell whether the time should be interpreted as summer time or not. The only case where this fails is when the clock is moved back one hour (the next time this will happen in Norway is between 03:00-02:00 on 28 October). To avoid this kind of error, time zones should be passed on inside the request/response.
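The ambiguous hour can be detected with java.time. The sketch below asks the zone rules how many UTC offsets are valid for a given local time: one in the normal case, two during the autumn overlap, none during the spring gap. DstAmbiguity and validOffsets are hypothetical names; ZoneRules.getValidOffsets is the standard library call.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.List;

// Hypothetical helper for illustration.
final class DstAmbiguity {

    /** Returns every UTC offset that is valid for this local time in this zone. */
    static List<ZoneOffset> validOffsets(LocalDateTime local, ZoneId zone) {
        return zone.getRules().getValidOffsets(local);
    }

    public static void main(String[] args) {
        ZoneId oslo = ZoneId.of("Europe/Oslo");
        // 02:30 on 28 October 2018 occurs twice: once in summer time, once in winter time.
        System.out.println(validOffsets(LocalDateTime.of(2018, 10, 28, 2, 30), oslo));
        // 02:30 on 25 March 2018 does not exist: the clock jumps from 02:00 to 03:00.
        System.out.println(validOffsets(LocalDateTime.of(2018, 3, 25, 2, 30), oslo));
    }
}
```

A request that carries an explicit offset never hits the two-offset case, which is the argument for passing time zones inside the request and response.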
In our GraphQL API all date/times are passed on as the GraphQL datatype DateTime using ISO 8601 with time zone.
I'm just going to add a case here where OTP is not performing as I would expect it to. Hopefully it adds something to the conversation.
I'm doing some analysis (Jython API) on a GTFS dataset that is in the America/Los_Angeles time zone, though the computer I'm working from is in America/Toronto, three hours away. There is nowhere that I am able to specify the timezone of the requests I'm making, so my default assumption would be that when I say "6:00AM, etc" with no timezone specified, this should be treated like 6:00AM on the west coast. After running a lot of calculations however, I've realized that this is not the case at all. All of the results I've calculated so far appear to be offset by about three hours. I did some analysis on a cloud computer with the timezone set to UTC, and the results from that were even weirder, presumably due to the greater offset. Anyway, I've just discovered that if I change the timezone in the GTFS to America/Toronto, even though it's a GTFS file from SF Muni, I get results that correspond exactly to previous results with the departure time set 3 hours earlier but with the correct time zone in the GTFS.
Without having specified a time zone in the request, I would not expect the timezone of the platform to affect the results, especially since there is only one timezone in the GTFS data.
A quick fix would be to put a warning about this behaviour somewhere in the docs.
I guess another question is whether the above is also a problem for daylight savings time. That is, when you make a request across a DST transition (forward or backward in time), does the same time offset happen?
@Nate-Wessel Thank you, I think the solution to your problem is to configure the default timezone in which OTP should parse all requests. We will keep this in mind when fixing this issue.
Hi, I'm including the text of my recent post to the mailing list here:
Indeed, the scripting method @Nate-Wessel is probably calling to set the time and date uses the default timezone of the JVM. See comment here: https://github.com/opentripplanner/OpenTripPlanner/blob/v1.4.0/src/main/java/org/opentripplanner/scripting/api/OtpsRoutingRequest.java#L38
The scripting system is creating RoutingRequest objects directly, outside the context of a specific router (Graph), whereas in the trip planning API they are always created from a RoutingResource, which looks up the router/Graph and looks up its time zone before setting the date and time. See: https://github.com/opentripplanner/OpenTripPlanner/blob/v1.4.0/src/main/java/org/opentripplanner/api/common/RoutingResource.java#L449
I see in your script that you have access to the router as a Python object (where you call spt = router.plan(req)). To me it would make more sense to refactor OTP scripting such that you'd do this:
    router = otp.getRouter(routerId)
    request = router.createRequest()
    request.setDateTime(year, month, day, h, m, 00)
Then the OtpsRoutingRequest object would have enough context to hold onto its parent Router object, which would supply a specific Graph's time zone when setting the time. However that's a significant change. A less invasive way to get the job done would be to change OtpsRoutingRequest#setDateTime() as follows:
    public void setDateTime(int year, int month, int day, int hour, int min, int sec, String timeZone) {
        Calendar cal = Calendar.getInstance();
        cal.setTimeZone(TimeZone.getTimeZone(timeZone));
        ...
    }
Then you could set the time zone directly on each request.
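A self-contained sketch of what such a method might compute is below. RequestTime and epochMillis are hypothetical names, the 1-based month is an assumption made for this example, and this is not the actual OtpsRoutingRequest code. It shows why the same wall-clock time produces results three hours apart depending on the zone used to interpret it.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical helper; not the real OtpsRoutingRequest implementation.
final class RequestTime {

    /**
     * Interprets the given wall-clock fields in an explicit zone and returns epoch millis.
     * month is 1-based here (an assumption of this sketch).
     */
    static long epochMillis(int year, int month, int day,
                            int hour, int min, int sec, String timeZone) {
        // Caution: TimeZone.getTimeZone silently falls back to GMT for unknown IDs.
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone(timeZone));
        cal.clear();                                     // zero out milliseconds and stale fields
        cal.set(year, month - 1, day, hour, min, sec);   // Calendar months are 0-based
        return cal.getTimeInMillis();
    }
}
```

With this, 06:00 on 20 August 2018 in America/Los_Angeles lands exactly three hours (10,800,000 ms) after 06:00 in America/Toronto, which matches the offset observed in the scripting results.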
I recently did a full audit of where and how time zone information is used in OTP (via the trip planning API, not scripting), and I'll also include those conclusions here.
Although OTP generally does not have well thought-out support for multiple time zones, the behavior of the trip planning API should be sane if input GTFS is all in the same time zone, and as long as you specify a time in your query parameters.
The main idea is that each OTP graph has a single time zone, which is the time zone of the first agency encountered in its GTFS feeds. It essentially assumes there will only ever be one time zone present in all input files. All time parameters received by the API will be interpreted in that graph-wide time zone. Internal calculations and JSON API responses then use absolute "epoch" times free of time zones. GTFS feed stop_times are however interpreted in the time zone of their agency and this may be sufficient to properly handle multiple agencies in different time zones - the strangeness comes from the fact that all API interaction is happening in only one of those time zones, and it's somewhat arbitrary which zone is chosen if there is more than one.
There is a special case when no time parameter is provided to the API, only a date. In this case it will use the current time, but in the default time zone of the server itself (not the agency or the end user), like what is happening in the scripting environment. Please avoid this practice entirely, I consider it a bug and it will have unpredictable effects. Always supply a time when using the routing API.
Proposal: Standardize everything to UTC / Epoch from the beginning, even the GTFS feeds, since they do carry the agency timezone information within them, which we can presume was baked in for such situations. OTP's response does give arrival/departure times in epoch format anyway. If you're standardizing the product then you might as well standardize the factory.
For per-day timings in GTFS, an example and proposal:
At no point in this entire process was the prevalent time zone of the OTP instance's server consulted; nor was any particular time zone of gtfs schedule, origin location or destination location assumed as "the main timezone". Everything was converted down to UTC and then operated upon. The server could be running on Mars Standard Time for all we know and nobody cares (as long as a second there is as long as a second here i.e.). Exactly as per standard best practices followed in most programs out there that have to deal with multiple time zones.
Apologies: I'm not a code contributor here as of now, but did not see this proposal posted yet so felt like sharing it. In my line of work it's a rule of thumb to always convert to UTC / epoch before doing anything and reconvert to local time etc only at the very end when we have to display.
What makes a difficult topic even harder is the use of java.util.Date to represent, well, a bunch of things, which are related but distinct:
Fortunately, Java 8 comes with the java.time package, which is a vast improvement over java.util.Date, most of whose methods have been deprecated since Java 1.1. (Yes, 1.1 not 11!)
As a first step towards multi-TZ graphs, I suggest using java.time instances everywhere and figuring out what kind of date and time information we are actually dealing with.
For example, GTFS data always contains local times: a train always leaves at 4 o'clock, and depending on daylight saving time that can mean a different offset to UTC. This could for example be represented as a LocalTime, which makes it easy to convert to an Instant or OffsetDateTime (similar to timestamps) if you know the day that the train is running.
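A minimal sketch of that conversion follows. GtfsTime and toInstant are hypothetical names. One deviation from the suggestion above: GTFS allows times past 24:00:00 (for trips running after midnight), which LocalTime cannot hold, so this sketch works with a second count instead. It follows the GTFS reference convention of measuring times from "noon minus 12h" on the service day, which keeps the result well defined on days with a DST change.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper for illustration; not OTP's internal representation.
final class GtfsTime {

    /** Converts a GTFS HH:MM:SS value on a service date to an absolute instant. */
    static Instant toInstant(LocalDate serviceDate, String gtfsTime, ZoneId agencyZone) {
        String[] parts = gtfsTime.trim().split(":");
        long seconds = Integer.parseInt(parts[0]) * 3600L
                     + Integer.parseInt(parts[1]) * 60L
                     + Integer.parseInt(parts[2]);
        // Noon always exists exactly once, even on DST transition days.
        ZonedDateTime noon = serviceDate.atTime(LocalTime.NOON).atZone(agencyZone);
        // minusHours and plusSeconds on ZonedDateTime move along the absolute time-line.
        return noon.minusHours(12).plusSeconds(seconds).toInstant();
    }
}
```

Usage: GtfsTime.toInstant(LocalDate.of(2018, 8, 20), "25:10:00", ZoneId.of("America/New_York")) gives the instant for 01:10 local time on 21 August.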
The concept of time is very tricky and java.time is a complex library, but it makes the concepts visible and understandable.
On a more general note, I agree with @answerquest 's proposal: when calculating durations you should always use a "point in time" (OffsetDateTime/Instant) rather than a local time.
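The difference is easy to see with a trip that crosses time zones, like the airline case raised earlier in the thread. In this sketch (TripDuration, naive and actual are hypothetical names) a departure at 08:00 in America/New_York and an arrival at 11:00 in America/Los_Angeles look three hours apart on the clock but are six hours apart in reality.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical helper for illustration.
final class TripDuration {

    /** Wrong for multi-zone trips: subtracts wall-clock readings taken in different zones. */
    static Duration naive(LocalDateTime departure, LocalDateTime arrival) {
        return Duration.between(departure, arrival);
    }

    /** Converts both ends to points in time first, then subtracts. */
    static Duration actual(LocalDateTime departure, ZoneId departureZone,
                           LocalDateTime arrival, ZoneId arrivalZone) {
        return Duration.between(
            departure.atZone(departureZone).toInstant(),
            arrival.atZone(arrivalZone).toInstant());
    }
}
```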
@answerquest and @leonardehrenfried due to performance optimization we have our own internal time representations in OTP - these work correctly, there is no problem with them. What is needed is to agree on how time should be treated on the interfaces - data moving in/out of OTP. The discussion above highlights the problems and outlines some solutions, so I think we have a good starting point.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days
My organization runs OTP instances covering the whole of the US and Canada. The single-timezone-per-instance limitation is one we have had to work around so far, and we've visited this page many times to remind ourselves of the specifics of it—also in hopes of seeing a solution on the horizon!
One good example of the problem in the US is the Amtrak GTFS feed. Amtrak runs nationwide, but in accordance with the GTFS spec the feed specifies a single timezone (America/New_York) in agency.txt. That timezone applies to all timestamps in stop_times.txt, even if they pertain to stops that are located in other timezones. The problem arises if we want to add the Amtrak GTFS feed to, say, our instance covering the southern California region. All of the other feeds for that region are specified in the America/Los_Angeles timezone. During the build process, the OTP instance would assume that Amtrak is also specified that way and thus all of its times would be wrong. Although Amtrak is practically unique in the US (there is also Greyhound/Flixbus, which specifies its feed in UTC and thus currently always requires conversion before we can pair it with feeds in any region), we worry about having to deal with this a lot more often if we extend our coverage across Europe.
We have two workarounds for this, as far as we can see. The first is to create modified versions of the Amtrak feed, converting all of its timestamps into the timezone expected by each regional instance. The second is to create a single modified version of all feeds, converting all of their timestamps into UTC. In the latter case we have to touch every feed rather than just a few, but on the other hand we only need one modified version of each, instead of up to four in the case of Amtrak. Although this conversion process would be tricky to implement, the onebusaway-gtfs-transformer-cli could help make it tolerable. We would only have to modify the timestamps in stop_times.txt; the transformer would make any necessary modifications to the service calendar.
Even if that works well and has no accuracy leaks (I worry about the various usage patterns for calendar_dates.txt), it is a heavy-handed approach. It would be much nicer to have OTP happily read in feeds using a variety of timezones, handle conversions internally as necessary, and continue responding with UNIX timestamps. We are interested to know the current thinking of the project architects and whether anything could speed such a feature along.
We have taken care to split our regions along timezone boundaries (with buffers), so as to minimize the problem as much as possible. Despite that, here are the numbers of feeds that are specified in non-standard timezones relative to our regions:
Have you tried the very latest version? I think https://github.com/opentripplanner/OpenTripPlanner/pull/4281 implements the normalisation that you talk about.
By the way, if you're running deployments of the complexity you're talking about then you'd benefit from being a more active member of the community. We have a chat room (https://app.gitter.im/#/room/#opentripplanner_opentripplanner:gitter.im) and run video meetings twice a week.
Have you tried the very latest version? I think https://github.com/opentripplanner/OpenTripPlanner/pull/4281 implements the normalisation that you talk about.
I hit a similar-but-different scenario on 2.2.0, which includes #4281. My issue was resolved when updating to a more recent build (2.3.0_2023-03-29T12-20). I just wanted to clarify for others following along that they might need something more recent than #4281.
Details if you're interested:
I had agency.txt's with different timezones: "America/Los_Angeles", "America/Tijuana", and another which did not specify a timezone at all. I was able to build a graph after specifying a build-config.json[^1], and found routes from the "America/Los_Angeles" feeds and the feeds without a timezone, but I did not find routes from the "America/Tijuana" feed included in the graph.
FWIW "America/Los_Angeles" and "America/Tijuana" actually have the exact same UTC offset at all times (including, as I understand it, when they switch to DST), but maybe that's immaterial to the implementation.
When I manually edited the "America/Tijuana" timezone to "America/Los_Angeles" things worked with 2.2.0.
After upgrading to 2.3.0_2023-03-29T12-20, things worked as I expected with the original input (a mix of "America/Tijuana" and "America/Los_Angeles" timezones): I saw routes from all feeds.
@michaelkirk FYI: Support for "America/Tijuana" depends on the Java version used - we only support the time zones that are part of the Java library. I have not tested or looked into how you can extend Java with support for other time zones, but Java is very flexible so you should be able to do so.
Java actually does support alias zone ids, but we are probably not checking correctly whether those two are equivalent.
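One way to probe whether two zone ids behave the same over a period of interest is to compare their offsets hour by hour, as in the sketch below. ZoneComparison and sameOffsetsBetween are hypothetical names; this is not how OTP compares zones, and agreement over a sampled period says nothing about other periods, since the historical rules of two zones can differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneId;

// Hypothetical helper for illustration; not OTP code.
final class ZoneComparison {

    /**
     * Returns true if both zones have the same UTC offset at every whole hour
     * in [from, to). Start on a whole hour so that transitions are not skipped.
     */
    static boolean sameOffsetsBetween(ZoneId a, ZoneId b, Instant from, Instant to) {
        for (Instant t = from; t.isBefore(to); t = t.plus(Duration.ofHours(1))) {
            if (!a.getRules().getOffset(t).equals(b.getRules().getOffset(t))) {
                return false;
            }
        }
        return true;
    }
}
```

Whether a given id is known to the running JVM at all can be checked with ZoneId.getAvailableZoneIds().contains("America/Tijuana").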
The support for "America/Tijuana" is a separate issue, so please create a new issue for it and we can discuss it there.
Proposal for setting the internal OTP time-zone
The internal time-zone in OTP is first of all used for storing an optimized representation of time in memory. Time representation in data-sources, RT APIs and GraphQL Query APIs should set a time-zone specific to the use-case, and should be independent of the internal model. The internal-model time-zone is used for logging and debugging.
The time-zone used for the internal model in OTP should use:
timezone, if config exists and no graph exists. If a street graph exists, then the build should abort if the time-zone does not match the graph time-zone.
There is a small inconsistency between points 2 and 3 above in the error handling. In 2 we allow the server to continue if the timezone is set and equal to the zone in the graph, while we abort in 3. This is because we should allow the same config to be used for building both the street graph and the transit graph. We do not do this for point 3 because it is more difficult to implement and does not provide any value.
@t2gran Does your proposal for setting the internal OTP time-zone solve the "Amtrak Problem" described above?
To recap:
As of OTP 2.4, it seems that OTP is returning Amtrak's results as appearing 3 hours later than they should be.
As @ethanpooley suggested above, a workaround when loading Amtrak outside of the America/New_York time zone is to pre-filter their GTFS feed so that all times are shifted into the graph's time zone before it's loaded.
@markstos Partly. The "time-zone" problem in OTP is not just one problem. It is not too hard to solve, but there is a lot of code which needs to be at least checked. We can split it into 3 categories:
I hope we will at least have fixed the internal time-zone cleanup in version 2.7 of OTP, but I cannot promise anything.
Absent a better solution, here's how we are fixing Amtrak times for now, which had been off because the Amtrak feed is in the US/Eastern time zone and our graph was in US/Pacific:
We unpack Amtrak's GTFS "zip" file, open the "stop_times.txt" file, parse it as a CSV file, parse the arrival and departure times, subtract 3 hours from the arrival and departure hours, overwrite stop_times.txt and re-zip the file. Dealing with stops that occur between midnight and 3 AM is left as an exercise to the reader, as subtracting 3 from these times results in a negative hour. Rolling up the date to be a day earlier is not so simple because the corresponding dates are stored in a different file...
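The per-value step of that workaround could look roughly like the sketch below. StopTimeShifter and shift are hypothetical names, CSV and zip handling are left out, and the fixed three-hour offset is an assumption taken from the description above (it may be off around daylight saving transitions, when the two zones do not switch at the same instant). Times that would become negative are rejected rather than silently wrapped, since fixing them requires moving the trip to the previous service day.

```java
// Hypothetical helper for illustration; not part of OTP or any GTFS library.
final class StopTimeShifter {

    /**
     * Shifts a GTFS HH:MM:SS value by a whole number of hours.
     * GTFS hours may exceed 23, so no wrap-around is applied at the top end.
     */
    static String shift(String gtfsTime, int hours) {
        String[] parts = gtfsTime.trim().split(":");
        int shiftedHour = Integer.parseInt(parts[0]) + hours;
        if (shiftedHour < 0) {
            // The stop now belongs to the previous service day; the calendar
            // files would have to change too, which this sketch does not attempt.
            throw new IllegalArgumentException(
                "Shifted time falls before the start of the service day: " + gtfsTime);
        }
        return String.format("%02d:%s:%s", shiftedHour, parts[1], parts[2]);
    }
}
```

For example, shift("14:05:00", -3) returns "11:05:00", while shift("01:15:00", -3) throws.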
Here's hoping time zone handling is improved in 2.7 or 2.8!
@markstos Can you confirm that #4281 doesn't work for you? What I expect to see is that all times are converted to a time zone of your choosing, for example US/Pacific. This gets you part of the way there. What you still need to implement is figuring out what zone an individual stop is in and then converting the time to the local version. OTP could also do this but currently doesn't.
The new ISO times for the queries make it easier to see what is going on.
@leonardehrenfried I can confirm that we were using OTP 2.4.0 with a build-config.json that contained { "transitModelTimeZone": "America/Los_Angeles" }.
In the build output, I see that echoed back, as if it was recognized or validated.
Yet, we still experienced the problem. It seems the Transit.Land OTP trip planning service has the issue as well.
@markstos Allow me to point out the following: none of the regular contributors are planning to work on this so it's unlikely that it will fix itself. If you want it done, consider finding a budget and hiring someone.
@leonardehrenfried Completely fair! Thanks for the clarity.
This is a collection of comments related to handling timezones within OTP. I submitted PR #2595 which spawned some discussion.
@t2gran noted that this approach was perhaps not ideal
And suggested adding timezones to router-config.json.
@abyrd provided some additional context for the issue.
And (if I'm reading correctly) added additional support for at least specifying a timezone in router-config.json
I also started a discussion outside of GitHub here.
The two proposals I see are thus
router-config.json
More can be added as they are proposed