socrata / discuss

Discuss all the things!
https://dev.socrata.com
Apache License 2.0
10 stars 3 forks source link

[data.melbourne.vic.gov.au/qnjw-wgaj] Timezone of timestamps in "Melbourne Bike Share" #40

Open evanderkoogh opened 7 years ago

evanderkoogh commented 7 years ago

Hi,

Can anyone confirm the timezone of the timestamps in Melbourne Bike Share data? It seems to be current data (at most 15 min old), but the timestamps are 6 hours in the past.

Is it really updated every 15 min, but 6 hours old data, or is the timezone wrong for Melbourne?

Thanks!

chrismetcalf commented 7 years ago

@evanderkoogh I did some digging and it appears that dataset is actually misconfigured to use the "US/Central" (UTC - 6 hours) timezone. I've notified our CS team to contact our customer, and we'll see if we can get that corrected.

You're UTC + 11 hours, right?

The updateddate field is whatever date they specify in their update, and may be older than the actual update time on the dataset. You might want to add the internal system fields to your $select set to get the timestamps of when they were updated in our system.

chrismetcalf commented 7 years ago

@evanderkoogh The timezone has been updated to Australia/Melbourne, so you should see the timestamps in the correct timezone the next time they update that dataset.

evanderkoogh commented 7 years ago

Hey @chrismetcalf, thanks for that. I was wondering about that. It is still wrong though.. I think I have figured out exactly what the problem is.

I found out there is a 2.0 version of the dataset. And it includes an epoch timestamp, which has the correct local time, but in UTC. If that makes sense?

I can see 2 solutions:

Proper: Send through the correct time from the origination system to have the correct epoch timestamp, but in the Australia/Melbourne timezone instead of UTC.

Not as proper: Set the timezone to UTC.

But most importantly it shows we really need timezone information in the date format. ISO8601 has all the facilities for that.

chrismetcalf commented 7 years ago

Correct, for version 2.0 datasets, there are actually two different timestamp field types:

In 2.1, we only have the latter "Fixed Timestamp" version. So when your data flows from the 2.0 version to the 2.1 version through the dataset synchronization process, timestamps are converted from epoch dates to fixed timestamps offset in the Australia/Melbourne timezone. If you need to get them back to UTC, offset them by +11 hours, or add "+11:00" to the string before parsing it.

Or just use the 2.0 version and parse the epoch date. I wouldn't blame you. But do know that you'll miss out on a lot of the query features only available in 2.1.

evanderkoogh commented 7 years ago

Hey @chrismetcalf, I can work around this no problem, but there still is the issue of the wrong data.

2.0 is wrong because the epoch is the correct local time, but in UTC and you use that epoch to translate to the timezone-less ISO8601 in 2.1, but assumes it is Australia/Melbourne timezone.

Here is what happens I think:

Does that make sense?

chrismetcalf commented 7 years ago

Yep, I understand what you're thinking now. This would make it an issue with our customer's dataset update automation, so I'm going to open up an internal trouble ticket. Hopefully I can get them to come comment here, otherwise I'll try and relay updates to you.

chrismetcalf commented 7 years ago

@evanderkoogh We just heard back from our customer and they've got somebody taking a look now! He offered to get directly in touch with you, but I don't want to post his email here, so can you please email me at chris.metcalf (at) socrata.com and I'll respond with his contact info?

evanderkoogh commented 7 years ago

Done

stevage commented 7 years ago

Hi @evanderkoogh - I work on the portal at City of Melbourne. I was only just directed to this Github issue this morning (and was on leave for a couple of weeks before that). I'll have a look. I think it did used to be correct, so I'm not sure what's changed.

Feel free to email us at opendata@melbourne.vic.gov.au as well.

stevage commented 7 years ago

@chrismetcalf is there any documentation around how Socrata handles timezones (not just in the API)? The only thing I can see is the Floating Timestamp Datatype, which somewhat surprisingly says that the API always strips timezones out of datetimes - and expects the consumer of the data to infer the timezone from other sources.

The behaviour of times does not seem very well defined, and it's hard to be sure (without documentation) that we're doing anything wrong. The times used to be correct.

For now, I've updated the dataset description to point to some other APIs that are probably more suitable, including the source API which we harvest from.

evanderkoogh commented 7 years ago

@stevage Is https://data.melbourne.vic.gov.au/resource/tdvh-n9dv.json the source system you are getting the data from? In which case the problem is with them. At 4:20pm I downloaded the json and almost all entries have 1485879306 as uploaddate. Which is 4:15pm in UTC.

And I agree, storing times without timestamps is a very curious decision..

stevage commented 7 years ago

No, the real source of the data is http://www.melbournebikeshare.com.au/stationmap/data. It's a somewhat odd dataset: it's implemented by a third party vendor, managed by PTV (until recently, RACV), but mostly (or entirely?) within the City of Melbourne municipal boundary. Our copy of the data in Socrata is essentially a slightly-delayed, but possibly more convenient version of the same data. When we have historical data up, it will be more interesting - until then, I'd probably recommend going straight to the source or using citybik.es.