mapzen / metro-extracts

DEPRECATED. See readme for alternative ways to get "city-sized chunks" of OpenStreetMap data
ISC License

add timestamp for extracts #209

Open: kkowalsky opened this issue 8 years ago

kkowalsky commented 8 years ago

We should keep showing the date of our weekly extracts on the website. I can anticipate some users wanting something more concrete than the vague "once a week".

binx commented 8 years ago

@migurski where can we get this info?

rmglennon commented 8 years ago

When we do this, we should also include the timestamp of the data itself. That would make it clearer whether changes you have made in OSM should be present in the download.

The old site only showed the date we created the files, which was confusing when no new weekly planet file had been released and yet the extracts still had a new date.
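
To illustrate the distinction, here is a minimal Python sketch that keeps the two timestamps separate. It assumes a hypothetical record with a `processed_at` field (when our pipeline wrote the file) and an `osm_data_as_of` field (the time of the OSM data it was cut from); neither name comes from the existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractTimestamps:
    """Hypothetical pair of timestamps for one extract (timezone-aware)."""

    processed_at: datetime    # when the extract file was built on our side
    osm_data_as_of: datetime  # time of the OSM data the extract was cut from


def describe(ts: ExtractTimestamps) -> str:
    """Render both dates in UTC so visitors can see how current the data is."""
    data = ts.osm_data_as_of.astimezone(timezone.utc)
    built = ts.processed_at.astimezone(timezone.utc)
    return (
        f"OSM data as of {data:%Y-%m-%d %H:%M} UTC; "
        f"extract built {built:%Y-%m-%d %H:%M} UTC"
    )


def includes_edit(ts: ExtractTimestamps, edited_at: datetime) -> bool:
    """An OSM edit can only be in the download if it predates the data
    timestamp, no matter how recently the file itself was rebuilt."""
    return edited_at <= ts.osm_data_as_of
```

Showing only `processed_at` reproduces the old site's problem: a file rebuilt from an unchanged planet looks new while containing no newer edits. Checking an edit against `osm_data_as_of` avoids that.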

migurski commented 8 years ago

I don't know that this information exists anywhere at the moment; I think it would require info from @heffergm.

souperneon commented 8 years ago

@sleepylemur, I believe Grant is out. Are you able to help us find where this date would live in our system?

sleepylemur commented 8 years ago

As Rhonda mentioned, we have the time when the last batch of extracts finished: https://s3.amazonaws.com/metro-extracts.mapzen.com/LastUpdatedAt. We don't currently log the planet timestamp, but it's possible to log it as well. I'd prefer to let Grant take care of that once he gets back.

One complication is that while the extracts are being generated, it's hard to tell whether a specific extract is this week's or last week's version, but that's probably something we could just gloss over.

souperneon commented 8 years ago

That's why LastUpdatedAt is enough detail: not all extracts get regenerated every week, but (as far as I understand) the ones that aren't regenerated do get checked for new changes.

@binx @migurski, can we try showing this date for starters? https://s3.amazonaws.com/metro-extracts.mapzen.com/LastUpdatedAt

migurski commented 8 years ago

Looks good. Is that a good canonical URL to use, or might it be available someplace better?

sleepylemur commented 8 years ago

@migurski That url is a good one to use.

migurski commented 8 years ago

When might we expect to see that URL updated?

heffergm commented 8 years ago

That URL no longer exists now that we've switched to processing the fixed extract list as part of ODES. It's also largely irrelevant, given the fact that every object uploaded to S3 contains a timestamp. Is there a reason we're not just using those?
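
For reference, S3 reports each object's upload time in the standard HTTP `Last-Modified` header, so a page or script could read it with a plain `HEAD` request. A minimal sketch using only the Python standard library; the age wording is an illustrative assumption. Note that this is the upload time of the file, not the time the data came from OSM, which is the distinction raised in the following comments.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.request import Request, urlopen


def fetch_last_modified(url: str) -> datetime:
    """HEAD the S3 object and return its Last-Modified time (network call)."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return parse_last_modified(resp.headers["Last-Modified"])


def parse_last_modified(header: str) -> datetime:
    """Parse an HTTP date such as 'Mon, 22 Aug 2016 17:09:00 GMT' into UTC."""
    return parsedate_to_datetime(header).astimezone(timezone.utc)


def describe_age(last_modified: datetime, now: datetime) -> str:
    """Turn a raw timestamp into copy a non-technical visitor can read."""
    hours = int((now - last_modified).total_seconds() // 3600)
    if hours < 1:
        return "updated less than an hour ago"
    if hours < 48:
        unit = "hour" if hours == 1 else "hours"
        return f"updated {hours} {unit} ago"
    return f"updated {hours // 24} days ago"
```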

migurski commented 8 years ago

Our users are curious about the freshness of the data, and many of them won't know how to interpret S3 timestamps. We'd like a way to reference the point when the data came from OSM.

heffergm commented 8 years ago

That timestamp (LastUpdatedAt) never indicated when the data came from OSM. It was only intended to indicate when the data was last processed on our end.

With the current system, we process the cities.json extract list once a day, and the planet file that we use to cut the extracts is also updated daily. So generally, the extracts are cut from data that is ~24 hours old.

If there's now a requirement that we provide an OSM date relevant to the planet with each upload, I can look into doing something that will work with both types of jobs (odes and the bulk processed list).

migurski commented 8 years ago

Do we create a new planet file from a regularly updated database? Planet files are normally weekly when pulled from planet.openstreetmap.org. If we get new data every day and we know this, then we can just put a "fresh daily" message on the site. If there's a chance the data may be as old as a week due to cyclical planet file updates, then we should do something more sophisticated.
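
The decision rule proposed here can be written down as a small function. A minimal sketch, assuming the site knows the worst-case interval between planet updates; the one-day threshold follows the comment, but both message strings are placeholders rather than agreed copy.

```python
from datetime import datetime, timedelta


def freshness_copy(max_planet_age: timedelta, data_as_of: datetime) -> str:
    """Pick site copy based on how stale the planet can get.

    If the planet is refreshed at least daily, a generic message is accurate.
    If it can lag by up to a week (weekly planet dumps), show the concrete
    OSM data date instead of implying daily freshness.
    """
    if max_planet_age <= timedelta(days=1):
        return "Refreshed from OpenStreetMap daily"  # placeholder copy
    return f"OSM data as of {data_as_of:%Y-%m-%d}"   # placeholder copy
```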

heffergm commented 8 years ago

In this implementation, the planet is downloaded on initial system setup (essentially from a local mirror) then updated to current with diffs (osmupdate) before being put into production. A cron job then runs daily to apply diffs to bring it up to date regularly.
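
The daily update step described here could look roughly like the sketch below. It assumes `osmupdate` (from the osmctools suite) is on the `PATH` and uses its basic `osmupdate OLDFILE NEWFILE` form; the file paths are hypothetical, and the production job may pass additional options.

```python
import os
import subprocess

# Hypothetical paths, not taken from the production setup.
PLANET = "/data/planet-latest.osm.pbf"
PLANET_NEW = "/data/planet-new.osm.pbf"


def osmupdate_cmd(old: str, new: str) -> list[str]:
    """Build the osmupdate call: read OLD, download and apply the OSM
    change files published since OLD's timestamp, write the result to NEW."""
    return ["osmupdate", old, new]


def update_planet(planet: str = PLANET, new: str = PLANET_NEW) -> bool:
    """Bring the local planet up to date; intended to be run daily by cron."""
    result = subprocess.run(osmupdate_cmd(planet, new))
    if result.returncode != 0:
        # Leave the current planet in place so extract jobs keep a usable file.
        return False
    # Swap the updated file in with an atomic rename (same filesystem).
    os.replace(new, planet)
    return True
```

Writing to a separate file and renaming only on success means a failed or interrupted run never leaves extract jobs reading a half-written planet. A cron entry would call a small wrapper around `update_planet()` once a day; the schedule is not specified in this thread.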

migurski commented 8 years ago

So, would you say it's safe for us to say "this data is refreshed from live OSM once daily" in all cases? That should be plenty of freshness message for our visitors. Exciting that we're doing it this frequently; it used to be weekly + weekly.

heffergm commented 8 years ago

Well, we only used to cut extracts once a week, but the data was only about as old as the extract run took to complete, since we were pulling a planet and applying diffs as part of the process.

In any case, I think wording to the effect that the data used to create any given extract should be at most ~24 hours old is correct.

heffergm commented 8 years ago

Coincidentally, I've discovered a bug related to planet updates, so we're a bit further out of date. Resolving now, and opened https://github.com/mapzen/operations-engineering/issues/361.

migurski commented 8 years ago

K, I’m going to assign @binx to this issue; it’s now just a front-end copy change.

souperneon commented 8 years ago

Just to clarify, @heffergm @migurski: are the "popular" (pre-generated) extracts also ~24 hours old? I understand the custom ones are.

heffergm commented 8 years ago

Correct.

Il Lun 22 Ago 2016, 5:09 PM Ekta Daryanani notifications@github.com ha scritto: ("On Mon, 22 Aug 2016, 5:09 PM, Ekta Daryanani wrote:")

Grant

souperneon commented 8 years ago

Fantastic! Also, @heffergm, Italian email?

souperneon commented 8 years ago

How about "Fresh data daily!" Sounds like a news item or a baked goods store 😉

louh commented 8 years ago

Day-old data! Half off!

kkowalsky commented 8 years ago

Fresh data served daily, from server farm to data table...

migurski commented 8 years ago

That will go down in history as Ingrid’s Greatest Pun.

souperneon commented 8 years ago

I was in the room when she came up with that ;) But @kkowalsky I like that. Can we use it @migurski?

kkowalsky commented 8 years ago

@souperneon @migurski: the wording exists in the original Metro Extracts blog announcement and might have been in the old documentation...

migurski commented 8 years ago

Yes we totally should use it.