Open mojodna opened 3 months ago
Hi @mojodna!
I'm trying to obtain the OSM planet ORC files for years 2018-2024 from the s3://osm-pds/
bucket, but I only see the 2024/
folder when I do aws s3 ls --no-sign-request s3://osm-pds/
.
Is this issue related to that? Thanks!
Only slightly. I imagine that we'd continue to keep archival ORC (or Parquet) files for some time period. Right now, previous versions are deleted after 90 days for 2 reasons:
history
ORC file can be used to derive snapshots for any point in the past. I don't the SQL off-hand to do this, but if you need it (or are just interested), I can resurrect it.It would be of great help if I could take a look at that SQL to derive the snapshots using the planet-history
, if it is not much of problem for you.
Let me know if there's anything I can to help. And thanks again!
This is how I derive snapshots from the history table:
WITH with_valid_until AS (
SELECT *,
-- derive a valid_until column based on the timestamp of the next update;
-- it will be null if it's current
LEAD(timestamp, 1) OVER (
PARTITION BY type,
id
ORDER BY version ASC
) AS valid_until
FROM planet_history
)
SELECT *
FROM with_valid_until
WHERE TIMESTAMP '2009-02-28 12:00:00' -- snapshot timestamp
BETWEEN timestamp AND valid_until
Thank you very much, this is very helpful!
These are (now) already in S3 (managed by OSMF), but in a different region (eu-central-1, us-west-2) vs.
osm-pds
(us-east-1). As a result, I think it makes sense for this project to focus on cloud-oriented derivatives rather than being a cloud-based mirror.This will require documentation (in RODA, here, and within the S3 bucket itself) and a gradual ramp-down of available files to minimize impact. To track/facilitate this change, we'll want log queries (+ dashboards?) showing the number of requests to historical PBFs (vs. whatever's latest).
We may also be able to work with the AWS Open Data team to reach out to customers using this bucket and point them to the OSMF-maintained one.