opendataphilly / opendataphilly-jkan

OpenDataPhilly powered by JKAN
https://opendataphilly.org/
MIT License

See when a dataset is updated #264

Open acouch opened 1 month ago

acouch commented 1 month ago

I'd like to be able to see or subscribe to data updates for datasets. For example, I created a visualization https://map.rco.community/ based on data from https://opendataphilly.org/datasets/registered-community-organizations-rco-boundaries/ . The data for that dataset comes from ArcGIS: https://services.arcgis.com/fLeGjb7u4uXqeF9q/arcgis/rest/services/Zoning_RCO/FeatureServer/0/query?outFields=*&where=1%3D1 , which I'm not really familiar with. Is there a way to query that to see when the data behind the API was last updated?

If there is a programmatic way of seeing the data "updated" date, maybe a GitHub Action could be set up for ArcGIS datasets that polls the API and updates the catalog with the latest "updated" date. I'd be happy to work on a PR for that if it is feasible / desirable.

rcheetham commented 1 month ago

Hi @acouch. Great to hear from you. You are highlighting a common challenge that the catalog faces. While the catalog entries support a dataset update date, it's almost never updated. The City has indicated that they are working on some kind of dashboard that will show when datasets have been updated, but I haven't seen it yet, and I don't know what capabilities it will have.

The ArcGIS feature services do have a data update field. For the dataset you listed, it is the dataLastEditDate value inside editingInfo in the JSON returned by https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/Zoning_RCO/FeatureServer/0?f=pjson
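As a rough illustration of reading that field, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and it assumes that dataLastEditDate is expressed in milliseconds since the Unix epoch, which is how ArcGIS REST date values are normally encoded; treat it as a starting point rather than a tested client.

```python
import json
from datetime import datetime, timezone
from typing import Optional
from urllib.request import urlopen

# Layer metadata endpoint quoted in this thread.
LAYER_URL = (
    "https://services.arcgis.com/fLeGjb7u4uXqeF9q/ArcGIS/rest/services/"
    "Zoning_RCO/FeatureServer/0?f=pjson"
)


def parse_data_last_edit(metadata: dict) -> Optional[datetime]:
    """Return editingInfo.dataLastEditDate as a UTC datetime, or None if absent."""
    editing_info = metadata.get("editingInfo") or {}
    millis = editing_info.get("dataLastEditDate")
    if millis is None:
        return None
    # Assumption: the value is milliseconds since the Unix epoch.
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


def fetch_data_last_edit(url: str = LAYER_URL) -> Optional[datetime]:
    """Download the layer metadata JSON and extract the last data edit time."""
    with urlopen(url, timeout=30) as response:
        return parse_data_last_edit(json.load(response))
```

Calling `fetch_data_last_edit()` should return a timezone-aware datetime when the service exposes the field, and None when it does not.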

While the previous CKAN implementation had a "last updated" field, from looking at the YML at https://github.com/opendataphilly/opendataphilly-jkan/blob/main/_data/schemas/philadelphia.yml I don't think it was included in the current catalog dataset schema. So I think we'd have to add a "dataset-last-updated" field to the YML. The GitHub Action would then iterate through the dataset files and look for ArcGIS service endpoints. Where it finds one, it would read the dataLastEditDate and add or update the new field (probably inserted after created).
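The file-updating half of that idea could look roughly like the Python sketch below. It is only an illustration under several assumptions: the field name "dataset-last-updated" is the proposal from this comment and is not yet in the schema, dataset files are treated as plain text with top-level `key: value` lines, and the function names and the `lookup` callable are hypothetical. The lookup is passed in so that the text handling can be exercised without network access.

```python
import re
from typing import Callable, Optional

# Proposed field name from this thread; not part of the current schema.
FIELD = "dataset-last-updated"

# Matches an ArcGIS feature layer URL up to the layer index, dropping any
# trailing "/query?..." part.
LAYER_RE = re.compile(
    r"https://services\d*\.arcgis\.com/[^\s\"'?]+?/FeatureServer/\d+",
    re.IGNORECASE,
)


def find_layer_url(text: str) -> Optional[str]:
    """Return the first ArcGIS feature layer URL found in the text, if any."""
    match = LAYER_RE.search(text)
    return match.group(0) if match else None


def set_last_updated(text: str, date: str) -> str:
    """Update the field if it exists, otherwise insert it after 'created:'."""
    lines = text.splitlines()
    new_line = f"{FIELD}: {date}"
    for i, line in enumerate(lines):
        if line.startswith(f"{FIELD}:"):
            lines[i] = new_line
            return "\n".join(lines) + "\n"
    for i, line in enumerate(lines):
        if line.startswith("created:"):
            lines.insert(i + 1, new_line)
            return "\n".join(lines) + "\n"
    return text  # no 'created:' line to anchor on; leave the file unchanged


def update_dataset_text(
    text: str, lookup: Callable[[str], Optional[str]]
) -> str:
    """Apply the update to one dataset file's text.

    `lookup` maps a layer URL to a date string, or None if unavailable.
    """
    layer_url = find_layer_url(text)
    if layer_url is None:
        return text
    date = lookup(layer_url)
    return set_last_updated(text, date) if date else text
```

In an Action, a small driver would read each dataset file, call `update_dataset_text` with a lookup that queries the layer's `?f=pjson` metadata, and write the file back only when the text changed, so that unchanged datasets produce no commit.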

However, before I suggest that you embark on an effort like this, I'd like to invite @BryanQuigley to comment on the feasibility and potential drawbacks, and I'd also like to wait until the City has rolled out this new dashboard in order to determine if there is a better option there.

BryanQuigley commented 1 month ago

We don't currently do any checking on the provided links at all :(.

I was thinking of using the built-in MegaLinter link checker, but this is certainly a good reason to consider a completely custom setup (and upstream it).

But - if you want to subscribe to a dataset, it makes more sense to me to just subscribe directly to its feed, right? I don't see an obvious way to subscribe to a specific file in GitHub proper. There are a few third-party options, but it seems better to just subscribe directly. Am I missing something?

acouch commented 3 weeks ago

@rcheetham

Thanks! It is helpful to be able to see the right query and field w/ ArcGIS to get the updated date.

@BryanQuigley

"But - if you want to subscribe to a dataset, it makes more sense to me to just subscribe directly to its feed, right?"

It is a common feature for catalogs, and seems within the domain of the publisher to provide. I would rather have an authoritative date that is consistent across the different datasets.

It wouldn't be that hard to create a GitHub Action that could populate that field for datasets that have an ArcGIS API. However, extending that to other API services, or to datasets without APIs, could get tricky. I'd be happy to take a crack at it, but since I'm the only one who has asked, it doesn't seem like a feature that would be worth the effort, and it would be additional code you have to support.