seasketch / geoprocessing

Serverless geoprocessing system
https://seasketch.github.io/geoprocessing
BSD 3-Clause "New" or "Revised" License

Provide user access to age of data files in reports #125

Open twelch opened 1 year ago

twelch commented 1 year ago

Need: as a project admin/coordinator, I need to know the age of each datasource's underlying files (e.g. surveys), so I can tell whether they are out of date and need updating.

underbluewaters commented 1 year ago

The service metadata includes a .published property that I could expose in the SSN admin interface. Data sources could be published separately though, right? There is some convoluted stuff going on to expose a list of vector data sources. Maybe SSN could follow through somehow with that to check publishing times. We'd need something analogous for raster.

twelch commented 1 year ago

@underbluewaters Yes, datasources are published to the datasets buckets using import:data and reimport:data, separately from cdk deployments.

I like the idea of the service metadata endpoint providing the publish date information, rather than the geoprocessing functions, as long as the user could connect the dots between a long list of published datasources, and the report that they care about.

Off the top of my head, here is how this could be implemented: datasources.json does have a last_updated timestamp for every local datasource (not global), but it can't be guaranteed to be up to date. That said, createManifest could be given just enough info from this file at build time that the root handler for the service endpoint could figure out the published timestamp for each datasource at runtime. The result could be cached if the lookup isn't fast.
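To make the caching idea concrete, here is a minimal sketch of a time-limited cache wrapped around a timestamp lookup. Everything in it is hypothetical: `withTtlCache`, the `Lookup` type, and the injected clock are illustration only, not part of the geoprocessing library, and the actual lookup (however the root handler would resolve a published timestamp) is left as a parameter.

```typescript
// Hypothetical sketch: none of these names exist in the geoprocessing library.
// A Lookup resolves a datasource ID to its published timestamp (or null).
type Lookup = (datasourceId: string) => Promise<string | null>;

/**
 * Wraps a lookup so repeated calls for the same datasource ID within
 * `ttlMs` milliseconds reuse the earlier result instead of re-querying.
 * `now` is injectable so the expiry logic can be exercised without waiting.
 */
function withTtlCache(
  lookup: Lookup,
  ttlMs: number,
  now: () => number = Date.now
): Lookup {
  const cache = new Map<string, { value: string | null; expires: number }>();
  return async (datasourceId) => {
    const hit = cache.get(datasourceId);
    if (hit && hit.expires > now()) return hit.value; // still fresh
    const value = await lookup(datasourceId);
    cache.set(datasourceId, { value, expires: now() + ttlMs });
    return value;
  };
}
```

If the cache lived at module scope in a Lambda handler, it would only survive while the container stays warm, so a cold start would repeat the lookups.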

This exposes a need for better metadata tracking for each datasource. Neither the method above nor the last_modified timestamp in datasources.json seems sufficient.

twelch commented 1 year ago

The workaround for this feature is to look up the last modified timestamp manually in the S3 admin console.

avmey commented 7 months ago

S3 maintains a Last-Modified timestamp that is returned when data is fetched, so the age of the datasources used in reports can be found at runtime.

To start, the UI will be a small gear icon at the bottom of reports, with an option to view data timestamps (a JSON file structured like { "ous_cultural_use": "Fri, 19 Jan 2024 20:38:21 GMT", ... }).
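As a rough illustration of how such a timestamps object might be assembled, the sketch below takes a map of datasource IDs to URLs and records each URL's Last-Modified header. The function name `collectDataTimestamps`, the `HeadFetch` type, and the shape of the input map are assumptions made for this example; how reports actually know their datasource URLs is not covered in this thread.

```typescript
// Hypothetical sketch: names and input shape are assumptions, not library API.
// HeadFetch is a minimal subset of the fetch signature, so the global fetch
// can be passed in, or a stub in tests.
type HeadFetch = (
  url: string,
  init: { method: "HEAD" }
) => Promise<{ ok: boolean; headers: { get(name: string): string | null } }>;

/**
 * Builds an object like { ous_cultural_use: "Fri, 19 Jan 2024 20:38:21 GMT" }
 * from a map of datasource ID to URL. A datasource whose request fails or
 * returns no Last-Modified header is recorded as null.
 */
async function collectDataTimestamps(
  datasourceUrls: Record<string, string>,
  fetchFn: HeadFetch
): Promise<Record<string, string | null>> {
  const ids = Object.keys(datasourceUrls);
  const timestamps: Record<string, string | null> = {};
  for (const id of ids) timestamps[id] = null; // fixes key order up front

  // Issue all HEAD requests in parallel; each fills in its own key.
  await Promise.all(
    ids.map(async (id) => {
      try {
        const response = await fetchFn(datasourceUrls[id], { method: "HEAD" });
        if (response.ok) {
          timestamps[id] = response.headers.get("Last-Modified");
        }
      } catch (err) {
        // Network error: leave the entry as null (age unknown).
      }
    })
  );
  return timestamps;
}
```

In a browser or Node 18+, this could be called as `collectDataTimestamps(urls, fetch)`.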

twelch commented 7 months ago

@avmey can you share a little more about how the timestamp is accessed? Do you have to actually fetch some data to get the timestamp?

avmey commented 7 months ago

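@twelch You can send a HEAD request to get the timestamp without fetching the data:

const response = await fetch(url, {
  method: "HEAD",
});
// The modification time comes back in the Last-Modified response header
const lastModified = response.headers.get("Last-Modified");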
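Building on the HEAD request above, here is a small sketch of turning the Last-Modified header into a `Date` and an age in days, which is closer to what a project admin would want to see. The helper names (`parseLastModified`, `ageInDays`, `getLastModified`) and the `FetchLike` type are hypothetical; only `fetch`, the HEAD method, and the Last-Modified header come from the discussion.

```typescript
// Hypothetical helpers for illustration; not part of the geoprocessing library.
// FetchLike is a minimal subset of fetch so a stub can be injected in tests.
type FetchLike = (
  url: string,
  init: { method: string }
) => Promise<{ ok: boolean; headers: { get(name: string): string | null } }>;

/** Parses an HTTP date header value; returns null if missing or unparseable. */
function parseLastModified(value: string | null): Date | null {
  if (!value) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : new Date(ms);
}

/** Whole days elapsed between `modified` and `now`. */
function ageInDays(modified: Date, now: Date = new Date()): number {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((now.getTime() - modified.getTime()) / msPerDay);
}

/** Sends a HEAD request and returns the Last-Modified time, or null. */
async function getLastModified(
  url: string,
  fetchFn: FetchLike
): Promise<Date | null> {
  const response = await fetchFn(url, { method: "HEAD" });
  if (!response.ok) return null; // e.g. missing object or access denied
  return parseLastModified(response.headers.get("Last-Modified"));
}
```

A caller could pass the global `fetch` as `fetchFn`, then feed the returned date to `ageInDays` to flag datasources older than some threshold.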