internetofwater / geoconnex.us

URI registry for https://geoconnex.us based URIs

Guidance Needed for Handling Datasets in Geoconnex #196

Open ksonda opened 11 months ago

ksonda commented 11 months ago

There is an emerging use case, most prominently CUAHSI but also people running CKAN or Socrata based CMS/DMS and perhaps things like Sciencebase.gov, of datasets that can be tagged as being schema:about geoconnex features but are not themselves monitoring locations and thus do not have geoconnex PIDs.

A grey area would be datasets that are, for example, timeseries about an organizational monitoring location that could probably be a reference location if it is not one already (e.g. USBR RISE data catalog items for reservoir operations data, MonitorMyWatershed stuff).

We need to develop guidance on:

1) How to submit data URLs that are not PIDs so that we can still crawl them
2) When to submit PIDs vs. when to submit data URLs
3) When to submit PIDs vs. when to submit PIDs and Reference PIDs to the appropriate reference repository

ksonda commented 11 months ago

An undoubtedly non-exhaustive list of options:

Option 1: (I have been telling people to do this until now). Set up organizational monitoring location pages whether they are reference features or not, and set them up in such a way that they serve as sub-data catalogs for all datasets about them. These should have geoconnex PIDs.

eg.

{"@id":"https://geoconnex.us/usgs/monitoring-location/{numbers}", 
"schema:subjectOf":
["stuff about page/API call for parametercode 1",
"stuff about page/API call/ data download for data for parametercode 2"]}

Option 2: Have providers tag their dataset as schema:about (or some HY relationship to) a geoconnex feature, whether a reference location or just some kind of featureofinterest like a mainstem or a cataloging feature like a HUC. The dataset has no geoconnex PID, so the provider must give us a list of dataset URLs to crawl.

{"@id":"a non permanent dataset URL", 
"schema:about":
["https://geoconnex.us/ref/gages/1000001", "https://geoconnex.us/ref/hu10/0102030405"]}
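
On the crawling side, Option 2 means pulling the geoconnex links out of whatever JSON-LD each dataset URL returns. A minimal sketch of that step, assuming the document is a single JSON object shaped like the example above (the function name `geoconnex_about_links` and the `example.org` dataset URL are made up for illustration, not part of any existing geoconnex tooling):

```python
import json

GEOCONNEX_PREFIX = "https://geoconnex.us/"


def geoconnex_about_links(jsonld_text):
    """Return the geoconnex.us URIs listed under schema:about in a JSON-LD document."""
    doc = json.loads(jsonld_text)
    about = doc.get("schema:about", [])
    # schema:about may be a single value or a list; normalise to a list.
    if not isinstance(about, list):
        about = [about]
    links = []
    for item in about:
        # An entry may be a bare URI string or a node object carrying an @id.
        uri = item.get("@id") if isinstance(item, dict) else item
        if isinstance(uri, str) and uri.startswith(GEOCONNEX_PREFIX):
            links.append(uri)
    return links


# Hypothetical dataset document; only the schema:about values come from the thread.
example = """{"@id": "https://example.org/dataset/123",
 "schema:about": ["https://geoconnex.us/ref/gages/1000001",
                  "https://geoconnex.us/ref/hu10/0102030405"]}"""

print(geoconnex_about_links(example))
```

A real crawler would also have to handle `@graph` wrappers, expanded `https://schema.org/about` keys, and JSON-LD embedded in HTML, none of which this sketch attempts.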

It's probably a wash between options 1 and 2 in terms of effort for the data provider. In terms of our own data management, Option 2 adds a layer of complexity to administer and possibly an order of magnitude more crawling compute, but it is more consistent with the SELFIE architecture, I suppose. However, it does open us up to link rot.

Option 3: Allow both options, but require some sort of geoconnex PID for datasets, like an additional special organizational sub-namespace, e.g.:

id: https://geoconnex.us/usgs/datasets/{datasetid}
target: https://waterdata.usgs.gov/monitoring-location/03451200/#parameterCode=00010&period=P7D
description: parametercode 00010 for https://geoconnex.us/usgs/monitoring-location/03451200
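
To make the Option 3 idea concrete, here is a rough sketch of how such an entry could be assembled. It assumes a `datasets` sub-namespace exists under the organization, which is only a proposal in this thread; the helper name `dataset_registry_entry` and the dataset ID `example-dataset-id` are placeholders invented for illustration.

```python
def dataset_registry_entry(org, dataset_id, target, description):
    """Build an id/target/description record for a proposed dataset PID."""
    # Assumption: dataset PIDs would live under https://geoconnex.us/{org}/datasets/.
    pid = f"https://geoconnex.us/{org}/datasets/{dataset_id}"
    return {"id": pid, "target": target, "description": description}


entry = dataset_registry_entry(
    org="usgs",
    dataset_id="example-dataset-id",  # placeholder, not a real identifier
    target="https://waterdata.usgs.gov/monitoring-location/03451200/#parameterCode=00010&period=P7D",
    description="parametercode 00010 for https://geoconnex.us/usgs/monitoring-location/03451200",
)

for key, value in entry.items():
    print(f"{key}: {value}")
```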

dblodgett-usgs commented 11 months ago

By my read, you are including a few specific use cases here, and it would be worthwhile to break them apart a bit more for the sake of clear recommendations.

I want to avoid geoconnex IDs for datasets and other abstract digital objects (reports). Those should really have DOIs.

The monitoring context adds a dimension to the use cases that needs to be split out -- so I'll just avoid that nuance in the rest of my response.

So there are two patterns we might have.

1) a data provider's resource that is an "in band" semantic resource that is either about or subject of a geoconnex feature.
2) a data provider's resource that is an "out of band" object that is either about or subject of a geoconnex feature.

If the data provider is decorating their resource, and it's a semantic resource, I think having the structured data be about itself makes the most sense.

e.g.

{
    "@id": "a url that returns semantic content",
    "schema:about": ["https://geoconnex.us/ref/hu10/0102030405"]
}

If the data provider is decorating their resource and it's a non-semantic resource, we should use the other structure.

e.g.

{
    "@id": "https://geoconnex.us/...",
    "schema:subjectOf": {
        "schema:url": "digital object that is out of band"
    }
}
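
The two structures above can be put side by side in a small helper. This is one reading of the suggestion, not an agreed convention: the function name `link_to_feature` and the `is_semantic` flag are hypothetical, and the URLs in the usage lines are examples only.

```python
def link_to_feature(resource_url, feature_uri, is_semantic):
    """Return JSON-LD linking a provider resource to a geoconnex feature."""
    if is_semantic:
        # "In band": the resource returns semantic content, so it describes
        # itself and points at the feature with schema:about.
        return {"@id": resource_url, "schema:about": [feature_uri]}
    # "Out of band": the resource is a plain digital object, so the geoconnex
    # feature is the subject and the object hangs off schema:subjectOf.
    return {"@id": feature_uri, "schema:subjectOf": {"schema:url": resource_url}}


feature = "https://geoconnex.us/ref/hu10/0102030405"
in_band = link_to_feature("https://example.org/dataset/123", feature, is_semantic=True)
out_of_band = link_to_feature("https://example.org/files/report.pdf", feature, is_semantic=False)
print(in_band)
print(out_of_band)
```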

ksonda commented 11 months ago

Discussed with @webb-ben in the midst of implementing #198 and #202. We can support arbitrary sitemaps, not just geoconnex.us ones, by adding them to the namespaces subdirectories. Then we just need to figure out and document guidance.
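
For a sense of what crawling a provider-supplied sitemap involves, here is a minimal sketch that lists the URLs in a standard sitemaps.org `urlset` document using only the Python standard library. The function name `sitemap_urls` and the `example.org` URLs are assumptions for illustration; this is not the actual crawler code from #198 or #202.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> values of every <url> entry in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical provider sitemap listing non-PID dataset URLs.
example_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.org/dataset/1</loc></url>
  <url><loc>https://example.org/dataset/2</loc></url>
</urlset>"""

print(sitemap_urls(example_sitemap))
```

Sitemap index files (`sitemapindex`) would need one more level of traversal, which is left out here.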

Some cases here

[image]