develop guidance to specify location of sitemap or similar when submitting regex pids

internetofwater / geoconnex.us

URI registry for https://geoconnex.us based URIs

develop guidance to specify location of sitemap or similar when submitting regex pids #117

Closed ksonda closed 2 years ago

ksonda commented 2 years ago

As a regex PID submitter, I want to specify a location where geoconnex can find a list of all of my PIDs, so that they show up in sitemaps (pids.geoconnex.us#7) for harvesting into the knowledge graph.

We could have some options:

an actual sitemap.xml
an API endpoint or file store URL, plus the name of the field that holds the URI (similar to NLDI, perhaps the NLDI endpoint itself, see geoconnex.us#113)

dblodgett-usgs commented 2 years ago

I guess I would propose adding a:

*.xml file that contains sitemap entries.
*.csv file and id field that contains possible URLs
*.json file and id field that contains possible URLs
*.jsonld file and id field that contains possible URLs

This could be housed in the current csv file we use for regex as two new columns. I'll mock something up for HU10 to see what you think @ksonda
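
As a rough illustration of the "file plus id field" idea, a harvester could read the named field from each record. This is only a sketch: the function names, the sample URIs, and the assumption that a JSON file is a GeoJSON FeatureCollection with the URI stored under `properties` are all hypothetical and not part of the registry.

```python
import csv
import io
import json


def uris_from_csv(text, id_field):
    """Return the values of id_field from CSV text, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(text))
    return [row[id_field] for row in reader if row.get(id_field)]


def uris_from_geojson(text, id_field):
    """Return properties[id_field] for each feature of a FeatureCollection.

    Assumption: the URI lives in the feature's properties, not in the
    top-level GeoJSON "id" member.
    """
    doc = json.loads(text)
    uris = []
    for feature in doc.get("features", []):
        # "properties" may legally be null in GeoJSON, so fall back to {}.
        value = (feature.get("properties") or {}).get(id_field)
        if value:
            uris.append(value)
    return uris


# Hypothetical sample data, for illustration only.
sample_csv = "name,uri\nA,https://geoconnex.us/example/a\nB,https://geoconnex.us/example/b\n"
sample_geojson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"uri": "https://geoconnex.us/example/a"}},
        {"type": "Feature", "geometry": None, "properties": None},
    ],
})

csv_uris = uris_from_csv(sample_csv, "uri")
geojson_uris = uris_from_geojson(sample_geojson, "uri")
```

The two new columns in the regex csv would then supply the file location and the `id_field` argument.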

dblodgett-usgs commented 2 years ago

Thoughts on what I have in the above there, @ksonda?

ksonda commented 2 years ago

@webb-ben I THINK this is fine. The way the PID DB is loaded shouldn't care about these new columns, right?

ksonda commented 2 years ago

Actually I think for jsonld we just presume that @id is what we're putting in the sitemap? Idk if "field" makes sense in a json-ld document generally.

dblodgett-usgs commented 2 years ago

Yeah... I had the same thought. That field could just be ignored if @id of the features is clear?
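
For what it's worth, a minimal sketch of the "just use @id" idea could walk the parsed document and collect every string-valued `@id` it finds. The function name and the sample document below are made up for illustration. Note that this picks up the `@id` of any nested node too, so it only gives a clean list when the document is shaped like a simple item list.

```python
import json


def collect_ids(doc):
    """Collect unique string-valued "@id" entries in document order."""
    found = []
    seen = set()

    def walk(node):
        if isinstance(node, dict):
            value = node.get("@id")
            if isinstance(value, str) and value not in seen:
                seen.add(value)
                found.append(value)
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(doc)
    return found


# Hypothetical item-list style document, for illustration only.
sample = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "ItemList",
  "itemListElement": [
    {"@type": "ListItem", "item": {"@id": "https://geoconnex.us/example/a"}},
    {"@type": "ListItem", "item": {"@id": "https://geoconnex.us/example/b"}},
    {"@type": "ListItem", "item": {"@id": "https://geoconnex.us/example/a"}}
  ]
}
""")

ids = collect_ids(sample)
```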

webb-ben commented 2 years ago

I actually have been working on this today. Because of the modifications we made to jsonld at the collections level, it is really easy to harvest all of the pids. I actually have a folder where I have my workflow for generating the sitemap. Where should I commit that? Ironically, I have been harvesting them like curl "https://reference.geoconnex.us/collections/gages/items?f=jsonld&limit=186495" > namespaces/gages.json and then I transform that into a csv.
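
Once a list of URIs is in hand, writing the sitemap itself is short. The following is a sketch only, not the actual workflow mentioned above: `build_sitemaps` and the sample URIs are hypothetical. It splits the list into chunks because the sitemap protocol caps a single file at 50,000 URLs, which a collection of 186,495 items would exceed.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50000  # per-file limit from the sitemap protocol


def build_sitemaps(uris, max_urls=MAX_URLS):
    """Return a list of sitemap XML strings, each with at most max_urls entries."""
    docs = []
    for start in range(0, len(uris), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for uri in uris[start:start + max_urls]:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = uri
        docs.append(ET.tostring(urlset, encoding="unicode"))
    return docs


# Hypothetical URIs; a tiny chunk size is used here just to show the splitting.
sample_uris = [f"https://geoconnex.us/example/{i}" for i in range(5)]
sitemaps = build_sitemaps(sample_uris, max_urls=2)
```

Each returned string could be written to its own file and listed in a sitemap index.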

ksonda commented 2 years ago

eh, yeah that's why @dblodgett-usgs suggested it: because we implemented json-ld as an itemList, it's very efficient. But JSON-LD documents are super diverse so we can't count on this. I think we want to move to just xml, csv and (geo)json as the only options

ksonda commented 2 years ago

addressed in #131