Open jmckenna opened 1 year ago
@fils @pbuttigieg I wonder if, in this case, my framing script could grab the sameAs values and harvest the individual dataset's JSON-LD for the entire catalogue.
Or, let me know if I am misunderstanding the planned path through this.
Ah, the issue is that Gleaner is not currently able to get the JSON-LD.
(I wonder if a temporary script, as I mention above, could be used in the short term.)
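To make the temporary-script idea concrete, here is a minimal sketch in Python. It assumes the catalogue JSON-LD has already been parsed into a dict, that its datasets sit under the schema.org "dataset" property, and that each sameAs URL returns a JSON-LD document directly; none of that is confirmed for this endpoint. The names collect_same_as, harvest and fetch_with_urllib are hypothetical helpers, not part of Gleaner.

```python
import json
from typing import Callable
from urllib.request import urlopen


def collect_same_as(catalog: dict) -> list[str]:
    """Return the sameAs values of every dataset entry listed in a DataCatalog."""
    datasets = catalog.get("dataset", [])  # assumed location of the dataset list
    if isinstance(datasets, dict):  # a single dataset may not be wrapped in a list
        datasets = [datasets]
    urls = []
    for entry in datasets:
        if not isinstance(entry, dict):
            continue
        same_as = entry.get("sameAs")
        if isinstance(same_as, str):
            urls.append(same_as)
        elif isinstance(same_as, list):
            urls.extend(u for u in same_as if isinstance(u, str))
    return urls


def harvest(catalog: dict, fetch: Callable[[str], str]) -> dict[str, dict]:
    """Fetch and parse the JSON-LD behind each sameAs URL, skipping failures."""
    records = {}
    for url in collect_same_as(catalog):
        try:
            records[url] = json.loads(fetch(url))
        except (OSError, ValueError) as err:  # network error or invalid JSON
            print(f"skipping {url}: {err}")
    return records


def fetch_with_urllib(url: str) -> str:
    """Plain HTTP GET; assumes the response body is UTF-8 text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")
```

Calling harvest(catalog, fetch_with_urllib) would walk the whole catalogue. Passing the fetch function in as an argument keeps the sketch testable offline and makes it easy to swap in a client with retries or rate limiting.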
So the sitemap https://osmc.noaa.gov/erddap/sitemap.xml points to a document with 4 entries for MEOP (as an example):
<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.graph</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.subset</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>
From Kevin, we have the following:
Data: https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html
Metadata: https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html
The schema.org markup is in the "view-source" on the metadata page.
So we could start a basic index with that.
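As a rough illustration of starting that basic index, the sketch below keeps only the metadata pages from the sitemap and pulls the embedded JSON-LD out of a page's HTML. It assumes the metadata pages always follow the /erddap/info/<datasetID>/index.html pattern seen above and that the markup sits in a script tag of type application/ld+json; metadata_urls and extract_jsonld are hypothetical names, and fetching the pages is left out.

```python
import json
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed pattern for ERDDAP metadata pages, taken from the example URLs above.
INFO_PAGE = re.compile(r"/erddap/info/[^/]+/index\.html$")


def metadata_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> values in a sitemap that look like metadata pages."""
    root = ET.fromstring(sitemap_xml)
    locs = [
        el.text.strip()
        for el in root.iter()
        # Compare local names so the sketch works with or without a namespace.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]
    return [url for url in locs if INFO_PAGE.search(url)]


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._chunks))


def extract_jsonld(html: str) -> list:
    """Parse every JSON-LD block in an HTML page, ignoring invalid ones."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    docs = []
    for block in parser.blocks:
        try:
            docs.append(json.loads(block))
        except ValueError:
            continue
    return docs
```

Filtering on the URL pattern avoids requesting the .html, .graph and .subset pages, which is the same saving a dedicated dataset sitemap would give.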
Some raw comments from our meeting today:
Issue in sitemap: 1800 records, but only 27 datasets. Lots of entries are auto-populated in it that don't seem to need to be there. Action: create a sitemap index (a sitemap which points to other sitemaps), and then have one sitemap for datasets, one for util pages, etc., so you can direct ODIS and others to the ones you want us to harvest. This is an efficiency issue, not a show stopper: Gleaner ignores what doesn't have JSON-LD.
includedInDataCatalog: "sameAs" should be replaced with the "url" property.
keywords: way too many, which suggests a misunderstanding of how to use this property well. The keywords should be informative, not a general grab of lots of random things. Focus only on those that are really about the dataset.
If "headline" is indeed mapping to the dataset id in ERDDAP, use "identifier" instead.
Use the "about" property for a focused descriptor: this should contain the main topic of this dataset.
"license": being mapped from the right place, but with weird values. GOOS may provide a list of licenses to recommend.
For any variables that are not measured (codes, assigned status, QC flags, etc.), use an array of additionalProperty properties.
Are the description and alternate name coming from the same place in the NetCDF? Many of these are not very descriptive. A long name is not a description. A comment isn't either, but it's closer.
species (L189) in the variableMeasured block has no value: with no "value" property, we take the "we don't know the value" position.
"conventions" (really it's the syntax or format) and "axisOrDataVariable" are very GOOS-specific abstractions. This hurts cross-platform discovery, and a relatively quick fix can make these more FAIR. For conventions, it would be best to link to the documentation of the convention.
creator: too short, need full names; sameAs --> url
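The mechanical renames from these notes (sameAs to url on the catalogue and creator nodes, headline to identifier) could be sketched as a small post-processing step like the one below. This is only an illustration on an already-parsed record: the real fix belongs in the ERDDAP-side mapping, the function name apply_review_fixes is hypothetical, and the headline rename is behind a flag because the notes only recommend it if headline really carries the dataset id. The keyword, license and description points need human judgment and are not covered.

```python
import copy


def rename_key(node: dict, old: str, new: str) -> None:
    """Move node[old] to node[new] unless the new key already exists."""
    if old in node and new not in node:
        node[new] = node.pop(old)


def as_list(value) -> list:
    """Normalise a JSON-LD value that may be missing, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def apply_review_fixes(dataset: dict, headline_is_dataset_id: bool = True) -> dict:
    """Return a copy of a Dataset record with the renames from the review applied."""
    fixed = copy.deepcopy(dataset)  # leave the harvested record untouched
    if headline_is_dataset_id:
        rename_key(fixed, "headline", "identifier")
    for prop in ("includedInDataCatalog", "creator"):
        for node in as_list(fixed.get(prop)):
            if isinstance(node, dict):
                rename_key(node, "sameAs", "url")
    return fixed
```

Running this over harvested records and diffing the result against the originals is one way to check how many records each recommendation would actually touch.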
@kevin-obrien I've harvested the OSMC endpoint, and I can see the 27 records in a development instance of the front-end search for ODIS (see screencaptures below).
We currently have an issue displaying the bounding boxes on a map, though, related to the large spatial extents of these records (I was temporarily able to view them by tweaking the mapping code, but an entire rewrite of that mapping code is being done now...). Question: are you OK if I publish these 27 records to the live search (oceaninfohub.org), knowing that they won't be displayed in the "Spatial Search" yet?
@kevin-obrien the OSMC records are now visible on the ODIS live search (!). Give it a try, here is a direct link just to those dataset records: https://oceaninfohub.org/results/Dataset?page=0&facet_query=facetType%3Dtxt_provider%26facetName%3DObserving%2BSystem%2BMonitoring%2BCenter%2B%2528OSMC%2529
@kevin-obrien it would be good if you can also create an entry in the ODIS Catalogue for this OSMC endpoint:
Startpoint URL for ODIS-Arch
(this is the url to your sitemap.xml file)
Type of the ODIS-Arch URL
(select "Sitemap")
view-source), listing type:DataCatalog and type:Dataset
sameAs property is used to point to the individual dataset's JSON-LD
To-Do
@fils @pbuttigieg my findings differ from our earlier discussions (I reviewed how the NMDIS-China partner set up their ERDDAP endpoint, and it seems to match the NOAA endpoint). Am I misunderstanding the desired steps here? Please explain.
Paste of the JSON-LD that is embedded above