iodepo / odis-arch

Development of the Ocean Data and Information System (ODIS) architecture
https://book.oceaninfohub.org/

Connect initial GOOS ERDDAP endpoint with ODIS #355

Open jmckenna opened 8 months ago

jmckenna commented 8 months ago

To-Do

@fils @pbuttigieg my findings differ from our earlier discussions: I reviewed how the NMDIS-China partner set up their ERDDAP endpoint, and it seems to match the NOAA endpoint. Am I misunderstanding the desired steps here? Please explain.

Paste of the JSON-LD that is embedded above

{
  "@context": "http://schema.org",
  "@type": "DataCatalog",
  "name": "ERDDAP Data Server at OSMC",
  "url": "https://osmc.noaa.gov/erddap",
  "publisher": {
    "@type": "Organization",
    "name": "OSMC",
    "address": {
      "@type": "PostalAddress",
      "addressCountry": "USA",
      "addressLocality": "7600 Sand Point Way NE, Seattle",
      "addressRegion": "WA",
      "postalCode": "98115"
    },
    "telephone": "+1 206-555-1212",
    "email": "kevin.m.obrien@noaa.gov",
    "sameAs": "http://www.osmc.noaa.gov"
  },
  "fileFormat": [
    "application/geo+json",
    "application/json",
    "text/csv"
  ],
  "isAccessibleForFree": "True",
  "dataset": [
    {
      "@type": "Dataset",
      "name": "CCHDO GO SHIP bottle data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/cchdo_bottle/index.html"
    },
    {
      "@type": "Dataset",
      "name": "CCHDO GO SHIP ctd data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/cchdo_ctd/index.html"
    },
    {
      "@type": "Dataset",
      "name": "Global Drifter Program - 1 Hour Interpolated QC Drifter Data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/drifter_hourly_qc/index.html"
    },
    {
      "@type": "Dataset",
      "name": "Global Drifter Program - 6 Hour Interpolated QC Drifter Data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/drifter_6hour_qc/index.html"
    },
    {
      "@type": "Dataset",
      "name": "IOOS GTS counts",
      "sameAs": "https://osmc.noaa.gov/erddap/info/ioos_obs_counts/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JASL/UHSLC Research Quality Tide Gauge Data (daily)",
      "sameAs": "https://osmc.noaa.gov/erddap/info/global_daily_rqds/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JASL/UHSLC Research Quality Tide Gauge Data (hourly)",
      "sameAs": "https://osmc.noaa.gov/erddap/info/global_hourly_rqds/index.html"
    },
    {
      "@type": "Dataset",
      "name": "JCOMMPS Active WMO ID LIST",
      "sameAs": "https://osmc.noaa.gov/erddap/info/wmo_list/index.html"
    },
    {
      "@type": "Dataset",
      "name": "meop animal profiles",
      "sameAs": "https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC 90 day RT data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_30day/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC Argo Profile data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_PROFILERS/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC flattened observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_flattened/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC flattened observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/osmc_test/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC normalized observations from GTS",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMC_Points/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC Profiles",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_PROFILES/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC surface trajectory data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_SURFACE_TRAJECTORY/index.html"
    },
    {
      "@type": "Dataset",
      "name": "OSMC TimeSeries data",
      "sameAs": "https://osmc.noaa.gov/erddap/info/OSMCV4_DUO_TIME_SERIES/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Air Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyAirt/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Sea Surface Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDySst/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Temperature",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyT/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1977-present, Wind",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyW/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1987-present, Salinity",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyS/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1988-2020, ADCP",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyAdcp/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1989-present, Wind Stress",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyTau/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1992-present, Sea Surface Salinity",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDySss/index.html"
    },
    {
      "@type": "Dataset",
      "name": "TAO/TRITON, RAMA, and PIRATA Buoys, Daily, 1997-present, Precipitation",
      "sameAs": "https://osmc.noaa.gov/erddap/info/pmelTaoDyRain/index.html"
    }
  ]
}
jmckenna commented 8 months ago

@fils @pbuttigieg I wonder if in this case, my framing script could grab the sameAs values and harvest the individual dataset's JSON-LD, for the entire catalogue.

Or, let me know if I am misunderstanding the planned path through this.
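A minimal sketch of that idea, assuming each dataset's metadata page embeds its JSON-LD in a `<script type="application/ld+json">` element. The names `dataset_urls`, `extract_jsonld` and `harvest` are hypothetical and are not part of the existing framing script or of Gleaner; the page fetcher is passed in as a callable so the sketch makes no network assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the bodies of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.documents.append(json.loads("".join(self._chunks)))


def dataset_urls(catalog):
    """Return the sameAs URL of every Dataset listed in a DataCatalog."""
    return [d["sameAs"] for d in catalog.get("dataset", []) if "sameAs" in d]


def extract_jsonld(html):
    """Return every JSON-LD document embedded in an HTML page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents


def harvest(catalog, fetch):
    """Map each dataset URL to the JSON-LD found on its page.

    fetch is any callable taking a URL and returning the page's HTML text.
    """
    return {url: extract_jsonld(fetch(url)) for url in dataset_urls(catalog)}
```

For a real run, `fetch` could be something like `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`; error handling and rate limiting are left out here.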

jmckenna commented 8 months ago

Ah, the issue is that Gleaner is not currently able to get the JSON-LD.

(I wonder if a temporary script, as I mention above, could be used in the short term.)

fils commented 8 months ago

So the sitemap: https://osmc.noaa.gov/erddap/sitemap.xml points to a document with 4 entries for MEOP (as an example)

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.graph</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

<url>
<loc>https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.subset</loc>
<lastmod>2023-09-21</lastmod>
<changefreq>monthly</changefreq>
<priority>0.5</priority>
</url>

From Kevin, we have the following

Data: https://osmc.noaa.gov/erddap/tabledap/MEOP_profiles.html
Metadata: https://osmc.noaa.gov/erddap/info/MEOP_profiles/index.html

The schema.org markup is in the "view-source" on the metadata page.

So we could start a basic index with that.
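As a rough sketch of how such a basic index might be seeded, the sitemap can be filtered down to the metadata pages only. This assumes the metadata pages always follow the `/erddap/info/<dataset id>/index.html` pattern seen in the MEOP entries above; `metadata_pages` is a hypothetical helper name.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern for ERDDAP dataset metadata pages, based on the MEOP example.
METADATA_PAGE = re.compile(r"/erddap/info/[^/]+/index\.html$")


def metadata_pages(sitemap_xml):
    """Return sitemap <loc> values that look like dataset metadata pages."""
    root = ET.fromstring(sitemap_xml)
    pages = []
    for element in root.iter():
        # Compare local names so this works with or without the sitemap namespace.
        if element.tag.rsplit("}", 1)[-1] != "loc":
            continue
        loc = (element.text or "").strip()
        if METADATA_PAGE.search(loc):
            pages.append(loc)
    return pages
```

Applied to the four MEOP entries, only the `info/MEOP_profiles/index.html` URL would be kept; the `tabledap` data, graph and subset pages are dropped.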

jmckenna commented 8 months ago
pbuttigieg commented 7 months ago

Some raw comments from our meeting today:

Issue in the sitemap: 1800 records, but only 27 datasets. Lots of entries are auto-populated in it that don't seem to need to be there. Action: create a sitemap index (a sitemap which points to other sitemaps), then have one sitemap for datasets, one for util pages, etc., so you can direct ODIS and others to the ones you want us to harvest. This is an efficiency issue, not a show stopper: Gleaner ignores what doesn't have JSON-LD.
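For illustration, a sitemap index of the kind described in that action could be generated along these lines. This is only a sketch: `build_sitemap_index` is a hypothetical helper, the child sitemap URLs in the usage note are invented, and ERDDAP itself may need a different mechanism to publish such a file.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Serialize a sitemap index that points at the given child sitemaps."""
    # Register the namespace as the default so the output has no prefixes.
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="unicode")
```

A call such as `build_sitemap_index(["https://example.org/sitemap-datasets.xml", "https://example.org/sitemap-util.xml"])` would give harvesters one entry point from which they can pick only the datasets sitemap.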

includedInDataCat: "sameAs" - should be substituted with "url" property

keywords - way too many. This reflects a misunderstanding of how to use this property well. The keywords should be informative, not a general grab of lots of random things. Focus on only those that are really about the dataset.

if "headline" is indeed mapping to the dataset id in ERDDAP, use "identifier" instead

use the "about" property for a focused descriptor: this should contain really the main topic of this data set

"license": being mapped from right place, but weird values. GOOS may provide a list of licenses to recommend.

For any variables not measured use an array of additionalProperty properties: so for codes, assigned status, QC flags, etc.

Are the description and alternate name coming from the same place in the NetCDF? Many of these are not very descriptive. A long name is not a description. A comment isn't either, but it's closer.

species (L189) in the variableMeasured block has no value - with no "value" property, we take the "we don't know the value" position.

"conventions" (really it's the syntax or format) and "axisOrDataVariable" are very GOOS-specific abstractions - this hurts cross-platform discovery, and a relatively quick fix can make these more FAIR. On conventions, it would be best to link to the documentation of the convention.

creator - too short, need full names; sameAs --> url
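The property renames in these notes could be prototyped on a harvested record before the ERDDAP templates are changed. The sketch below makes several assumptions: "includedInDataCat" is read as the schema.org `includedInDataCatalog` property, "headline" really does hold the ERDDAP dataset id, and `apply_review_notes` is a hypothetical helper. It does not address the keywords, license, description or additionalProperty points.

```python
import copy


def _rename(node, old, new):
    """Move node[old] to node[new] unless new is already set."""
    if isinstance(node, dict) and old in node and new not in node:
        node[new] = node.pop(old)


def _each(value):
    """Return the dict items of a value that may be a dict, a list, or absent."""
    items = value if isinstance(value, list) else [value]
    return [item for item in items if isinstance(item, dict)]


def apply_review_notes(dataset):
    """Return a copy of a Dataset record with the suggested renames applied."""
    revised = copy.deepcopy(dataset)
    # Assumes "headline" carries the ERDDAP dataset id.
    _rename(revised, "headline", "identifier")
    for catalog in _each(revised.get("includedInDataCatalog")):
        _rename(catalog, "sameAs", "url")
    for creator in _each(revised.get("creator")):
        _rename(creator, "sameAs", "url")
    return revised
```

The input record is left untouched, so the original and revised JSON-LD can be compared side by side.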

jmckenna commented 6 months ago

@kevin-obrien I've harvested the OSMC endpoint, and I can see the 27 records in a development instance of the front-end search for ODIS (see screencaptures below).

We currently have an issue displaying the bounding boxes on a map, related to the large spatial extents of these records. (Temporarily I was able to view them by tweaking the mapping code, but an entire rewrite of that mapping code is being done now.) Question: are you OK if I publish these 27 records to the live search (oceaninfohub.org), knowing that they won't be displayed in the "Spatial Search" yet?

[Screenshots: 2023-12-07 075638, 2023-12-07 075730, 2023-12-07 075806]

jmckenna commented 2 months ago

@kevin-obrien the OSMC records are now visible on the ODIS live search (!). Give it a try, here is a direct link just to those dataset records: https://oceaninfohub.org/results/Dataset?page=0&facet_query=facetType%3Dtxt_provider%26facetName%3DObserving%2BSystem%2BMonitoring%2BCenter%2B%2528OSMC%2529

jmckenna commented 2 months ago

@kevin-obrien it would be good if you can also create an entry in the ODIS Catalogue for this OSMC endpoint: