iodepo / odis-in

Demonstration repository to OIH Book development

Create ARGO templates #16

Open jmckenna opened 3 months ago

jmckenna commented 3 months ago

related to https://github.com/iodepo/odis-arch/issues/404

jmckenna commented 2 months ago

These are notes at this stage, and incomplete. Mark as draft until we have a complete and valid example.

thanks, changed to Draft.

bkatiemills commented 2 months ago

Can you please identify the mandatory fields in https://github.com/iodepo/odis-in/blob/master/dataGraphs/thematics/dataset/graphs/datasetTemplate.json? Many of them don't make sense for Argovis, I need a clearer picture of what's required versus optional in order to produce an MVP.

bkatiemills commented 2 months ago

Some questions on specific properties in the dataset template:

Thanks for your help on this, I think the biggest sticking point is that we're more of a data service layer than a static blob of data, which makes some of these keys an awkward fit. Once we can resolve that in a way that respects and represents Argovis' intended usage, the rest will be pretty easy.

jmckenna commented 2 months ago

@bkatiemills Just to confirm, in your messages here you are pointing to the general dataset template, but the ARGO dataset template that we created together lives at datasetTemplate-ARGO.json

jmckenna commented 2 months ago

@pbuttigieg pinging you here so that you notice @bkatiemills questions above

pbuttigieg commented 2 months ago

@jmckenna thanks for the ping

@bkatiemills

  • url: we are an API-driven search service over large datasets; we do not provide links to blobs of entire datasets. The suggestion provided in meeting of linking to our visualization frontend for Argo is inappropriate, since we will be generating jsonld for every dataset we index, and not all of them appear in the frontend. What do other similar API driven services you index do?

the 'url' property is intended for something like a landing page for the dataset or any Web resource that's dedicated to that dataset.

if you don't produce these, you can omit this property

the suggestion in the meeting was more for a Service type (rather than Dataset): there, the url would point to the service's landing page

  • keywords: please explain how these are used. Is there an existing ontology?

You can use any semantic resource you think is appropriate, or just strings.

there's some documentation here, but there are a few updates pending, summarised below.


"keywords": [
  "string",
  {
    "@type": "DefinedTerm",
    "inDefinedTermSet": "http://purl.org/dc/terms/DCMIType",
    "termCode": "Image",
    "name": "Image",
    "identifier": "http://purl.org/dc/dcmitype/Image"
  }
]
  • distribution: more discussion needed. I am not really keen on the suggestion of listing API calls to each week of Argo data (or any other way of chunking the data); we absolutely do not want people to think of Argovis as a platform to march through and download the entire dataset, which is exactly what this enumeration will encourage. So similar question to url: do you have examples from other API services, what would you recommend?

Perhaps we should discuss this in a call.

The idea of using the Dataset type is that users can find units of data that the node (your system) wants to highlight.

in this model, I would create an individual dataset record for every chunk of Argo data you'd like others to see as an output of your service. An API call to retrieve that is a valid value for contentUrl .

if you'd prefer not to share dataset-level records, that's fine too: you can just share one Service or WebAPI record for ArgoVis. this will reduce discoverability, but - as you say - may be more appropriate to guide users to the experience you want them to have.

  • spatialCoverage: is there a jsonld for "everywhere between these two latitudes"?

I'd use the box - @jmckenna you're the FOSS4G dude, any better advice? WKT or GeoJSON preferred.
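To make the per-chunk model from the distribution answer above concrete, here is a minimal Python sketch of generating one Dataset record per week of data, each with an API call as its contentUrl. The endpoint `https://example.org/api/argo`, its `startDate`/`endDate` parameters, the record names, and the `application/json` encodingFormat are all assumptions for illustration; none of them come from the thread or from the Argovis API.

```python
import json
from datetime import date, timedelta

# Hypothetical endpoint, for illustration only; not a real Argovis route.
API_BASE = "https://example.org/api/argo"


def weekly_dataset_record(start: date) -> dict:
    """Build one schema.org Dataset record covering a single week of data."""
    end = start + timedelta(days=7)
    return {
        "@context": {"@vocab": "https://schema.org/"},
        "@type": "Dataset",
        "name": f"Argo profiles, {start.isoformat()} to {end.isoformat()}",
        "temporalCoverage": f"{start.isoformat()}/{end.isoformat()}",
        "distribution": {
            "@type": "DataDownload",
            # An API call that returns this chunk serves as the contentUrl.
            "contentUrl": (
                f"{API_BASE}?startDate={start.isoformat()}"
                f"&endDate={end.isoformat()}"
            ),
            # Assumed media type; the thread leaves encodingFormat open.
            "encodingFormat": "application/json",
        },
    }


# Three consecutive weekly records starting on an arbitrary Monday.
records = [
    weekly_dataset_record(date(2020, 1, 6) + timedelta(weeks=i)) for i in range(3)
]
print(json.dumps(records[0], indent=2))
```

If the node prefers not to expose chunk-level records, the alternative described above is a single Service or WebAPI record, in which case no such enumeration is needed.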


jmckenna commented 2 months ago

@bkatiemills @pbuttigieg regarding the spatialCoverage, there are some very important points to be aware of:

"spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "-90 -180 90 180"
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
            "value": "http://www.w3.org/2003/01/geo/wgs84_pos"
        }
},

I'm not sure if I answered your question, but keep that in mind anyway.
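As a rough illustration of the box approach, the following Python sketch builds a spatialCoverage block for "everywhere between these two latitudes" by spanning the full longitude range. It assumes the "miny minx maxy maxx" (latitude first) ordering used elsewhere in this thread; the function names are hypothetical.

```python
def geoshape_box(min_lat, min_lon, max_lat, max_lon):
    """Format a GeoShape box string as 'miny minx maxy maxx'."""
    if not (-90 <= min_lat <= max_lat <= 90):
        raise ValueError("latitudes must satisfy -90 <= min_lat <= max_lat <= 90")
    if not (-180 <= min_lon <= 180 and -180 <= max_lon <= 180):
        raise ValueError("longitudes must lie within [-180, 180]")
    return f"{min_lat:g} {min_lon:g} {max_lat:g} {max_lon:g}"


def latitude_band_coverage(min_lat, max_lat):
    """Build a spatialCoverage block for all longitudes between two latitudes."""
    return {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": geoshape_box(min_lat, -180, max_lat, 180),
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
            "value": "http://www.w3.org/2003/01/geo/wgs84_pos",
        },
    }


global_box = geoshape_box(-90, -180, 90, 180)
southern_band = latitude_band_coverage(-60, -30)
print(global_box)
print(southern_band["geo"]["box"])
```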

pbuttigieg commented 2 months ago

thanks @jmckenna - wasn't there an issue with the WGS84 link pointing explicitly to lat/lon? perhaps we should remove that suffix

bkatiemills commented 2 months ago

Thanks for your feedback here, folks - please see https://gist.github.com/bkatiemills/75efe5e9d6e67d8aa7f5add617e6591c for a schematic of where we're at. Can you read this over and make sure we're not going wildly off the rails here? Also we need input on the distribution block, it's not obvious to me what type and encodingFormat should be now that we're going for a link to an API helper rather than a static block of data.

Once this schematic is looking correct, I can write some scripts to fill in the things that need nightly updating and provide you with a URL to fetch.

jmckenna commented 1 month ago

@bkatiemills today both teams reviewed your Gist together and made some changes, below:

{
    "@context": {
        "@vocab": "https://schema.org/"
    },
    "@type": "Dataset",
    "@id": "https://registry.org/permanentUrlToThisJsonDoc",
    "name": "Argovis' representation of the Argo dataset",
    "description": "Argovis provides a representation of the profiles collected over the lifetime of the Argo program. This representation is intended to present an interpretation of Argo data that is lightly simplified from the original product, but still appropriate for a large majority of scientific and educational use cases. Simplifications include presenting delayed (better corrected and QCed) mode data where available; presenting interpolated biogeochemical data only; and merging core and biogeochemical data collected in parallel into unified oceanic profiles.",
    "url": "https://github.com/argovis/demo_notebooks/blob/main/Intro_to_Argovis.ipynb",
    "license": "MIT", // should be more complete, the full name of the license or the link to it
    "citation": [
        "Tucker, T., D. Giglio, M. Scanderbeg, and S.S.P. Shen: Argovis: A Web Application for Fast Delivery, Visualization, and Analysis of Argo Data. J. Atmos. Oceanic Technol., 37, 401–416, https://doi.org/10.1175/JTECH-D-19-0041.1",
        "Wong, A. P. S., et al. (2020), Argo Data 1999–2019: Two Million Temperature-Salinity Profiles and Subsurface Velocity Observations From a Global Array of Profiling Floats, Frontiers in Marine Science, 7(700), doi: https://doi.org/10.3389/fmars.2020.00700",
        "Argo (2000). Argo float data and metadata from Global Data Assembly Centre (Argo GDAC). SEANOE. https://doi.org/10.17882/42182"
    ],
    "creator": "", // can be an array, with Person or Organisation types
    "version": "<timestamp to be updated on db write>",
    "keywords": [
        "Argo", 
        "ocean profiles", 
        "temperature", 
        "salinity", 
        "pressure", 
        "ocean biogeochemistry"
    ],
    "measurementTechnique": "http://www.argodatamgt.org/Documentation",
    "variableMeasured": [
        {
            "@type": "PropertyValue",
            "name": "<name from data_info[0]>",
            "url": "Perhaps a link to the ADMT docs that explain their variables?",
            "description": "<long name from data_info[2]>",
            "unitCode": "<units from data_info[2]>"
        },
        // ... to be enumerated for all variables
    ],
    "includedInDataCatalog": {
        "@type": "DataCatalog",
        "url": "https://argovis.colorado.edu/citations"
    },
    "temporalCoverage": "<min year>/<max year>", // we can consider using "to present" (is it "now"? to check how to do this) or (more accurate?) just update this to exact ISO timestamp every day
    "distribution": {
        "@type": "DataDownload", 
        "url": "https://argovis.colorado.edu/argourlhelper",
        "description": "Argovis provides no direct download of the dataset described in this record as it is too large to download in one click; however, please visit https://argovis.colorado.edu/argourlhelper to dynamically access your own subset of data"
    },
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            "box": "-90 -180 90 180" // miny minx maxy maxx
        },
        "additionalProperty": {
            "@type": "PropertyValue",
            "propertyID": "http://dbpedia.org/resource/Spatial_reference_system",
            "value": "http://www.w3.org/2003/01/geo/wgs84_pos"
        }
    },
    "provider": [
        {
            "@type": "Organization",
            "legalName": "University of Colorado Boulder",
            "name": "Department of Atmospheric and Ocean Science",
            "url": "https://www.colorado.edu/atoc/"
        }
    ]
}
bkatiemills commented 1 month ago

@jmckenna thanks for your feedback! The gist is updated to reflect it - I have no outstanding questions here, the only remaining blanks are things to be filled in by the nightly update scripts (variables present, temporospatial extents). I'll try and find some time to implement this soon, and provide you with a URL you can scrape and tell me if the finished product is as expected.
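A nightly update of the remaining blanks might look roughly like the Python sketch below. It is not the actual Argovis script. It assumes `data_info` is a three-element list of variable names, column labels, and one row of values per variable, with columns called `units` and `long_name`; the schematic only says that names come from `data_info[0]` and that long names and units come from `data_info[2]`. The sample values are made up.

```python
from datetime import datetime, timezone


def fill_nightly_fields(record, data_info, min_year, max_year, now=None):
    """Return a copy of the record with the nightly-updated fields filled in."""
    names, columns, rows = data_info  # assumed layout, see the note above
    units_i = columns.index("units")
    long_name_i = columns.index("long_name")

    updated = dict(record)
    updated["version"] = (now or datetime.now(timezone.utc)).isoformat()
    updated["temporalCoverage"] = f"{min_year}/{max_year}"
    updated["variableMeasured"] = [
        {
            "@type": "PropertyValue",
            "name": name,
            "description": row[long_name_i],
            "unitCode": row[units_i],
        }
        for name, row in zip(names, rows)
    ]
    return updated


# Made-up sample input, for illustration only.
template = {"@type": "Dataset", "name": "Argovis' representation of the Argo dataset"}
sample_data_info = [
    ["temperature", "salinity"],
    ["units", "long_name"],
    [["degree_Celsius", "Sea temperature"], ["psu", "Practical salinity"]],
]
filled = fill_nightly_fields(
    template,
    sample_data_info,
    min_year=2000,
    max_year=2024,
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(filled["temporalCoverage"], len(filled["variableMeasured"]))
```

Returning a copy rather than mutating the template keeps the static parts of the record reusable across nightly runs.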

bkatiemills commented 1 month ago

Ok team, here's a first production attempt at a blob of jsonld for the argo collection, lmkwyt: https://argovis-api.colorado.edu/summary?id=argo_jsonld&key=jsonld

jmckenna commented 1 month ago

@bkatiemills thanks, looks good. I think for the ODIS front-end search, 2 very useful parameters missing are sdPublisher (party responsible for generating the metadata, in other words similar to your existing provider section), and creditText (how to cite the dataset), see the template. @pbuttigieg thoughts?

bkatiemills commented 1 month ago

Thanks @jmckenna - do you think we can just change provider to sdPublisher? I'm not sure there's a difference or a reason on our end to have both.

How does creditText differ from citation? We could do both, but I think we'd just use the first entry from citation as creditText.

jmckenna commented 1 month ago

I recommend changing the provider text to sdPublisher (makes it easier on our front-end search code)

creditText would be the "Recommended Citation" for your dataset, whereas citation is used when you are referring to using someone else's creative work or dataset.

(in the ODIS front-end search results creditText is displayed literally as "Recommended Citation")

Yes I would just change to using only the creditText, with the first value, as you said.
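The two changes agreed above (provider becomes sdPublisher, and the first citation entry becomes the only creditText) can be sketched as a small Python transformation. The function name is hypothetical, and dropping the citation key entirely is one reading of "using only the creditText".

```python
def apply_odis_feedback(record):
    """Rename provider to sdPublisher and keep the first citation as creditText."""
    updated = {
        key: value
        for key, value in record.items()
        if key not in ("provider", "citation")
    }
    if "provider" in record:
        updated["sdPublisher"] = record["provider"]

    citations = record.get("citation") or []
    if isinstance(citations, str):
        citations = [citations]  # tolerate a single string instead of a list
    if citations:
        updated["creditText"] = citations[0]
    return updated


# Abbreviated example record; the citation strings are placeholders.
before = {
    "@type": "Dataset",
    "citation": ["First recommended citation", "Second citation"],
    "provider": [{"@type": "Organization", "name": "Example organization"}],
}
after = apply_odis_feedback(before)
print(sorted(after))
```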

jmckenna commented 1 month ago

(link to type WebAPI template discussed in today's meeting)

bkatiemills commented 1 month ago

@jmckenna sounds good, those suggestions will be reflected in tonight's update. We've also made the sitemap and cat entry as discussed; please let us know any further steps needed.

bkatiemills commented 3 weeks ago

Hi team - please let us know when our Argo record appears in your datasets list so we can confirm we hit all the requirements correctly; if there's something missing, also please let me know.

bkatiemills commented 1 week ago

Hi folks - we still don't see argovis appearing at https://oceaninfohub.org/results/Dataset?search_text=argovis&page=0 - is something wrong on our end we can address? Am I looking in the wrong place?