Open pbuttigieg opened 1 year ago
@justinbuck
test!
Thanks @justinbuck and @Naisunev. We can start listing particulars on which twins you're bringing in, where their endpoints are, and brainstorming what we'd like to exchange.
We can schedule a meeting with @fils and @jmckenna to set up a workflow to link your catalogues via ODIS, as the first step towards exchanging data, containers, code, or other digital assets.
@justinbuck @Naisunev
Shall we start with defining metadata for the core modules of DTs?
These would likely include:
Great! Shall we work out how we can set up a working mode? Also: great to use ISO 23247. I think this will greatly help the breakdown of functionality. If needed, I have some images.
Hmm. Is that the modules themselves or the data from them? I'm in :-) Had to retrieve my password.
Hi, OGC here, with Iliad interoperability support. Afaik 23247 is a good start, but images/schemas could be helpful, at least here, as we're naturally more fluent in the ISO 19xxx suite than 23xxx. We're quite focused on these:
- data ingest and ETL / integration
- observing / sensing data streams
- data export, both of the catalog (meta)data and the data itself.
plus for:
- the "what if" modules (hypothetical twin of a twin): we use Processing Services and Application Packages in Iliad; the stack is already supported by some DIASes and agencies like ESA and DLR.
not sure about:
- model stack and I/O
- virtualisation / visualisation, as this is more of an infrastructure area; we use Docker by default.
Is the plan, @UteBroenner, to propose your pilot as the guinea pig?
@pzaborowski @pbuttigieg hi, I have had some more discussions with Rob regarding the sync API and how we can work this into OGC (https://github.com/mimiro-io/ocean-open-data-sync-protocol/blob/master/specification.md), and I am also working with HUB Ocean to set up some example integrations. How can we align these things best?
Sure, we can do that! Happy to be the guinea pig.
Ute
@gra-moore I'm not leading this, but would suggest, if the scenario here includes the synchronization case, referring to your work. Then, if needed, focus on testing, profiling, and season integration API, and eventually generalise changes in the original repo if we standardise it alone. WDYT?
Thanks all for the input - note the iFDO for DTs issue linked above too.
The path ahead is to see how to splice the specifications noted above into JSON-LD / schema.org exchange packets and get some twins talking to each other and generically visible via ODIS.
@UteBroenner the patterns can cover (meta)data, which includes descriptions of software modules.
@pbuttigieg @Naisunev @pzaborowski @gra-moore @justinbuck I would like to revive this towards something demonstrable for the DITTO Summit in November. In order to contribute I would need something concrete to work on. Should we arrange a short meeting or discuss asynchronously here?
@UteBroenner if we could have a quick meeting to sync up that would be great.
@gra-moore we are meeting tomorrow at 11, alternatively at 9 for half an hour. Would you send me your email so I can invite you to the meeting?
@UteBroenner I have sent you an email with my contact details. I am available at 11.
Great meeting today: Document for dumping current ideas & developments at https://docs.google.com/document/d/1fKa3A5g82Y6OICBg0lHtAyeyJI-GcWdBV8BVdCd1bSY/edit#
Note that we have a dedicated repo under DITTO now: https://github.com/DITTO-OceanDecade/turtle
thanks @UteBroenner I am now following that repo
Following yesterday's meeting, we'll be drafting a meta-pattern that should allow federated digital twins to leverage ODIS (and other ODIS-like systems, using schema.org and JSON-LD) to query and explore each other's asset catalogues.
This work also applies to infrastructures preparing for digital twin interoperability, such as EDITO-Infra.
Digital twins are - by definition - concerned with tracking a real-world entity, meshing sensed/observed data streams with modelling outputs to create a dense enough representation to power virtualisation engines that feed user experiences.
When a "what-if" scenario is triggered, this initiates another event cascade that runs counter-current to that described above. As we're not tracking the real-world entity anymore, what is actually happening is the instantiation of a digital twin for a hypothetical/not-really-real entity, determined by user settings/inputs. This user input event triggers the spoofing of (some of) the data from the observation/sensing and/or modelling stack.
As such, the ODIS elements we can use to help the federated twins describe their cascades are:
- schema:Event - alongside the sub/super-event properties, this should be used to describe the sensing events, as well as any computational / analysis / modelling events that result from them.
- schema:Action - this, and the potentialAction type, should be used closely with Event, to describe the finer-grained actions that happen in a twin's digital ecosystem (e.g. ingest, model runs, visualisation, ...).
- schema:Dataset - naturally, datasets will be transmitted between each Event and Action: these are not just data from sensors, they can also include things like parameters, settings, instructions, etc. Software code is a special case, and has its own type. The schema:DataCatalog type may be useful here too.
- schema:SoftwareApplication - this type will be key to describing the software modules that each twin has. Each application (containerised or otherwise) should be described with this type. This way, digital twins can see what other twins have. An Event or Action can trigger a software application as an agent, and this can generate a Dataset or SoftwareSourceCode as an output, which is the input for the next Event/Action, etc.
- schema:Service - this, and its WebService sub-type, can be used to describe processes that are running (based on the execution of a SoftwareApplication). This type should be used to catalog what services a given twin is running or capable of running (metadata in the type can articulate that).
- schema:HowTo - we haven't really used this type much in ODIS, but this may be what we need to provide the skeleton for workflow descriptions, which can be instantiated by the execution of an Action or Event.

There are certainly more, but this collection will get us quite far in creating graphs that describe how any given twin handles the cascades described above. These graphs - linking the elements above - are also likely to map quite closely to the CWL and other workflow representations in use, providing a generic interoperability layer which can leverage ODIS to interface with the holdings of the broader federation.
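To make the intended linking concrete, here's a minimal, illustrative JSON-LD sketch of such a cascade. All @id values, names, and versions are placeholders (not real ODIS records); the pattern of software as an agent of a potentialAction follows the description above.

```json
{
  "@context": "https://schema.org/",
  "@graph": [
    {
      "@id": "https://example.org/twin-a/event/campaign-2024",
      "@type": "Event",
      "name": "Observation campaign feeding Twin A (illustrative)",
      "subEvent": { "@id": "https://example.org/twin-a/event/model-run-42" }
    },
    {
      "@id": "https://example.org/twin-a/event/model-run-42",
      "@type": "Event",
      "name": "Circulation model run triggered by new observations",
      "about": { "@id": "https://example.org/twin-a/dataset/ctd-2024" },
      "potentialAction": {
        "@type": "Action",
        "name": "Execute the circulation model",
        "agent": { "@id": "https://example.org/twin-a/software/circulation-model" }
      }
    },
    {
      "@id": "https://example.org/twin-a/dataset/ctd-2024",
      "@type": "Dataset",
      "name": "CTD profiles ingested by Twin A"
    },
    {
      "@id": "https://example.org/twin-a/software/circulation-model",
      "@type": "SoftwareApplication",
      "name": "circulation-model",
      "softwareVersion": "1.2.0"
    }
  ]
}
```

A partner twin crawling such a graph via ODIS could then discover both the data products and the software capable of (re)generating them.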
The ODIS team will prep these types in our documentation (some are already there), but TURTLE members can already start experimenting.
Our next meeting is at the end of February, where we can review the specifications and approach.
@justinbuck - this is a general summary of the meeting. I think BODC's ODIS setup could be the place to start with examples, as you already created a flow from Events to Datasets.
It's fine to use a few concepts from schema.org as high level classifiers, but architecturally you will need to use relevant specific domain models for interoperable content. It seems short-sighted not to plan for this now, and treat schema.org as simply one of many such classification layers that sit over machine actionable descriptions.
Which domain resources do you have in mind? The high-level stuff gets us quite close to component exchange.
It seems short-sighted not to plan for this now, and treat schema.org as simply one of many such classification layers that sit over machine actionable descriptions.
We have planned for this. As discussed in our TURTLE meeting (I believe you were there), nesting other descriptions with greater expressivity and domain-relevance within schema.org is easily done. The generic schema.org shell supports broad discovery, while nested content can be created as each twin wishes, based on their target audiences and internal needs. See: https://doi.org/10.5281/zenodo.7682399
Following on from https://github.com/iodepo/odis-arch/issues/162#issuecomment-1923674608
All reference patterns for JSON-LD/schema.org noted above are here: https://github.com/iodepo/odis-in/tree/master/dataGraphs/thematics
We'll build documentation around those in the ODIS Book after some more testing.
Direct links:
You'll notice that the value space of many properties is described like:
"audience": {"@type": "https://schema.org/Audience"},
That just means that one should refer to the noted schema.org type specification(s) for guidance on how to build that stanza. For example
"audience": {
"@type": "Audience",
"audienceType": "Technical experts associated with digital twin interoperability efforts",
"geographicArea": {
"@type": "AdministrativeArea",
"name": "global"
},
"description": "Members of the 'Interoperability Architecture for a Digital Ocean' (TURTLE) project. TURTLE's goal is to coordinate ongoing international Digital Twins of the Ocean projects and work towards an interoperability architecture. As initiatives around the globe begin to enhance ocean-oriented digital capacity, there are unprecedented opportunities to power digital twinning."
}
One can use more properties, or get more detailed using Place types with geospatial coordinates, of course.
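For instance, a sketch of a more detailed geographicArea with placeholder coordinates and name (not taken from any real record) could look like:

```json
"geographicArea": {
  "@type": "Place",
  "name": "Northeast Atlantic (illustrative)",
  "geo": {
    "@type": "GeoShape",
    "box": "46.0 -20.0 62.0 5.0"
  }
}
```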
Examples of most of the types above are available from the ODIS Federation partners (with varying levels of completeness), and discoverable through https://oceaninfohub.org/. For example, this dataset JSON record from naturalscience.be or these time series EventSeries from the METS RCN project.
Hello @pbuttigieg, sorry for my late input! As promised, below are the links to the EDITO catalogs.
Links: Data catalog (STAC), GUI is here. The catalog is variable-oriented. We use STAC collections for variables, and STAC items for any piece of homogeneous data over a given variable. STAC assets point to either actual data files or API endpoints to retrieve the data. STAC catalogs are used to present alternative views for users to discover and browse the data. Currently, we use the Resto engine, but we can implement the STAC browser mapping to schema.org, for example.
EDITO has two kinds of services: the permanent services that run the platform (a catalog engine, a viewer, a tutorial platform, etc.) and services that users can launch in their own name to build DTOs, explore data with their tools, etc.
We could expose the former as Service or WebService as you proposed, and the catalog for the latter as SoftwareApplication.
Here is the link to all the services a user can launch in their name on the platform: Service catalogs, GUI is here.
Please note there are some private catalogs as well (the link above only shows the public ones).
However, once a user launches service instances for their own use, the API to get the info is authenticated. I am not sure there is a point in having JSON-LD in private endpoints anyway, right?
The EDITO Processes API is not publicly available yet.
There is a catalog of all available processes and an API for running processes. Like the services, we could implement SoftwareApplication for the catalog, but running processes are authenticated.
However, processes will have two kinds of API endpoints: one similar to the services above, and one OGC API Processes.
In any case, I am sure we can implement something.
EDITO hosts an open and collaborative tutorial platform. Does it fit the HowTo scheme?
We aim to implement a CWL API linking metadata and processes, but nothing to show yet.
Questions on our side: it is not clear to us how you want to distinguish the uses of Event and Action. Given your Action example, it seems that it can contain our running processes.
Related question, is there any point in referencing ephemeral info? (In our case, user service and process runs)
Looking forward to continuing this work!
Re: https://github.com/iodepo/odis-arch/issues/162#issuecomment-1971305512
Thanks @qgau
Questions on our side: it is not clear to us how you want to distinguish the uses of Event and Action. Given your Action example, it seems that it can contain our running processes.
It is a bit of a(n intentionally) fuzzy distinction, but Events tend to be larger scale things (like concerts and festivals in the schema examples), while Actions are smaller-scale "things that were done".
In our world, an Event would be like a research expedition, or a typhoon. Corresponding Actions would be like the deployment of an Argo float, a sensor sensing something, or the building of sandbag walls. In the twin infrastructures, Events would be something like the spinning up or down of a twin, Actions more like triggering some analytical or modelling suite/module.
It's up to the application case to decide what the sensible limits for small vs large scale processes would be, I think, but trying to stay mesoscopic and human-level commonsensical is likely to yield the best results for generic search and discovery.
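As a rough sketch of that human-level scale (all names, times, and places below are placeholders), such an Action might look like:

```json
{
  "@context": "https://schema.org/",
  "@type": "Action",
  "name": "Deployment of an Argo float (illustrative)",
  "agent": { "@type": "Organization", "name": "Example float operator" },
  "startTime": "2024-05-01T08:00:00Z",
  "location": { "@type": "Place", "name": "North Atlantic" },
  "result": { "@type": "Dataset", "name": "Argo profile data stream" }
}
```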
Related question, is there any point in referencing ephemeral info? (In our case, user service and process runs)
I think that's for the TURTLE group to discuss. Right now, I think we first have to get the twins talking to each other about the big chunks of things they have (software, data, etc) and the major events and actions they (can) perform (i.e. their capabilities). Once we have a few twins talking about such things, we can get more fine-grained. As @rob-metalinkage notes here, that may be the stage to hand over to more specialist semantics and serialisations.
Comments on https://github.com/iodepo/odis-arch/issues/162#issuecomment-1971305512
Metadata
Links: Data catalog (STAC), GUI is here.
Fixing link for STAC catalogue: https://catalog.digitaltwinocean.edito.eu/
The catalog is variable-oriented. We use STAC collections for variables, and STAC items for any piece of homogeneous data over a given variable. STAC assets points to either actual data files or API endpoints to retrieve the data.
I see - so this would likely mean you'll make a lot of good use of the variableMeasured property in the Dataset type. Perhaps your system may generate Dataset JSON-LD on the fly based on what variables have been selected, in addition to any a priori dataset instances.
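As a sketch of what such an on-the-fly record could contain (names, variable labels, and URLs are placeholders, not real EDITO endpoints):

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "Sea surface temperature subset (illustrative)",
  "variableMeasured": {
    "@type": "PropertyValue",
    "name": "sea_surface_temperature",
    "unitText": "degree_Celsius"
  },
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://example.org/edito/stac/items/sst-2024-06.nc",
    "encodingFormat": "application/x-netcdf"
  }
}
```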
Currently, we use the Resto engine, but we can implement the STAC browser mapping to schema.org (https://github.com/radiantearth/stac-spec/issues/378), for example.
Seems like good options - the main objective is to make sure the JSON-LD/schema.org that comes out of such tools is correct and in good shape.
Services
EDITO has two kinds of services: the permanent services that run the platform (a catalog engine, a viewer, a tutorial platform, etc.) and services that the users can launch in their name to build DTO, explore data with their tools, etc. We could expose the former as Service or WebService as you proposed, and the catalog for the latter as SoftwareApplication.
The distinction between the Service and SoftwareApplication types is really about how these things are offered. Services are pretty generic, so I would imagine that you'd have a SoftwareApplication file for every component (permanent or user deployed) in the EDITO space. Some of these would be linked to Services (i.e. the software would be noted as an agent in the [potentialAction](https://schema.org/potentialAction) stanzas in a Service record) if the software can be triggered in a service.
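A minimal sketch of that linking (the service and software names are placeholders, not actual EDITO components):

```json
{
  "@context": "https://schema.org/",
  "@type": "Service",
  "name": "Data viewer service (illustrative)",
  "serviceType": "Visualisation",
  "potentialAction": {
    "@type": "Action",
    "name": "Render a map view of a selected dataset",
    "agent": {
      "@type": "SoftwareApplication",
      "name": "example-viewer",
      "applicationCategory": "Visualisation"
    }
  }
}
```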
Here is the link to all the services a user can launch in their name on the platform: Service catalogs, GUI is here. Please note there are some private catalogs as well (the link above only shows the public ones). However, once a user launches service instances for their use, the API to get the info is authenticated. I am not sure there is a point in having JSON-LD in private endpoints anyway, right?
It's up to you - some of the ODIS partners like to advertise that they have software or services that are available on request, after negotiation, or through payment / agreements. In the spirit of Open Science that the Commission keeps referencing, I would think that it would be wise to have metadata records on any publicly funded activity that doesn't have security or sensitive ethical concerns. This is the move to a "Transparent and Accessible Ocean" - especially for public oversight of publicly funded activities.
Processes
The EDITO Processes API is not publicly available yet. There is a catalog of all available processes and an API for running processes. Like the services, we could implement SoftwareApplication for the catalog, but running processes are authenticated.
A running process would be better modelled through an Action or Event type - those are processual entities that unfold through time. SoftwareApplication records can be used to describe the software used during these processes (e.g. as agents of Actions).
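For example, a single process run could be summarised as an Action along these lines (names, times, and versions are placeholders):

```json
{
  "@context": "https://schema.org/",
  "@type": "Action",
  "name": "Wave forecast process run (illustrative)",
  "startTime": "2024-06-01T00:00:00Z",
  "endTime": "2024-06-01T02:30:00Z",
  "agent": {
    "@type": "SoftwareApplication",
    "name": "example-wave-forecast",
    "softwareVersion": "2.1"
  },
  "result": {
    "@type": "Dataset",
    "name": "Wave forecast output 2024-06-01"
  }
}
```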
However, processes will have two kinds of API endpoints: one similar to the services above, and one OGC API Processes. In any case, I am sure we can implement something.
Yes, we'll just need to see examples and place them in the right slot.
Tutorials
EDITO hosts an open and collaborative tutorial platform. Does it fit the HowTo scheme?
Yes, one can use the HowTo types well there. If these are multimedia or document based tutorials, one can also use other types like DigitalDocument, VideoObject, etc.
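A tutorial could then be sketched as a HowTo with ordered steps, for instance (step titles are placeholders):

```json
{
  "@context": "https://schema.org/",
  "@type": "HowTo",
  "name": "Launch a processing service on the platform (illustrative)",
  "step": [
    { "@type": "HowToStep", "position": 1, "name": "Select a dataset from the catalogue" },
    { "@type": "HowToStep", "position": 2, "name": "Configure and launch the processing service" },
    { "@type": "HowToStep", "position": 3, "name": "Inspect the output in the viewer" }
  ]
}
```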
Workflow
We aim to implement a CWL API linking metadata and processes, but nothing to show yet.
I have the feeling that workflows will be captured en passant if there's good linking of Actions, agents, and Datasets or other things. We'll explore more with some examples later.
Some outputs from today's meeting:
We discussed the types noted above and challenged them against implementations in EDITO and ILIAD, to figure out where their local components would fit into the generic framework that ODIS would pass on to other twins. The mind map below captures some of the flow:
Some specific points discussed:
- The isBasedOn property can be used to indicate that there's a derivation happening and something may not be complete, for example metadata about a cleaned/QCed dataset based on a raw dataset that has errors, omissions, etc. (see the sketch after this list)
- temporalCoverage for subsetting values in, e.g., a data cube: these are quite advanced / finicky issues, and probably better served by bespoke API calls or PIDs to subsetted data in the distribution property, to avoid occult metadata.

Next steps:
- Get some examples from EDITO (@qgau) and ILIAD (@marcoamarooliveira e.g. here).
- Move examples over to https://github.com/DITTO-OceanDecade/turtle once initial QC is done. Link to those from ODIS.
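As a sketch of the isBasedOn pattern mentioned above (all names, identifiers, and dates are placeholders):

```json
{
  "@context": "https://schema.org/",
  "@type": "Dataset",
  "name": "QC'd temperature profiles (illustrative)",
  "temporalCoverage": "2023-01-01/2023-12-31",
  "isBasedOn": {
    "@type": "Dataset",
    "@id": "https://example.org/twin-a/dataset/raw-temperature-profiles",
    "name": "Raw temperature profiles (may contain errors and omissions)"
  }
}
```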
nesting other descriptions with greater expressivity and domain-relevance within schema.org is easily done
@pbuttigieg can you provide an example of this? We would like to correctly nest the description of the inputs required and the provided outputs for the schema:SoftwareApplication.
@marcoamarooliveira
Section 3.3.3 of this document describes the approach: https://zenodo.org/records/10219933
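Not the documented ODIS pattern itself, but as a sketch of the general JSON-LD mechanism that document describes: a second vocabulary can be declared in the @context and its terms nested inside the schema.org shell. The ex: namespace, type, and property names below are placeholders only, standing in for whichever domain model a twin chooses:

```json
{
  "@context": {
    "@vocab": "https://schema.org/",
    "ex": "https://example.org/dt-vocab/"
  },
  "@type": "SoftwareApplication",
  "name": "wave-forecast-module (illustrative)",
  "description": "Schema.org shell for broad discovery; domain-specific I/O detail nested below.",
  "ex:inputVariable": {
    "@type": "ex:VariableSpecification",
    "ex:standardName": "sea_surface_wave_significant_height",
    "ex:unit": "m"
  },
  "ex:outputVariable": {
    "@type": "ex:VariableSpecification",
    "ex:standardName": "sea_surface_wave_significant_height",
    "ex:unit": "m"
  }
}
```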
In the framework of TURTLE - a project under the UN Ocean Decade DITTO Programme - we'll use this issue to scope out and execute a "hello world" interoperability exercise between at least two (more welcome) digital twins.
We'll attempt to shape / modify some generic ODIS Arch patterns for compute resources, model stacks, etc, and also some specific data exchanges around biodiversity and bathymetry (@pieterprovoost will almost certainly involve OBIS data)
The first objective is to list which twins we'll be testing with and what (meta)data we'll be exchanging ...