Implement id changes for GEOME

dannymandel commented 3 years ago

In looking over the examples for GEOME, I saw the following two differences where we need to pick the right thing to use:

For "sampleidentifier":

"sampleidentifier": "ark:/21547/Car2PIRE_0334",
"sampleidentifier": "http://n2t.net/ark:/21547/R2INDO119289"
"sampleidentifier": "LACM:DISCO:16924"

Do we want the n2t prefix, or the third variant which appears to have no scheme prefix?

dannymandel commented 3 years ago

For "@id", it looks like we have the following variants:

"@id": "metadata/21547/Car2PIRE_0334"
"@id": "metadata:21547-eg2AB4OQ34"
"@id": "metadata:21547/CgZ2PEER_7055"
"@id": "ark:/21547/Cgx2MGH18_1_E4"

Which one should we use?

dannymandel commented 3 years ago

Adding @datadavev to the ticket since he expressed interest in the morning standup.

datadavev commented 3 years ago

The @id value needs to conform to the JSON-LD spec ^1, which identifies it as a relative or absolute IRI.

This relates to #27

This also suggests that some required functionality of iSamples will be to resolve these @id values to the requested representation of the referenced resource.

dannymandel commented 3 years ago

One other thing it would be good to standardize: where we synthesize several GEOME fields into one iSamples field, we should be consistent about how we separate them. Right now, it looks like we use either ; or |.

e.g.

'description': 'samplingProtocol: ARMS | expeditionCode: '
                               'INDO_PIRE | taxonomy team: MINV | taxonomy '
                               'team: 80'

vs.

 'description': 'sampling protocol:Dead Coral Head; '
                               'projectId:78 ; expeditionCode: PEER_2016',

Personally, I feel like | is easier to read.

datadavev commented 3 years ago

Sure, let's go with | (pipe char with white space either side).
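
A minimal sketch of that separator convention, assuming the synthesized fields are available as (label, value) pairs. The helper name `join_fields` is hypothetical, and skipping empty values is an assumption rather than something agreed in this thread:

```python
def join_fields(fields):
    """Join (label, value) pairs into one string, separated by ' | '."""
    # Assumption: pairs with a missing or empty value are left out entirely.
    parts = [
        f"{label}: {value}"
        for label, value in fields
        if value not in (None, "")
    ]
    return " | ".join(parts)


description = join_fields([
    ("samplingProtocol", "ARMS"),
    ("expeditionCode", "INDO_PIRE"),
    ("taxonomy team", "MINV"),
])
print(description)
```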

smrgeoinfo commented 3 years ago

We discussed the philosophy about the @id property on the metadata record in one of the past tech meetings, and I think we agreed that the identifier for the metadata record about a sample should be different than the identifier for the physical sample itself. This follows the pattern that the TDWG MIDS group is following. It would suggest something like one of

"@id": "metadata/21547/Car2PIRE_0334"
"@id": "metadata:21547-eg2AB4OQ34"
"@id": "metadata:21547/CgZ2PEER_7055"

I don't think we're planning on registering ARKs for the metadata records; the idea would be that if you dereference the identifier for the sample, what you get is the metadata record 'about' the sample. Based on some other comments I've seen, using 'metadata:' as a prefix is not a great idea: it suggests that 'metadata' is a URI scheme (following RFC 3986 syntax), so something like metadata/21547/Car2PIRE_0334 seems like the best option.

As far as the sample identifier, my take would be that 'ark:/21547/Car2PIRE_0334' is the best option. 'http://n2t.net/ark:/21547/R2INDO119289' is a concatenation of 'http://n2t.net/' (a URL path to a resolver service) and the 'ark:...' part which is the actual identifier. That is the purist view, and assumes people will know how to resolve 'ark:' URIs....
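
A rough sketch of how the resolver prefix could be separated from the identifier proper. The function name `strip_resolver_prefix` is hypothetical, and accepting an `https://` form of the n2t prefix is an assumption (only the `http://` form appears in the GEOME examples). Identifiers without the prefix pass through unchanged:

```python
# Resolver prefixes to strip. The https variant is an assumption.
N2T_PREFIXES = ("http://n2t.net/", "https://n2t.net/")


def strip_resolver_prefix(identifier: str) -> str:
    """Return the bare identifier, dropping a leading n2t resolver URL if present."""
    for prefix in N2T_PREFIXES:
        if identifier.startswith(prefix):
            return identifier[len(prefix):]
    return identifier


for raw in (
    "ark:/21547/Car2PIRE_0334",
    "http://n2t.net/ark:/21547/R2INDO119289",
    "LACM:DISCO:16924",
):
    print(raw, "->", strip_resolver_prefix(raw))
```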

smrgeoinfo commented 3 years ago

pipes are fine too

datadavev commented 3 years ago

The value of @id must be a relative or absolute IRI that can be used to retrieve the graph to which it is assigned. Only the entry "@id": "metadata/21547/Car2PIRE_0334" of those three examples is a valid relative IRI.
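
To illustrate why the 'metadata:' forms do not read as relative references, here is a small sketch that checks whether a value begins with something shaped like an RFC 3986 scheme (a letter followed by letters, digits, "+", "." or "-", then a colon). The helper `scheme_of` is hypothetical and is not a full IRI validator:

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":"
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def scheme_of(value: str):
    """Return the leading scheme-like token of value, or None if there is none."""
    match = SCHEME_RE.match(value)
    return match.group(0)[:-1] if match else None


for candidate in (
    "metadata/21547/Car2PIRE_0334",
    "metadata:21547-eg2AB4OQ34",
    "metadata:21547/CgZ2PEER_7055",
):
    # A value with no scheme-like prefix is treated as a relative reference.
    print(candidate, "->", scheme_of(candidate))
```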

A JSON-LD processor will prepend the base of the document to create an absolute IRI from that value.

There is a pattern of interaction between the document identifiers and the resource provider (i.e. web server) that must be considered. Examples:

If the record is retrieved from the address:

https://isamples.org/metadata/21547/Car2PIRE_0334

And it has a relative @id value of metadata/21547/Car2PIRE_0334 then the computed absolute IRI will be:

https://isamples.org/metadata/21547/metadata/21547/Car2PIRE_0334

Retrieved from:

https://isamples.org/metadata/21547/Car2PIRE_0334/

The computed absolute IRI will be:

https://isamples.org/metadata/21547/Car2PIRE_0334/metadata/21547/Car2PIRE_0334

With "@id":"" (an empty relative reference) and retrieved from:

https://isamples.org/metadata/21547/Car2PIRE_0334

The computed absolute IRI will be:

https://isamples.org/metadata/21547/Car2PIRE_0334

Hence, depending on the way we want to access this information, the value of @id may well change.
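
The resolution behaviour can be approximated with Python's standard `urllib.parse.urljoin`, which applies RFC 3986 style reference resolution. This is only a sketch of what a JSON-LD processor does with the document base, and the helper name `absolute_id` is hypothetical:

```python
from urllib.parse import urljoin


def absolute_id(document_url: str, id_value: str) -> str:
    """Resolve a relative @id against the URL the document was retrieved from."""
    return urljoin(document_url, id_value)


no_slash = "https://isamples.org/metadata/21547/Car2PIRE_0334"
with_slash = "https://isamples.org/metadata/21547/Car2PIRE_0334/"
relative = "metadata/21547/Car2PIRE_0334"

# The last path segment of the base is replaced by the relative reference.
print(absolute_id(no_slash, relative))
# With a trailing slash, the relative reference is appended to the full path.
print(absolute_id(with_slash, relative))
# An empty reference resolves to the base document itself.
print(absolute_id(no_slash, ""))
# A single dot resolves to the enclosing "directory" of the base.
print(absolute_id(no_slash, "."))
```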

smrgeoinfo commented 3 years ago

what if we use an IRI like 'isam:metadata/21547/Car2PIRE_0334' and map isam to whatever the resolver host is that we decide on using in the production system?

datadavev commented 3 years ago

Oh, right, OK. Like:

{
  "@context":{
    "isam":"https://isamples.org/service/",
    "is": "https://isamples.org/vocab/",
    "name":{
      "@id": "is:name"
    }
  },
  "@id":"isam:metadata/21547/Car2PIRE_0334",
  "name":"Some test record"
}

Which would expand to:

[
  {
    "@id": "https://isamples.org/service/metadata/21547/Car2PIRE_0334",
    "https://isamples.org/vocab/name": [
      {
        "@value": "Some test record"
      }
    ]
  }
]

And lets us update the resolver location by adjusting the context. Nice.
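
For illustration only, the prefix substitution can be mimicked in a few lines of Python. This is a simplified sketch, not a JSON-LD processor: `expand_compact_iri` is a hypothetical helper that handles only plain string prefix definitions, and the second resolver URL below is made up to show the effect of changing the context:

```python
def expand_compact_iri(value: str, context: dict) -> str:
    """Expand 'prefix:suffix' using a string-valued prefix defined in context."""
    prefix, sep, suffix = value.partition(":")
    # Leave absolute IRIs such as 'https://...' and unknown prefixes untouched.
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return value


context = {
    "isam": "https://isamples.org/service/",
    "is": "https://isamples.org/vocab/",
}
record_id = "isam:metadata/21547/Car2PIRE_0334"
print(expand_compact_iri(record_id, context))

# Hypothetical relocated resolver: only the context changes, not the record.
moved = dict(context, isam="https://example.org/resolver/")
print(expand_compact_iri(record_id, moved))
```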

smrgeoinfo commented 3 years ago

Yea, that's what I was thinking. Will it work?

datadavev commented 3 years ago

Indeed it does: https://tinyurl.com/yzuh6qn4

datadavev commented 3 years ago

And here it is with a remote context: https://tinyurl.com/b89twkvb

and a different version of the remote context with the target for isam adjusted: https://tinyurl.com/bzzf62n6

The context docs are in a gist at: https://gist.github.com/datadavev/8c93a9551ac38473e53c8bc1c04b7c60

I like this as a solution since it provides a nice mechanism for adjusting the resolver location without having to touch the records, just update the context.

smrgeoinfo commented 3 years ago

can we go with this solution and close this issue?

dannymandel commented 3 years ago

Thanks for the explanation, gentlemen. I'll move this one over to me for any implementation that's required.

isamplesorg / metadata

Implement id changes for GEOME #38