hvdsomp / signposting

collection- vs. item-level #21

Closed martinklein0815 closed 4 months ago

martinklein0815 commented 7 months ago

I am struggling a bit with this distinction and I wonder whether it is necessary to make it. The need for unique and accurate anchors is clear but I am not sure the spec needs to make that distinction. PMH and API, for example, are considered collection-level but can be used to obtain data about individual items, just like Signposting, an item-level affordance. Maybe it would be better to merely include language that describes example cases and provides examples of "good" or "accurate" anchors. That feels more flexible and accommodating for further affordances.

martinklein0815 commented 7 months ago

I am also not sure whether the term collection is good in this context. Repo managers and librarians may have varying definitions.

martinklein0815 commented 7 months ago

If we stick with item-level affordances, the text says: "Affordances that are available when interacting with a single item of the repository collection and that are available for each item of the collection."

Is the availability for each item a MUST/SHALL/SHOULD? Do we have such guarantees for collection-level affordances, i.e., that they cover the entire collection?

martinklein0815 commented 7 months ago

Discovery of a catalog that contains item-level affordances only is done at the baseURL level only, not at the item level, which I think is another argument for not stressing that distinction.

hvdsomp commented 7 months ago

General response: As also discussed in private conversations, the distinction between collection-level and item-level affordances is indeed not overly clear. We are in agreement. But, since IMO it would be really helpful to also be able to advertise the oddly-called item-level affordances in an API Catalog, the question becomes how to deal with them if not by means of the collection/item-level distinction. So, hereby an invitation to all to suggest alternative approaches.

Specific responses:

I am at this point not going to make changes to the draft with regard to this issue. Let's hope there will be suggestions for a more convincing approach to handle this. Because, again, I think it would be helpful if these kinds of affordances could also be listed in an API Catalog.

huberrob commented 7 months ago

Thanks for starting this interesting discussion. I just wrote an email to @hvdsomp which is probably closely related, so I am copying the relevant parts here:

sorry for the unclear comment (about what 'dataset' means), with dataset I probably meant the 'collection' sensu signposting: a set of digital objects or 'items' (code, data) plus metadata mostly represented by a landing page. But I always get confused by the term 'collection' and mix it up with things like bibo:Collection

Thanks again for the definition of baseURL for api-catalog, which probably can be regarded as the baseURL of a dc:publisher in 90% of all cases. Sometimes it is a bit more complicated e.g. when more than one technologically independent (I think this is important) bibo:Collections are offered as baseurl.org/archive1 and baseurl.org/database2 or when a bibo:Collection is maintained by more than one dc:publisher.

I am often thinking from the assessment perspective (e.g. 'how do I verify that OAI is actually offered'), which leads to questions like 'where do I look for an api-catalog'...

huberrob commented 7 months ago

Regarding the term 'collection': it is indeed confusing. As an alternative I would say something like 'resource', but this term is already used in a different context for signposting. In BIBO, I think bibo:Document fits best as it includes a lot of other BIBO subclasses, which unfortunately do not include 'dataset'. On the other hand, VIVO, which is based on BIBO, includes vivo:Dataset etc. as a subclass.

hvdsomp commented 7 months ago

So, we basically have two notions, (repository) baseURL and (repository) collection, that essentially refer to the same thing, i.e. a set of "digital objects" that are jointly managed. With a digital object being the thing that is in Signposting represented by a landing page. But both the baseURL and collection terms seem to be unclear. I wonder how, language-wise, we can get out of this conundrum. Because I do think we know what is intended. One could write something explicit along the lines that baseURL is not necessarily https://example.org but could be https://example.org/archive1 and https://example.org/database2. That baseURL is the entry point to a set of objects/items/??? that are jointly managed.

hvdsomp commented 7 months ago

Maybe the term is just repository. A repository manages items. A repository is accessible at a URL named the baseURL. A baseURL could be https://example.org/ , https://example.org/archive1 , https://example.org/database2.

hvdsomp commented 7 months ago

I removed the confusing collection term and only used repository. Kept the repository/item-level distinction (until maybe another solution is found) but reformulated item-level (would object-level be better?). Clarified baseURL. Changes most apparent in https://signposting.org/API-Catalog/#intro and https://signposting.org/API-Catalog/#discovery.

huberrob commented 7 months ago

Sounds good! Then if we have https://example.org/archive1 and https://example.org/database2, an api-catalog link relation type or well-known/api-catalog MUST be available via both baseURLs.
To make it crystal clear where to find the api-catalog, it would also be good to have a 'repository' link relation type which can be placed at each landing page ('collection'). This would allow me to consistently use signposting to find my way back to a repository (and from there to further services) even if my journey started at a landing page which I found via a search engine.

hvdsomp commented 7 months ago

Also reshuffled the info in the Introduction.

hvdsomp commented 7 months ago

> Sounds good! Then if we have https://example.org/archive1 and https://example.org/database2, an api-catalog link relation type or well-known/api-catalog MUST be available via both baseURLs. To make it crystal clear where to find the api-catalog, it would also be good to have a 'repository' link relation type which can be placed at each landing page ('collection'). This would allow to consistently use signposting to find my way back to a repository (and from there to further services) even if my journey started at a landing page which I found via a search engine.

I started a new issue for this.

hvdsomp commented 7 months ago

and changed item to object to avoid confusion with Signposting item and PMH item.

hvdsomp commented 7 months ago

and changed object to scholarly object, with a link to the explanation of that term as provided in the FAIR Signposting spec.

huberrob commented 7 months ago

@hvdsomp I fear the PANGAEA example might confuse people a bit. Same with the MEMENTO example. Both point to an individual dataset, which is a bit confusing as there are similar examples at the 'main' signposting page. I would propose to (at least initially) focus the examples on commonly used services.

{
  "linkset": [
    {
      "anchor": "https://doi.pangaea.de/10.1594/PANGAEA.867908",
      "service-doc": [
        {
          "href": "https://signposting.org/FAIR/",
          "type": "text/html",
          "title": "FAIR Signposting Profile - Signposting the Scholarly Web"
        }
      ]
    }
  ]
}
hvdsomp commented 7 months ago

I would like to invite you to re-read the document and to re-assess whether the distinction between repository-level and object-level affordances remains unclear. A lot of the wording in that regard has changed. Please give it a try.

huberrob commented 7 months ago

I like everything very much! But I have the feeling that the repository level is much easier to understand in the 'api-catalog' context, and therefore I would propose to change the sequence of examples so they begin with the repository level.

Further, I am not sure if object-level affordances should be advertised for each individual object like this anyway. If this were done at the baseURL, PANGAEA would need to list all its > 400k datasets there? So maybe object-level affordances could be indicated in a more generic way, e.g. by a trailing slash or so (https://doi.pangaea.de/ instead of https://doi.pangaea.de/10.1594/PANGAEA.867908)?

hvdsomp commented 7 months ago

Point taken regarding possible restructuring of the document in terms of repository/object level. Will think about that.

No idea why you think PANGAEA would need to list 400k objects though. The doc literally and repeatedly says “a sample object”, meaning an example of 1 object that illustrates the affordance. Language unclear maybe?

hvdsomp commented 7 months ago

@martinklein0815 @huberrob @phonedude Over the past week, I made significant enhancements to clarify the repository/object-level affordance distinction. I would much appreciate if you could check to see whether I have effectively achieved that goal. Most important changes are in:

@huberrob I also added explicit mention and example of api-catalog links at the landing page.

abollini commented 7 months ago

The current wording seems much clearer to me than the previous version. Collection and item should really be avoided as much as possible, as they have a very specific meaning in the DSpace community, for instance.

That said, I still have doubts about the meaning of this phrase in the object-level affordance:

> Only object-level affordances that are consistently available for each object managed by the repository must be considered.

What exactly does consistently mean? Let me ask with a concrete example: I have a DSpace repository hosting 100K scholarly objects. Some of them consist of digitized manuscripts and datasets of annotated HQ images, others are tabular/statistics data, and many others are more traditional scholarly outputs. For the items with images (let's say < 1k) we implement the IIIF API family; for the tabular data we offer some custom API to perform queries over the data without the need to download the whole package.

Is IIIF implemented consistently over such a repository? Is the tabular data analytics API implemented consistently over the repository? If the APIs are focused on the content in the scholarly object, are they "object-level affordances"?

Should we think about an object-specific api-catalog that would be exposed only as a link from the individual object, and that would include the APIs specifically implemented for this object plus a reference to the general repository api-catalog, where we would have the repository-level APIs and the object-level APIs available for all the repository's objects?

Or should we enrich the api-catalog so that we can include APIs that are available for some objects but not for others, flagging them in some way (optional / maybe)? We could eventually use the web service status: https://www.rfc-editor.org/rfc/rfc8631#section-5

hvdsomp commented 7 months ago

The question re what to do if only a subset of resources support an interoperability affordance has come up before. Good idea re status as its semantics are rather loosely defined. I will think about it. Whichever way, since it’s a link type there will need to be a target URL and hence some document. Maybe that doc could say “IIIF only for such and such resources”
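One possible shape for such an entry is sketched below in Python. This is only an illustration of the idea floated here, not something the draft defines: the `my.repo.org` URLs, the coverage document, and the helper `has_status_note` are hypothetical, and `https://iiif.io/api/` is assumed as the documentation target. The only element taken from a standard is the `status` link relation type from RFC 8631.

```python
import json

# The my.repo.org URLs are hypothetical placeholders.
# "status" is the link relation type defined in RFC 8631; its target here
# is an assumed human-readable page explaining which objects support IIIF.
partial_affordance_entry = {
    "anchor": "https://my.repo.org/iiif/item/008375/manifest.json",
    "service-doc": [{"href": "https://iiif.io/api/", "type": "text/html"}],
    "status": [
        {
            "href": "https://my.repo.org/docs/iiif-coverage.html",
            "type": "text/html",
            "title": "IIIF is available only for objects that contain images",
        }
    ],
}


def has_status_note(context_object: dict) -> bool:
    """Tell whether an entry carries a status link a client should consult."""
    return bool(context_object.get("status"))


print(json.dumps(partial_affordance_entry, indent=2))
print(has_status_note(partial_affordance_entry))
```

A client that finds a `status` link on an entry could fetch its target before assuming the affordance applies to every object.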

huberrob commented 7 months ago

@hvdsomp I have seen you changed 'sample object' to 'example object' but I still find the use of an example object URI to indicate object level affordances a bit confusing.

If, for example, object level affordances would generally be enabled by a repository for all objects located at www.myrepository.org/archive1/set1/ but the example object in signposting would be given as www.myrepository.org/archive1/set1/subset2/123 would this now mean that such object level affordances are given at level www.myrepository.org/archive1/set1/subset2/ only and not at www.myrepository.org/archive1/set1/subset1/ ? Maybe this can be better indicated by using e.g. Xpath?

In most cases object level affordances would be enabled by a specific service at repository level or set level anyway? At least the signposting and ro-crate examples would work like that?

hvdsomp commented 7 months ago

The approach you propose assumes that there exists a set, identified by a URI, that contains all objects that exhibit a certain object-level affordance. We cannot assume that. The IIIF affordance mentioned by @abollini serves as a good example. Also, I really don’t understand the problem with the “example” approach. The goal of the API Catalog is to list which interoperability affordances are implemented by a repo. In case of repo-level, one points at the URL of the affordance as a way to support its discovery. Because one can’t uniformly discover it otherwise. The object-level affordances are different. They don’t need to be discovered. The goal of listing them is just to inform they are implemented. One discovers objects that exhibit the affordance by happening upon them. But, having seen a list of affordances in the API Catalog, a bot can be ready to use such affordance when an object exhibits it.

huberrob commented 7 months ago

Yes the IIIF example is really good. Maybe I am just confused because of the RO-CRATE example, which I think is not so good since RO-CRATE is a format specification.

Regarding the example link, when you say:

> The object-level affordances are different. They don’t need to be discovered.

Do we then need the anchor there anyway?

hvdsomp commented 7 months ago

> Yes the IIIF example is really good. Maybe I am just confused because of the RO-CRATE example, which I think is not so good since RO-CRATE is a format specification.

Indeed. And IMO object-level affordances include how objects and metadata about objects can be represented. So, if a data repository supports DCAT descriptions of its datasets, that could be listed in an API Catalog.

> Regarding the example link, when you say:
>
> > The object-level affordances are different. They don’t need to be discovered.
>
> Do we then need the anchor there anyway?

Without an anchor there is no link. And since the API Catalog is a link set, we need a link. Also, the example URL allows a developer to check out the landing page where the affordance is available and can observe its actual implementation. E.g. how exactly is Signposting implemented in this repository?

huberrob commented 7 months ago

> Indeed. And IMO object-level affordances include how objects and metadata about objects can be represented. So, if a data repository supports DCAT descriptions of its datasets, that could be listed in an API Catalog.

Here I disagree, this interpretation of API imo is too broad and will probably lead to confusion (I am already confused ;) ). Further, listing metadata standard(s) alone like:

{
  "anchor": "https://my.repo.org/item/008375/",
  "service-doc": [
    {
      "href": "https://w3id.org/ro/crate/1.1",
      "type": "text/html",
      "title": "RO-Crate 1.1 | Research Object Crate (RO-Crate)"
    }
  ]
}

would not be enough to actually retrieve metadata in this format. Wouldn't we then also have to indicate a mechanism like signposting, content negotiation or OAI? I think that to indicate e.g. DCAT support at object level, the standard signposting approach using describedby is already a great way to do this and should be favoured.

> Without an anchor there is no link. And since the API Catalog is a link set, we need a link. Also, the example URL allows a developer to check out the landing page where the affordance is available and can observe its actual implementation. E.g. how exactly is Signposting implemented in this repository?

Maybe I understood this wrong, but in RFC 9264 you stated:

> Each link context object MAY contain an "anchor" member with a value that represents the link context.

The different usage/scope of anchor is problematic as long as we cannot indicate the affordance type (object, repository). Could we, for example, indicate this using status?

hvdsomp commented 7 months ago

> > Indeed. And IMO object-level affordances include how objects and metadata about objects can be represented. So, if a data repository supports DCAT descriptions of its datasets, that could be listed in an API Catalog.
>
> Here I disagree, this interpretation of API imo is too broad and will probably lead to confusion (I am already confused ;) ). Further, listing metadata standard(s) alone like:
>
> {
>   "anchor": "https://my.repo.org/item/008375/",
>   "service-doc": [
>     {
>       "href": "https://w3id.org/ro/crate/1.1",
>       "type": "text/html",
>       "title": "RO-Crate 1.1 | Research Object Crate (RO-Crate)"
>     }
>   ]
> }
>
> would not be enough to actually retrieve metadata in this format. wouldn't we then also have to indicate a mechanism like signposting, content negotiation or OAI? I think to indicate e.g. DCAT support at object level the standard signposting approach using describedby is already a great way to do this and should be favoured.

Look, what I am (we are?) trying to do is to leverage (the word choice is intentional) a specification (api-catalog I-D) to achieve something that currently cannot be achieved with scholcomm repos: Obtaining an overview of all interoperability components a repository implements, in a single go. I think that is a really worthwhile effort. If we stick to the core of what the api-catalog I-D is about - supporting discovery of actual APIs - then many repositories would not need to publish an API Catalog because many don't have actual APIs. By which I mean that, by including support for standards such as SPARQL, OAI-PMH, etc., our API Catalog spec already "leverages" the api-catalog I-D. Which is really fine because, like many IETF specs, the I-D is written to allow for broad interpretation and flexible implementation. And, honestly, if our API Catalog would only be about listing actual APIs it wouldn't be worth our time and we wouldn't need to write a spec because then the api-catalog I-D can be used "as is".

So, following the logic of "leveraging the I-D", I thought it would be really nice to also be able to include support for things like Signposting. Which, it seems to me you are OK with, but maybe I'm misinterpreting and maybe you oppose all object-level affordances. Because, I personally don't see why Signposting would be OK and RO-Crate not.

Note, again, that the entries for object-level affordances are not about discovering them because, as you say, many times they are discovered when interacting with an object. Rather these entries are about announcing them so a developer can be prepared. For example, a repository could provide a generic "describedby" link pointing at metadata but without indicating a mime type or profile on the link. Meaning content negotiation may need to take place to obtain a specific format. But for which format to negotiate? Well, the API Catalog could list the formats and the developer would know what to negotiate for. I am obviously all for FAIR Signposting to address this, but, hey, publishing a static API Catalog is even simpler than implementing the very simple FAIR Signposting ;-)

I would like to hear from others too regarding the inclusion or not of object-level affordances. It will be clear that I am really in favor (because I love the idea of a single document that lists all interop affordances) and will actually be disappointed if we decide to remove them from the spec. But if others are also confused ...

> > Without an anchor there is no link. And since the API Catalog is a link set, we need a link. Also, the example URL allows a developer to check out the landing page where the affordance is available and can observe its actual implementation. E.g. how exactly is Signposting implemented in this repository?
>
> maybe I understood this wrong but in RFC 9264 you stated:
>
> > Each link context object MAY contain an "anchor" member with a value that represents the link context.

When no "anchor" is provided the anchor is the URL of the resource that provided the link set. As is the case in HTML documents. See Section 4 of the Link Set RFC. So, basically, that would mean that the URL of the API Catalog is the anchor. Which would be meaningless because the link would be e.g. URL-of-API-Catalog service-doc URL-of-Signposting-spec.
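The defaulting rule described above can be sketched in Python as follows. This is a minimal illustration, not text from either spec: the function name `effective_anchor` and the catalog URL on `example.org` are hypothetical choices made for this example.

```python
def effective_anchor(context_object: dict, linkset_url: str) -> str:
    """Return the link context for one entry of a link set.

    If the entry has no "anchor" member, the URL of the resource that
    provided the link set is used instead, as described above.
    """
    return context_object.get("anchor", linkset_url)


# Hypothetical URL of the API Catalog document itself.
catalog_url = "https://example.org/.well-known/api-catalog"

with_anchor = {
    "anchor": "https://doi.pangaea.de/10.1594/PANGAEA.867908",
    "service-doc": [{"href": "https://signposting.org/FAIR/"}],
}
without_anchor = {
    "service-doc": [{"href": "https://signposting.org/FAIR/"}],
}

print(effective_anchor(with_anchor, catalog_url))
print(effective_anchor(without_anchor, catalog_url))
```

In the second case the resulting link reads "API Catalog URL, service-doc, Signposting spec URL", which is the meaningless statement this comment warns about.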

> The different usage/scope of anchor is problematic as long as we cannot indicate the affordance type (object, repository). Could we for example indicate this using status?

It seems to me that the URLs of object landing pages and of machine APIs would look significantly different for a human to be able to spot the difference. And the machine-uses of the API Catalog that I can envision would go something like this:

huberrob commented 7 months ago

Please, don't get me wrong I really like object level affordances, but I think we should try to improve the specs for these metadata format examples a bit.

Let's take your example workflow:

But... if for example type and profile attribute would have been given:

So a possible solution would be to recommend the use of type and profile attributes in case metadata formats are announced as object-level affordances.

hvdsomp commented 7 months ago

That would have to also mean that the anchor, for e.g. DCAT, would not be the landing page URL but the URL of an actual metadata record. Because the type and profile attributes that you would like to provide are link target attributes and would be about the metadata record not the landing page. I could definitely live with that.

hvdsomp commented 7 months ago

@huberrob Since I now understand that your objection is not about object-level encoding but rather about how to convey them, I am actually going to have a try in the draft spec to see what that looks like when using the "resource that exhibits the affordance" instead of "always the landing page" as the example URL. For Signposting, it will be a landing page URL but for e.g. RO-Crate it would not be.

hvdsomp commented 7 months ago

> That would have to also mean that the anchor, for e.g. DCAT, would not be the landing page URL but the URL of an actual metadata record. Because the type and profile attributes that you would like to provide are link target attributes and would be about the metadata record not the landing page. I could definitely live with that.

That was wrong because one can't provide attributes (e.g. type and profile) for an anchor. One can only provide attributes for a link target. But I still like the idea of using the URL of an actual DCAT or RO-Crate resource. It's just that one couldn't provide type and profile info.

hvdsomp commented 7 months ago

> Please, don't get me wrong I really like object level affordances, but I think we should try to improve the specs for these metadata format examples a bit.
>
> Let's take your example workflow:
>
> * Bot wonders if the DCAT format is supported
> * Bot retrieves the API Catalog
> * Bot looks for a link with service-doc link relation and https://www.w3.org/TR/vocab-dcat-3/ as target
> * Bot takes the URL that is the value of the anchor member of that link and has found an example.
> * Bot has no idea how to retrieve and verify DCAT for this example and all objects the example represents

This supports using the URL of a DCAT resource as the example. Which I am now in favor of.

> But... if for example type and profile attribute would have been given:

See below. That cannot be done for an anchor of a link.

> * Bot knows how to retrieve DCAT using content negotiation
> * Bot knows what signposting links it has to look for to get DCAT

The object-level affordances are about advertising/exemplifying support for a certain affordance. The bot should not require extra knowledge to see/retrieve an example. Hence, I really think that the URL of a DCAT resource as anchor would work well.
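The lookup steps of the bot workflow quoted above can be sketched in Python as follows. This is a rough illustration under stated assumptions: the function name `find_example_anchors` is hypothetical, the catalog is an in-memory dict rather than a fetched document, and the DCAT anchor URL on `my.repo.org` is invented for the example.

```python
def find_example_anchors(api_catalog: dict, spec_url: str) -> list[str]:
    """Return the anchors of all entries whose service-doc link targets spec_url."""
    anchors = []
    for context_object in api_catalog.get("linkset", []):
        targets = context_object.get("service-doc", [])
        if any(target.get("href") == spec_url for target in targets):
            # Entries without an explicit anchor are skipped in this sketch.
            if "anchor" in context_object:
                anchors.append(context_object["anchor"])
    return anchors


# Hypothetical catalog; the first anchor is an invented URL of a DCAT resource.
catalog = {
    "linkset": [
        {
            "anchor": "https://my.repo.org/item/008375/dcat.ttl",
            "service-doc": [{"href": "https://www.w3.org/TR/vocab-dcat-3/"}],
        },
        {
            "anchor": "https://doi.pangaea.de/10.1594/PANGAEA.867908",
            "service-doc": [{"href": "https://signposting.org/FAIR/"}],
        },
    ]
}

print(find_example_anchors(catalog, "https://www.w3.org/TR/vocab-dcat-3/"))
```

Because the anchor is the URL of a DCAT resource itself, the bot can dereference it directly to see an example, without extra knowledge about content negotiation or Signposting links.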

hvdsomp commented 7 months ago

I updated the draft spec to use the URL of an example resource that exhibits the object-level affordance, e.g. landing page for Signposting, URL of an RO-Crate for RO-Crate. Updated examples too.
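As a rough sketch of what such entries could look like, the Python below builds a small link set in which each anchor is an example resource that exhibits the affordance. The helper `object_level_entry` and the RO-Crate URL on `my.repo.org` are hypothetical; the PANGAEA landing page, the spec URLs, and the titles are taken from the examples earlier in this thread. The draft spec itself remains the authority on the exact shape.

```python
import json


def object_level_entry(example_url: str, spec_url: str, title: str) -> dict:
    """Build one link context object: example resource -> service-doc -> spec."""
    return {
        "anchor": example_url,
        "service-doc": [{"href": spec_url, "type": "text/html", "title": title}],
    }


api_catalog = {
    "linkset": [
        # Signposting: the example resource is a landing page.
        object_level_entry(
            "https://doi.pangaea.de/10.1594/PANGAEA.867908",
            "https://signposting.org/FAIR/",
            "FAIR Signposting Profile - Signposting the Scholarly Web",
        ),
        # RO-Crate: the example resource is an RO-Crate itself.
        # This anchor URL is hypothetical.
        object_level_entry(
            "https://my.repo.org/item/008375/ro-crate-metadata.json",
            "https://w3id.org/ro/crate/1.1",
            "RO-Crate 1.1 | Research Object Crate (RO-Crate)",
        ),
    ]
}

print(json.dumps(api_catalog, indent=2))
```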

abollini commented 7 months ago

I just found out that my issue with the IIIF sample is related to this discussion and should be updated accordingly, see https://github.com/hvdsomp/signposting/issues/25#issuecomment-1834073482

I would like to have this anchor https://gallica.bnf.fr/iiif/ark:/12148/bpt6k10733944/manifest.json for this repository object https://gallica.bnf.fr/ark:/12148/bpt6k10733944/f131.planchecontact

hvdsomp commented 7 months ago

@abollini I updated the example. thanks!

martinklein0815 commented 7 months ago

I just re-read the updated document and the new wording re repo-level and object-level (with reference to the Signposting document) makes a lot of sense to me. It is much clearer and avoids various dialects of subject matter expert language.