How to represent "linkouts" as opposed to "cross-references" in OBO ontologies?

matentzn commented 6 months ago

I am thinking hard about the following problem. I would like a way to simply add URLs with additional information to a class. We all know about mappings - linking a class or term from one ontology to another; we know cross-references, which include mappings from ontology terms to database identifiers or schema elements. But in some cases, we want to just link to a website that provides “additional information” about a class. For example, all these pages provide additional information about “X-linked syndromic intellectual disability” (MONDO:0020119).

Now you could go all in on ugly and do this:

xref: clingen.condition:MONDO:0020119, or, god forbid, xref: clingen.condition.mondo:0020119
xref: otar.disease: MONDO_0020119
xref: wikipedia:X-linked_intellectual_disability

I am not saying this is impossible, but isn't this going a bit far? I don't really consider these examples cross-references at all, as you can see by the issue I linked above, and this rough categorisation (my opinion only):

Alternatively, we could go ahead and say, well my friend, use rdfs:seeAlso that is what it has been designed for ("rdfs:seeAlso is an instance of rdf:Property that is used to indicate a resource that might provide additional information about the subject resource.").

Here is my problem with this:

Obviously, what I would really want is to be able to uniformly display those URLs across all our tools: if you show a page with information about a disease, you should be able to show all the disease-relevant links on that page. In my experience, we have used "see also" for basically everything: thousands of links to GitHub issues, papers, and documentation pages. This information should probably not be displayed alongside information "about the concept" and presented to the user. So my spidey senses tell me a new property would be better, like "additional information", "linkout" or some such. I am very much not sure, but I very much need to solve this problem, at least in my projects, by the end of the month.

If anyone has an opinion or ideas, let me know!

Clare72 commented 6 months ago

We have some stuff like this in FBbt - currently all using xref, which I agree could maybe be improved.

I guess it would be important to define any new property well enough that it is clear when to use it (rather than xref). For me, the Wikipedia page could be a source for the term definition - so should maybe be an xref annotation on the definition? The other database linkouts are different to this, providing some data/context, but not really sources for the concept definition (users of the term rather than sources for the term...) - my 'spidey senses' tell me that this might be an important distinction.

matentzn commented 6 months ago

@Clare72 yep, important point!

Websites are often sources of definitions
Sources of definitions are historically represented as xrefs on definitions

This indeed makes the decision a tiny bit more complex.

mcourtot commented 6 months ago

I wonder if the issue is, as you describe above, not the property per se (as you say, seeAlso was made for that) but rather a project-specific desire to hide or show some properties to some users? E.g. if using seeAlso for everything, you'd want to show GH issues to ontology developers, and general info websites to ontology users. Does this then become a UI rendering layer need on top of the resource rather than different properties? I'm thinking, for example, about the work eagle-i did in the past, which @carlotorniai and @ontowonka worked on.

matentzn commented 6 months ago

@mcourtot Thank you for your position. In the end, you are right. If I created a nicely defined property for "link to additional information about this resource", I would still not have solved the problem that I want to show some information to ontology editors, some to clinicians and some to researchers - I would essentially squeeze the "rdfs:seeAlso" problem into a narrower space. I was hoping we could find a property that could work for 85% of the cases where a linked resource provides "additional information about the domain concept".

@cmungall has repeatedly encouraged me to see if we can define the problem as a UI problem, but it creates a big burden on API and UI developers to actively employ a second standard for linkouts. You could imagine a linkout config like this (maintained by the ontology curation team, say Mondo):

resources:
  - id: clingen
    linkout_url: https://search.clinicalgenome.org/kb/conditions/$1
    id_processing:
      type: curie
    idspace:
      - MONDO:123
      - MONDO:124
      - MONDO:987
  - id: otar
    linkout_url: https://platform.opentargets.org/disease/$1
    id_processing:
      type: curie
      replacements:
        ":": "_"
    idspace:
      - MONDO:123
      - MONDO:124
      - MONDO:987

This includes the list of IDs covered by the external resource, the endpoint, and the ID processing rules required to, say, change MONDO:123 to MONDO_123.
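As a rough illustration of what a UI or API developer would have to do with such a config, here is a minimal Python sketch. It mirrors the YAML above as a plain dict so that no YAML parser is needed. The function name `linkouts_for` is hypothetical, and the reading of `$1` as "the processed ID" and of `replacements` as plain string substitutions is an assumption about the intended semantics, not a defined standard.

```python
# Assumed semantics: "$1" in linkout_url is replaced by the term ID after
# the "replacements" (plain string substitutions) have been applied.
CONFIG = {
    "resources": [
        {
            "id": "clingen",
            "linkout_url": "https://search.clinicalgenome.org/kb/conditions/$1",
            "id_processing": {"type": "curie"},
            "idspace": ["MONDO:123", "MONDO:124", "MONDO:987"],
        },
        {
            "id": "otar",
            "linkout_url": "https://platform.opentargets.org/disease/$1",
            "id_processing": {"type": "curie", "replacements": {":": "_"}},
            "idspace": ["MONDO:123", "MONDO:124", "MONDO:987"],
        },
    ]
}


def linkouts_for(curie, config=CONFIG):
    """Return {resource id: URL} for every resource that lists this CURIE."""
    links = {}
    for resource in config["resources"]:
        # Only resources that declare coverage of this term get a link.
        if curie not in resource["idspace"]:
            continue
        local_id = curie
        replacements = resource["id_processing"].get("replacements", {})
        for old, new in replacements.items():
            local_id = local_id.replace(old, new)
        links[resource["id"]] = resource["linkout_url"].replace("$1", local_id)
    return links


if __name__ == "__main__":
    for resource_id, url in linkouts_for("MONDO:123").items():
        print(resource_id, url)
```

Even in this small form, every consumer would need its own copy of this logic and access to the config file, which is the burden discussed in this thread.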

But let's say we generate a config like this; we still:

Need to teach resource owners how to use it, which could be quite a burden, also for UI devs who now need to access a second file.
Need to convince general-purpose browsers like OLS or Ontobee to implement these, which is unlikely to be a priority since, from their perspective, this only adds complications. If there were properties on the ontology classes, the links would just show up in the general-purpose term browsers.

mcourtot commented 6 months ago

Completely agree with all your points; just trying to start from the use case you described (different rendering to different user types) and see what the options are.

I can only think of

1. rdfs:seeAlso subproperties for each user type. This is easy to implement and already well supported, but has the potential to explode. It is also unclear how each user's type is defined (and how one user can have several types?) for each project - you'd need some sort of config as well?
2. Annotations on rdfs:seeAlso for each user type. This is nice, but I have always found it cumbersome to use. Maybe a little bit cleaner and more flexible?
3. UI layer. This is the most flexible but, as you describe, not necessarily trivial for all projects.

Maybe try with a specific example of 1 and see if that works well enough as starting point? If it doesn't fit then we can evaluate other options?

Clare72 commented 6 months ago

I think I would vote for subproperties based on the content of the resource, rather than trying to guess what each type of user may want (and having the same users across all ontologies!). Example properties could be used for things like:

resources about edits to the term, such as github tickets/PRs
resources with data annotated with this (or an equivalent) term
resources for additional information that was not used to create the definition, but could provide some additional detail/clarification about the term meaning
resources that provide documentation for how the term should be used for curation
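To make the content-based idea concrete, here is a small Python sketch of how a tool could use such subproperties. All `ex:` property names and `example.org` URLs are hypothetical placeholders, not existing terms or real links. The sketch shows two things: a specialised UI can pick only the kinds of links it wants, while a generic tool that only understands rdfs:seeAlso still sees everything through the subproperty relation.

```python
from collections import defaultdict

RDFS_SEE_ALSO = "rdfs:seeAlso"

# Hypothetical subproperties of rdfs:seeAlso, one per kind of linked content.
SUBPROPERTY_OF = {
    "ex:termTrackerItem": RDFS_SEE_ALSO,        # edits to the term
    "ex:annotatedDataResource": RDFS_SEE_ALSO,  # data annotated with the term
    "ex:additionalInformation": RDFS_SEE_ALSO,  # extra detail, not a definition source
    "ex:curationDocumentation": RDFS_SEE_ALSO,  # how to use the term in curation
}

# (term, property, URL) annotation assertions; the URLs are placeholders.
ASSERTIONS = [
    ("MONDO:0020119", "ex:termTrackerItem", "https://example.org/tracker/issues/1"),
    ("MONDO:0020119", "ex:annotatedDataResource", "https://example.org/data/MONDO_0020119"),
    ("MONDO:0020119", "ex:additionalInformation", "https://example.org/encyclopedia/x-linked-id"),
    ("MONDO:0000001", "ex:curationDocumentation", "https://example.org/docs/curation"),
]


def links_for(term, wanted_properties, assertions=ASSERTIONS):
    """Group a term's links by property, keeping only the wanted properties."""
    grouped = defaultdict(list)
    for subject, prop, url in assertions:
        if subject == term and prop in wanted_properties:
            grouped[prop].append(url)
    return dict(grouped)


def see_also_links(term, assertions=ASSERTIONS):
    """All links a generic tool sees if it treats subproperties as rdfs:seeAlso."""
    return [
        url
        for subject, prop, url in assertions
        if subject == term
        and (prop == RDFS_SEE_ALSO or SUBPROPERTY_OF.get(prop) == RDFS_SEE_ALSO)
    ]


if __name__ == "__main__":
    # A user-facing page might show only data and additional information.
    print(links_for("MONDO:0020119", {"ex:annotatedDataResource", "ex:additionalInformation"}))
    print(see_also_links("MONDO:0020119"))
```

The design choice here is that filtering happens on the property, so no per-user configuration is needed in the ontology itself; each UI decides which kinds of content to display.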

allenbaron commented 5 months ago

By linkout, it looks like you mean including links to websites where ontology users (i.e. database curators) have linked information to a specific ontology term (looking at the CLINGEN example). Is that right? If so, I personally feel like this is beyond the scope of ontologies.

I understand the desire to promulgate these links to as many people as possible in as simple a way as possible and I completely agree that it is super important to connect this data. In my opinion, what we really need is a simple, standardized query language that supports querying ontologies and databases... something like federated SPARQL queries that goes beyond querying RDF and also supports querying APIs/databases (and ideally also handles OWL logical axioms better).

matentzn commented 5 months ago

@allenbaron I think you are right in that standardising data access is the "right" thing to do - but realistically, we are decades away from a world where we can ask "do you, dear database, have information about this term?". Furthermore, we would still have to encode somewhere which resources to even talk to.

@cmungall argued also along the lines of "this should be solved at API/UI level".

My argument is:

It is impractical to expect that, in the next 5-10 years, resources that offer up additional information about a term will have a standardised interface to "ask it".
It is a great burden for generalised tool developers to figure out (1) which resources have relevant information (linked open science data graph) and (2) if they knew the resource, under which circumstances and for which terms information is available (does the resource have gene links for my disease?).

The ontologies themselves have a better chance of curating this information: "where else can you find information about this disease?".

If there was a real semantic web, yes, this issue would not exist. What do you think? Are you amenable to the "practicality" angle as an argument?

@Clare72 your suggestion is probably very sane. I will need to think it through, fearing, as always, term proliferation and making things complicated for Ontology implementors.

allenbaron commented 5 months ago

I'm a dreamer, I guess 😅, and a realist. I am absolutely amenable to a practical approach that can be implemented.

I think @Clare72's suggestion is probably most practical at the current time and agree that if this is done at the level of the ontology, it would require fewer people to do the work of identifying and creating these links, since they'd propagate to anyone who uses the ontology. We already have "term tracker item" for "resources about edits"; we could probably add a few more covering the "resources for additional information that was not used to create the definition" and "resources that provide documentation for how the term should be used for curation." I'd even argue this last one definitely belongs in the ontology.

But, when it comes to data linked to an ontology term, there are a number of questions that, in my mind, need to be addressed before something reasonable can be implemented:

How would the selection of links to include be accomplished?
- Some ontologies have been linked to data in hundreds of additional resources. Who would decide which are included and how would that "worthiness" be fairly determined? Are ontology maintainers qualified to make these determinations?
Would this really make less work for ontology implementers?
- Ontologies can be used in a wide variety of applications. No matter which links to "resources with data annotated with this (or an equivalent) term" are included, there will likely be some that are relevant to specific fields of research, like the area of agriculture versus human health.
- How big of a subproperty hierarchy would need to be created to make it truly easy for ontology implementers to find and use the ones they want?
- Would those not of interest need to be stripped out? This could certainly be automated, if done right, but it would likely still require at least some (perhaps less) work at the ontology implementer level.
How would the identification, creation, and maintenance of these links be accomplished by ontology curators/maintainers?
- To create links at the term level across even a handful of resources linking data to a term would require the ontology team to access unique APIs or crawl webpages. I've tried doing this some and it's challenging. Change is constant and standards are not sufficient, particularly for small ontology teams.

I know that practicality often gets squashed by hypotheticals in discussions. I don't want to squash a good practical approach, especially one I can implement 😉. I think about the problem of linking data literally every day. What about using rdfs:seeAlso subproperties to publish a list of online resources that might have additional information about terms in an ontology, along with some sort of description of the resource (maybe including an API endpoint?), instead of creating individual links at the term level? That would at least make ontology users aware of resources out there, which is sorely needed.

cmungall commented 5 months ago

It is impractical to expect that, in the next 5-10 years, resources that offer up additional information about a term will have a standardised interface to "ask it".

NCBI has been doing this forever https://www.ncbi.nlm.nih.gov/projects/linkout/

Many other resources also adopt a similar mechanism

matentzn commented 5 months ago

@allenbaron very reasonable arguments, I will get back to you below, but let's first look at @cmungall's suggestion. Here is the key documentation:

https://www.ncbi.nlm.nih.gov/books/NBK3802/

This is essentially what I am suggesting; only that

The ontology takes the role of NCBI to organise the "linking out" (ok, you could do this on OBO Foundry level as well, I guess, to save resources, but let's just keep the eye on the technical issue before thinking about governance)
The two config files that are submitted are submitted as ROBOT templates by the resources

The linkout system from NCBI is much more powerful than what I am asking for here (including the ability to provide boolean queries for search and, I think, API endpoints). It also still requires manual work; every element on the side of the internal resource (say the ontology) still needs to be manually associated with every element in the target, external resource. There is no free lunch in the sense of "all genetic diseases in Mondo can be accessed at that location" (it's all individual linkouts).

TLDR: If anything, NCBI's linkout confirms that we need some system to orchestrate the linkouts.

Back to @allenbaron

Some ontologies have been linked to data in hundreds of additional resources. Who would decide which are included and how would that "worthiness" be fairly determined? Are ontology maintainers qualified to make these determinations?

Why should there be a rule? Link the ones you (the ontology owner) want to link for one reason or the other. The question is not much different from the question of which resources to xref to, right? And no one has regulated that. I do expect some issues of course for the ontologies themselves that wish to build a linkout list like this and need to decide what to include. On ontology level, your concern may be warranted, but I daresay that while maybe 1 order of magnitude larger in scope, it's essentially the same as the xref problem. Indeed, you could always phrase it as such, as you can supply arbitrary linkouts as xrefs in your ontology.

Ontologies can be used in a wide variety of applications. No matter which links to "resources with data annotated with this (or an equivalent) term" are included, there will likely be some that are relevant to specific fields of research, like the area of agriculture versus human health. How big of a subproperty hierarchy would need to be created to make it truly easy for ontology implementers to find and use the ones they want? Would those not of interest need to be stripped out? This could certainly be automated, if done right, but it would likely still require at least some (perhaps less) work at the ontology implementer level.

I totally agree with these questions. They are a big concern for me as well, as practically speaking, most UIs will want to have super fine-grained control over what to display on which part of the site (clinical reports; genomics data; etc). Right now I don't have an answer. The question is: must we solve this problem conclusively before moving forward here? We could decide on a small set of properties like the ones @Clare72 suggested and then see where this takes us. My slightly fishy provisional answer would be to add source annotations as axiom annotations to the linkout assertion and then have the UI providers filter based on those, but, of course, nothing really deals with all the issues you are asking about.

To create links at the term level across even a handful of resources linking data to a term would require the ontology team to access unique APIs or crawl webpages. I've tried doing this some and it's challenging. Change is constant and standards are not sufficient, particularly for small ontology teams.

I think we don't need to answer this (indeed challenging) question to move this discussion forward. My current ideal would be to have the external resources provide the linkouts themselves. So this is the SOP:

The ontology is interesting enough to serve as a vehicle to proliferate access to a target resource
The target resource is interesting enough for the ontology to integrate
The maintainers of the target resource, if they so want that additional exposure, provide a table with an ontology ID -> linkout mapping to the ontology
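The last step of this SOP could be sketched roughly as follows: the resource hands over a two-column table, and the ontology team turns it into annotation assertions. This is only an illustration in Python. The column names `ontology_id` and `linkout_url` are assumptions, rdfs:seeAlso stands in for whatever property is eventually chosen, and the example row simply reuses the Open Targets URL pattern from the config discussed earlier in this thread.

```python
import csv
import io

OBO_PURL = "http://purl.obolibrary.org/obo/"
LINKOUT_PROPERTY = "rdfs:seeAlso"  # placeholder for the property finally chosen

# Hypothetical table as a resource might submit it (tab-separated).
SUBMITTED_TABLE = (
    "ontology_id\tlinkout_url\n"
    "MONDO:0020119\thttps://platform.opentargets.org/disease/MONDO_0020119\n"
)


def curie_to_iri(curie):
    """Expand an OBO CURIE such as MONDO:0020119 to its OBO PURL."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local_id}"


def table_to_turtle(tsv_text, prop=LINKOUT_PROPERTY):
    """Turn an 'ontology ID -> linkout' table into Turtle annotation triples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = ["@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .", ""]
    for row in reader:
        subject = curie_to_iri(row["ontology_id"])
        lines.append(f"<{subject}> {prop} <{row['linkout_url']}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    print(table_to_turtle(SUBMITTED_TABLE))
```

The point of the sketch is that the ontology team only validates and merges the table; identifying the links stays with the resource that wants the exposure.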

The automated process (API crawling etc) is also an option, but for me, the most important case right now is the case where the external resource wants to be linked out to.

matentzn commented 5 months ago

@allenbaron if I see this correctly, you deal with this issue in DO by doing this:

<oboInOwl:hasDbXref>url:https://www.omim.org/entry/203100?search=albinism%20type%20ia&amp;highlight=albinism%20ia%20type</oboInOwl:hasDbXref>

It's exactly these kinds of cases I would like to solve here - without getting into the debate of if we should, but if we do, how do we best do it.
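For context, this is roughly what a consumer has to do today to tell URL-style xrefs like the one above apart from ordinary CURIE-style cross-references. It is a minimal Python sketch; the function name `split_xrefs` is hypothetical, treating a leading `url:` as the marker for a linkout is an assumption based only on the example above, and the OMIM URL is shortened for readability.

```python
def split_xrefs(xrefs):
    """Separate URL-style xref values from CURIE-style ones.

    Returns a (urls, curies) tuple. A value counts as a URL if it starts
    with "url:" (prefix stripped) or directly with http:// or https://.
    """
    urls, curies = [], []
    for value in xrefs:
        if value.startswith("url:"):
            urls.append(value[len("url:"):])
        elif value.startswith(("http://", "https://")):
            urls.append(value)
        else:
            curies.append(value)
    return urls, curies


if __name__ == "__main__":
    example = [
        "url:https://www.omim.org/entry/203100",
        "wikipedia:X-linked_intellectual_disability",
    ]
    urls, curies = split_xrefs(example)
    print("linkouts:", urls)
    print("cross-references:", curies)
```

A dedicated property would make this kind of string sniffing unnecessary, since the property itself would say what the value is.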

allenbaron commented 5 months ago

In some sense, you're right. We do add URL links as xref annotations on definitions to publications that provide more information about a disease. The great majority of these "publications" were actually used to create the definition, probably including this one (rarely we can't find publications to cite that we feel sufficiently support the definition and use OMIM or NCI as the definition source). I don't know why they're formatted starting with "url:".

I'm drawing a blank on how best to solve this problem. For the links that are focused on providing more info for a human to review:

resources about edits to the term, such as github tickets/PRs

resources for additional information that was not used to create the definition, but could provide some additional detail/clarification about the term meaning

resources that provide documentation for how the term should be used for curation

It could be as simple as creating more specific properties for each (like term tracker item was for the first one).

On the other hand, this one:

resources with data annotated with this (or an equivalent) term

Seems like the kind of thing that a user would want a program to be able to access and aggregate data from for use in further analysis. That wouldn't be solved with a simple link to a website.

jonquet commented 5 months ago

Our paper https://hal-lirmm.ccsd.cnrs.fr/lirmm-02945170 was measuring the differences between the different types of XRefs

And the recommendations were:

So indeed rdfs:seeAlso was recommended (I agree with you @matentzn that it is used for many other things... but in a sense this is the Semantic Web, you do not control how a property will be used). Or I see we recommended dcat:landingPage. At first it surprised me, but I forgot that the domain was relaxed in DCAT2 to be applicable to other things, and the definition says: "and/or additional information"

If you don't like landingPage, then I would go up the hierarchy and take foaf:page.

matentzn commented 1 month ago

Thank you everyone for the discussion and contributions. None of the discussed solutions is perfect. My personal preference after re-reading everyone's positions would be the subproperty of rdfs:seeAlso suggestion here. Unless there is any strong objection, I will move to implement it.

information-artifact-ontology / ontology-metadata

How to represent "linkouts" as opposed to "cross-references" in OBO ontologies? #165