plazi / synolib

http://oss.factsmission.com/synogroup/
MIT License

Integrate CoL-Data #12

Open nleanba opened 2 months ago

nleanba commented 2 months ago

It might be worthwhile to let synolib also accept URIs for CoL taxa and (our) taxon concepts/names as search queries. This would allow Taxomplete to simply have an IRI as its value, and would make it easy to query for non-binomial names or names with weird characters.
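A minimal sketch of how a query could be classified before searching, so that URIs and plain names take different paths. The taxon-name and CoL prefixes are taken from URIs used later in this thread; the taxon-concept prefix and all names (`QueryKind`, `classifyQuery`) are hypothetical.

```typescript
// Hypothetical classification of a synolib search query.
type QueryKind = "taxon-name" | "taxon-concept" | "col-taxon" | "latin-name";

const PREFIXES: Array<[string, QueryKind]> = [
  ["http://taxon-name.plazi.org/id/", "taxon-name"],
  ["http://taxon-concept.plazi.org/id/", "taxon-concept"], // assumed prefix
  ["https://www.catalogueoflife.org/data/taxon/", "col-taxon"],
];

function classifyQuery(query: string): { kind: QueryKind; value: string } {
  const q = query.trim();
  for (const [prefix, kind] of PREFIXES) {
    if (q.startsWith(prefix)) return { kind, value: q };
  }
  // Anything that is not a known URI is treated as a (possibly non-binomial) latin name.
  return { kind: "latin-name", value: q };
}
```

With this, a non-binomial name or one with unusual characters never has to be parsed; Taxomplete can hand over the URI and the kind is read off its prefix.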

nleanba commented 2 months ago

I have (in PR) implemented searching by URI.

I think it might be a good idea to only allow searches by URI, and make Taxomplete feed Synolib a URI directly.

We might write a more generic function, (TN/TC/CoL URI or latin name) → list of TN/TC/CoL URIs, which finds trivial synonyms (those where the name is the same). Taxomplete would use it to get a starting-point URI, and synolib would use it to find trivial synonyms.
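A sketch of such a generic resolver, assuming an in-memory index in place of the SPARQL queries synolib would actually run. The types, field names and the function `resolveToUris` are hypothetical.

```typescript
// Hypothetical in-memory view of the data; in synolib this would come from SPARQL.
interface TaxonConcept {
  uri: string;       // TC URI
  nameUri: string;   // TN URI (treat:hasTaxonName)
  colUris: string[]; // CoL taxa linked via rdfs:seeAlso
}

interface DataIndex {
  names: Map<string, string>; // TN URI -> latin name
  concepts: TaxonConcept[];
}

// Hypothetical generic resolver: (TN/TC/CoL URI or latin name) -> list of
// TN/TC/CoL URIs for the same latin name(s), i.e. the trivial synonyms.
function resolveToUris(index: DataIndex, query: string): string[] {
  // Step 1: find the taxon name(s) the query refers to.
  const nameUris = new Set<string>();
  index.names.forEach((latin, tn) => {
    if (tn === query || latin === query) nameUris.add(tn);
  });
  for (const tc of index.concepts) {
    // A TC URI, or a CoL URI linked from a TC, points at that TC's name.
    if (tc.uri === query || tc.colUris.includes(query)) nameUris.add(tc.nameUri);
  }

  // Step 2: collect the names plus every TC (and its CoL links) for those names.
  const result = new Set<string>(nameUris);
  for (const tc of index.concepts) {
    if (!nameUris.has(tc.nameUri)) continue;
    result.add(tc.uri);
    tc.colUris.forEach((col) => result.add(col));
  }
  return Array.from(result).sort();
}
```

Note that a CoL URI linked to TCs with different latin names resolves to all of those names here; a caller that needs strictly trivial synonyms has to group the results by name, as discussed later in this thread.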

This would unify all places where we search for names by string literals and would make it easier to integrate CoL data.

@retog opinions?

edit: Taxomplete would still need its own code to find URIs, as it is the only place where we want to find partial matches (it's autocomplete after all).

nleanba commented 2 months ago

UI-wise, I think Taxomplete should show small badges in its autocomplete indicating whether the suggestion is a TN/TC or a CoL taxon (or both).
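A small sketch of deriving those badges from a suggestion, assuming each suggestion carries the URIs it was found through. The `Suggestion` shape and `badgesFor` are hypothetical.

```typescript
// Hypothetical shape of a Taxomplete suggestion; field names are assumptions.
interface Suggestion {
  label: string;
  taxonNameUri?: string;
  taxonConceptUris: string[];
  colUris: string[];
}

type Badge = "TN" | "TC" | "CoL";

// Derive the badges to show next to a suggestion; a suggestion can carry several.
function badgesFor(s: Suggestion): Badge[] {
  const badges: Badge[] = [];
  if (s.taxonNameUri) badges.push("TN");
  if (s.taxonConceptUris.length > 0) badges.push("TC");
  if (s.colUris.length > 0) badges.push("CoL");
  return badges;
}
```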

nleanba commented 2 months ago

In general, there might be some benefit in reconsidering the data structures of synolib.

This would be a slight (but clean) divorce between the treatment RDF structure and the synolib structure, which is necessary to integrate the CoL data.

Related to *: It might make sense to make this list "lazily evaluated": Synolib would only fully load all TCs of a T, and look for further synonyms, when requested/awaited by the library user (Synospecies). Internally, Synolib would still collect already-found synonyms and avoid duplicate queries, but would only start expanding them to find more on request.
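A minimal sketch of that lazy expansion, assuming a callback `expand` that returns the direct synonyms of one taxon. In synolib this step would be one or more SPARQL queries and therefore async; it is kept synchronous here for brevity. `LazySynonyms` and `ExpandFn` are hypothetical names.

```typescript
// Hypothetical: returns the direct synonyms of a taxon (async SPARQL in practice).
type ExpandFn = (uri: string) => string[];

class LazySynonyms {
  private seen = new Set<string>(); // everything found so far
  private pending: string[] = [];   // found but not yet expanded

  constructor(start: string, private readonly expand: ExpandFn) {
    this.seen.add(start);
    this.pending.push(start);
  }

  // All synonyms found so far, without triggering any further queries.
  found(): string[] {
    return Array.from(this.seen);
  }

  // Expand one more taxon on request; returns only the newly discovered URIs.
  next(): string[] {
    const uri = this.pending.shift();
    if (uri === undefined) return [];
    const fresh: string[] = [];
    for (const s of this.expand(uri)) {
      // Only unseen URIs are queued, so no taxon is ever expanded twice.
      if (!this.seen.has(s)) {
        this.seen.add(s);
        this.pending.push(s);
        fresh.push(s);
      }
    }
    return fresh;
  }

  done(): boolean {
    return this.pending.length === 0;
  }
}
```

Nothing is queried at construction time; the library user decides how far to expand by calling `next()`.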

nleanba commented 2 months ago

We have to be careful, however, when using CoL taxa to find synonyms: different TCs (with differing latin names) can be linked to the same CoL taxon. Therefore, we must always check whether two potential trivial synonyms actually have the same latin name (same T).

example:

SELECT DISTINCT * WHERE {
  ?s ?p <https://www.catalogueoflife.org/data/taxon/8295f6bf-59a3-431a-9e7b-eff343efa154> .
}
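A sketch of that check, assuming each TC is already annotated with its latin name and linked CoL taxa. The `TC` record and `trivialSynonymsViaCol` are hypothetical.

```typescript
// Hypothetical record for a taxon concept; fields are assumptions.
interface TC {
  uri: string;
  latinName: string; // latin name of the associated taxon name (T)
  colUris: string[];
}

// TCs that share a CoL taxon with `tc` AND have the same latin name.
// Sharing a CoL taxon alone is not enough: CoL may link TCs with different names.
function trivialSynonymsViaCol(tc: TC, all: TC[]): TC[] {
  return all.filter(
    (other) =>
      other.uri !== tc.uri &&
      other.latinName === tc.latinName &&
      other.colUris.some((c) => tc.colUris.includes(c)),
  );
}
```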

(for my own reference:

PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dwc: <http://rs.tdwg.org/dwc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX treat: <http://plazi.org/vocab/treatment#>
SELECT DISTINCT ?tc
  (GROUP_CONCAT(DISTINCT ?auth; separator=" / ") AS ?authority)
  (GROUP_CONCAT(DISTINCT ?colid; separator="|") AS ?colids)
  (GROUP_CONCAT(DISTINCT ?aug; separator="|") AS ?augs)
  (GROUP_CONCAT(DISTINCT ?def; separator="|") AS ?defs)
  (GROUP_CONCAT(DISTINCT ?dpr; separator="|") AS ?dprs)
  (GROUP_CONCAT(DISTINCT ?cite; separator="|") AS ?cites)
WHERE {
  {
    ?tc treat:hasTaxonName <http://taxon-name.plazi.org/id/Animalia/Sadayoshia_miyakei> .
  } UNION {
    ?tc rdfs:seeAlso <https://www.catalogueoflife.org/data/taxon/8295f6bf-59a3-431a-9e7b-eff343efa154> .
  }
  OPTIONAL { ?tc dwc:scientificNameAuthorship ?auth . }
  OPTIONAL { ?tc rdfs:seeAlso ?colid . }
  OPTIONAL { ?aug treat:augmentsTaxonConcept ?tc . }
  OPTIONAL { ?def treat:definesTaxonConcept ?tc . }
  OPTIONAL { ?dpr treat:deprecates ?tc . }
  OPTIONAL { ?cite cito:cites ?tc . }
}
GROUP BY ?tc

)
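The reference query packs multi-valued results into single strings via GROUP_CONCAT ("|" as separator, " / " for authorities). A sketch of unpacking one result row, assuming the endpoint returns the SPARQL 1.1 JSON results format; `TcRow`, `splitConcat` and `parseTcRow` are hypothetical names.

```typescript
// One solution in the SPARQL 1.1 Query Results JSON Format: variable -> RDF term.
type Binding = Record<string, { type: string; value: string } | undefined>;

interface TcRow {
  tc: string;
  authority: string; // authorities joined with " / " by the query
  colIds: string[];
  augs: string[];
  defs: string[];
  dprs: string[];
  cites: string[];
}

// Split a "|"-joined GROUP_CONCAT value; an absent or empty value means no values.
// Assumes no URI itself contains "|".
function splitConcat(b: Binding, key: string): string[] {
  const v = b[key]?.value;
  return v ? v.split("|") : [];
}

function parseTcRow(b: Binding): TcRow {
  return {
    tc: b["tc"]?.value ?? "",
    authority: b["authority"]?.value ?? "",
    colIds: splitConcat(b, "colids"),
    augs: splitConcat(b, "augs"),
    defs: splitConcat(b, "defs"),
    dprs: splitConcat(b, "dprs"),
    cites: splitConcat(b, "cites"),
  };
}
```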

retog commented 2 months ago

The search-by-URI change sounds reasonable.

As for the change in synolib: should it only support "all synonyms" vs. "no synonyms", or also options to restrict the search, i.e. some synonyms but maybe not all?

nleanba commented 2 months ago

I think having the option to restrict the search (all, some, no synonyms) is only a future possibility and we don't really need to decide on details now.

If we do implement it, I would start with two modes only: all synonyms (same as now, default) and no non-trivial synonyms (i.e. only things that have the same latin name as was searched for).
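A sketch of those two modes as a post-filter, assuming each found synonym carries its latin name. `SynonymMode`, `Synonym` and `applyMode` are hypothetical names.

```typescript
// Hypothetical search modes; "all" is the default described above.
type SynonymMode = "all" | "trivial-only";

interface Synonym {
  uri: string;
  latinName: string;
}

function applyMode(searched: string, found: Synonym[], mode: SynonymMode = "all"): Synonym[] {
  if (mode === "all") return found;
  // "trivial-only": keep only results with exactly the searched latin name.
  return found.filter((s) => s.latinName === searched);
}
```

Filtering after the fact is the simplest reading; a real implementation would more likely skip the non-trivial expansion steps entirely in "trivial-only" mode, which also saves the queries.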

Middle-ground modes only make sense to me if a) there is explicit demand for them, or b) we can implement them in a way that is either invisible to the user or provides a somewhat more ergonomic experience (e.g. loading the next synonyms only on scroll, or similar). I mostly included it in my list above because it is a possibility, not because I see an urgent need.