[names] Match OpenUp! functionality

re1 commented 4 years ago

- [x] Routes → Expose /commonNames route (or both?).
- [x] Parameters → Allow JSON object as query parameter.
- [x] Get common names by id → Get common names by their id from cache
- [ ] Get scientific names by id → Get scientific names by their id from cache
- [ ] Sources → Include all sources from OpenUp! in JACQ.
  - [ ] Artsdatabanken (related to Artdatabanken SOA / Dyntaxa?)
  - [ ] Austrian Academy of Sciences (formerly WBÖ and currently blocked due to deprecation)
  - [x] Catalogue of Life
  - [ ] Dyntaxa (Artdatabanken SOA)
  - [ ] Luomus (currently blocked due to likely deprecation)
  - [x] Meertens KNAW PLAND (currently blocked due to an API change)
  - [x] NHM Wien
  - [x] PESI
  - [x] YList
  - Static sources without endpoint:
    - [x] Allearter DK
    - [x] Azerbaijan
    - [x] Bratislava
    - [x] Czech Jiri
    - [x] Czech Prague
    - [x] ETI Databases
    - [x] Hebrew Linda
    - [x] Hungarian Peregovits
    - [x] Linnaeus projects
    - [x] New Zealand Landcare Research
    - [x] Russian Plantarium
    - [x] Slovak Bratislava
    - [x] TogoDB Japanese
    - [x] Ukrainian Kobiv (blocked due to #16)
- [x] Service Metadata as described here
  - [x] defaultTypes

re1 commented 4 years ago

/commonNames/ from OpenUp! is currently mapped to /names/common/. The documentation suggests using /commonNames/, while the type argument is always set to /name/common/. The OpenRefine Reconciliation Service API describes the type argument as:

specifying the types of result e.g., person, product, ... The actual format of each type depends on the service (e.g., "Q515" as a Wikidata type)

For consistency, the route will be set to /commonNames/ unless there is a reason to do otherwise. One reason to keep the current mapping would be a shared path for the names service under /names/, leading to /names/common/ for common names and /names/scientific/ for scientific names. Another option is to implement both paths by using regular expressions in the @Path annotation (https://stackoverflow.com/a/17002237/7826291).
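A minimal sketch of the "both paths" option, assuming a single alternation covers the two routes. It only checks the pattern with plain java.util.regex; the class name, the helper method and the exact expression are hypothetical, and the @Path template in the comment is an untested assumption about how the same alternation would be written in JAX-RS.

```java
import java.util.regex.Pattern;

// Hypothetical helper: checks which request paths one alternation would accept.
final class RoutePatternSketch {

    // Assumed alternation. In a JAX-RS resource the same idea would be written
    // as a path template with an inline regex, for example:
    //   @Path("/{route: commonNames|names/common}")
    static final Pattern ROUTE = Pattern.compile("/(commonNames|names/common)/?");

    // True if the path is one of the two common names routes
    // (with or without a trailing slash).
    static boolean isCommonNamesRoute(String path) {
        return ROUTE.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"/commonNames/", "/names/common/", "/names/scientific/"};
        for (String path : paths) {
            System.out.println(path + " -> " + isCommonNamesRoute(path));
        }
    }
}
```

With this pattern /names/scientific/ is rejected, so a separate resource could still serve scientific names under the shared /names/ prefix.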

re1 commented 4 years ago

OpenUp! throws a BadRequest error if the type parameter is not /name/common. This does not help the user but might be closer to the original OpenRefine specification. Feedback may be needed.

re1 commented 4 years ago

Sources might be deprecated or unreachable. Web services are prioritized because they are easier to implement and usually more up to date.

re1 commented 4 years ago

Found some documentation for Artdatabanken's code and API.

re1 commented 4 years ago

JACQ internal common names are to be included as a data source. They are currently available from http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php using legacy code. This code will likely be migrated later.

The data source uses JSON-RPC and receives POST requests in the following format:

{
  "id": 1,
  "method": "getMatchesService",
  "params": [
    "vienna",
    "Cynodon dactylon",
    {
      "includeCommonNames": true
    }
  ]
}

The result for this particular request looks like this (only one common name is included here):

{
    "id": 1,
    "result": {
        "error": "",
        "result": [
            {
                "searchtext": "Cynodon dactylon",
                "searchtextNearmatch": "",
                "rowsChecked": 33900,
                "type": "multi",
                "database": "freud",
                "includeCommonNames": true,
                "searchresult": [
                    {
                        "genus": "Cynodon",
                        "distance": "0",
                        "ratio": 1,
                        "taxon": "Cynodon Rich. (Poaceae)",
                        "ID": "12747",
                        "taxonID": "21179",
                        "family": "Poaceae",
                        "species": [
                            {
                                "name": "dactylon",
                                "distance": 0,
                                "ratio": 1,
                                "taxon": "Cynodon dactylon (L.) Pers.",
                                "taxonID": "1753",
                                "family": "Poaceae",
                                "syn": "",
                                "synID": 0,
                                "commonNames": [
                                    {
                                        "id": "11127",
                                        "name": "مرغ",
                                        "language": "fas",
                                        "geography": "Islamic Republic of Iran, Iran (independent political entity: country, state, region,...), (, Iran,IR, , 00)",
                                        "period": "recent",
                                        "reference": "Mozaffarian, V. (2007): 1-671; index."
                                    }
                                ]
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "error": null
}
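As a rough sketch, the request above could be sent from Java with the java.net.http client (Java 11 or later, which is an assumption about the runtime). The class and method names are hypothetical, the Content-Type header is an assumption about what the legacy endpoint expects, and the meaning of the first parameter ("vienna") is not documented in this thread, so it is passed through under the neutral name backend.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client sketch for the legacy taxamatch JSON-RPC endpoint.
final class TaxamatchRequestSketch {

    static final String ENDPOINT =
            "http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php";

    // Minimal escaping for backslashes and double quotes in JSON strings.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the payload shown above; "backend" is an assumed name for the
    // first parameter ("vienna" in the example).
    static String buildPayload(int id, String backend, String scientificName) {
        return "{\"id\":" + id
                + ",\"method\":\"getMatchesService\""
                + ",\"params\":[\"" + escape(backend) + "\",\""
                + escape(scientificName) + "\",{\"includeCommonNames\":true}]}";
    }

    // Wraps the payload in a POST request; the header is an assumption.
    static HttpRequest buildRequest(String payload) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String payload = buildPayload(1, "vienna", "Cynodon dactylon");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(payload), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

The common names would then be read from result.result[].searchresult[].species[].commonNames[] in the response body, using whichever JSON library the project already depends on.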

re1 commented 4 years ago

Meertens KNAW (Pland) returns its results either as a web page or as a serialized PHP array. Libraries to deserialize PHP in Java are rare and mostly outdated. The most popular seems to be Pherialize, which has not been updated in 6 years. Alternatively, the parser could be written from scratch.
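To get a feeling for the "from scratch" option, here is a minimal sketch of a parser for PHP's serialize() format. It only covers null, booleans, integers, doubles, strings and arrays (no objects or references, and no INF/NAN doubles), and it has not been run against real Pland output. The class name and the sample input in main are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal parser for a subset of PHP's serialize() format.
final class PhpUnserializeSketch {

    private final byte[] data; // PHP string lengths are byte counts, so parse bytes
    private int pos = 0;

    private PhpUnserializeSketch(byte[] data) {
        this.data = data;
    }

    // Assumes the serialized text is UTF-8 encoded.
    static Object parse(String serialized) {
        return new PhpUnserializeSketch(serialized.getBytes(StandardCharsets.UTF_8)).readValue();
    }

    private Object readValue() {
        char type = (char) data[pos];
        pos += 2; // skip the type marker and the following ':' (or ';' for N)
        switch (type) {
            case 'N': return null;
            case 'b': return readUntil(';').equals("1");
            case 'i': return Long.parseLong(readUntil(';'));
            case 'd': return Double.parseDouble(readUntil(';'));
            case 's': return readString();
            case 'a': return readArray();
            default:
                throw new IllegalArgumentException(
                        "Unsupported type '" + type + "' at byte " + (pos - 2));
        }
    }

    // Reads ASCII characters up to the delimiter and consumes the delimiter.
    private String readUntil(char end) {
        int start = pos;
        while (data[pos] != end) {
            pos++;
        }
        String token = new String(data, start, pos - start, StandardCharsets.US_ASCII);
        pos++;
        return token;
    }

    // Format after the type marker: <byte length>:"<bytes>";
    private String readString() {
        int length = Integer.parseInt(readUntil(':'));
        pos++; // opening quote
        String value = new String(data, pos, length, StandardCharsets.UTF_8);
        pos += length + 2; // closing quote and ';'
        return value;
    }

    // Format after the type marker: <element count>:{<key><value>...}
    private Map<Object, Object> readArray() {
        int size = Integer.parseInt(readUntil(':'));
        pos++; // '{'
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            Object key = readValue();
            map.put(key, readValue());
        }
        pos++; // '}'
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample input, not an actual Pland response.
        System.out.println(parse("a:1:{s:4:\"name\";s:7:\"Cynodon\";}"));
    }
}
```

PHP arrays are returned as a LinkedHashMap to keep their order, with integer keys as Long and string keys as String. Working on bytes instead of characters matters for common names in non-Latin scripts, because the length prefix counts bytes.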

re1 commented 4 years ago

Both Artsdatabanken and Dyntaxa provide SOAP endpoints, which are hard to use and, as it seems, poorly documented too. Using WSDL files to generate code is recommended but adds a huge amount of complexity for an otherwise simple task.

re1 commented 4 years ago

Sources without endpoint are to be sourced exclusively from cache as stated in #15.

re1 commented 4 years ago

Dyntaxa endpoint was implemented as follows in OpenUp!: https://github.com/wkollernhm/openup/blob/master/protected/components/Sources/DyntaxaSe.php

re1 commented 4 years ago

Many static source tables already used in OpenUp! do not have a unique combination of columns to use for identification. To use those tables for JPA mapping, an id column should be added:

alter table tbl_source_{table_name}
    add id int not null auto_increment primary key;

The text in braces {table_name} is only a placeholder for the name of the source table without its prefix (tbl_source_).

This change is required for the following OpenUp! tables:

tbl_source_azerbaijan
tbl_source_czech_jiri_bezo1
tbl_source_czech_jiri_roztoci
tbl_source_czech_jiri_vacnatci
tbl_source_hungarian_peregovits
tbl_source_linnaeus_projects
tbl_source_ukrainian_kobiv
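Since the same statement has to be applied to every table in the list, it can be generated instead of typed by hand. A small sketch, assuming Java 9 or later for List.of; the class and method names are hypothetical, and the output is meant to be reviewed before it is run against the database.

```java
import java.util.List;

// Hypothetical generator for the ALTER statements of the tables listed above.
final class AddIdColumnSketch {

    static final List<String> TABLES = List.of(
            "tbl_source_azerbaijan",
            "tbl_source_czech_jiri_bezo1",
            "tbl_source_czech_jiri_roztoci",
            "tbl_source_czech_jiri_vacnatci",
            "tbl_source_hungarian_peregovits",
            "tbl_source_linnaeus_projects",
            "tbl_source_ukrainian_kobiv");

    // Same statement as above, with the full table name filled in.
    static String alterStatement(String table) {
        return "alter table " + table
                + " add id int not null auto_increment primary key;";
    }

    public static void main(String[] args) {
        TABLES.forEach(table -> System.out.println(alterStatement(table)));
    }
}
```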

Although these changes should not lead to data loss, it might be a good idea to back up the OpenUp! database before using its sources and caches directly.

mysqldump --user={user} --password={password} --host={host} {database} > {database}.sql

2020-05-27: The remote tables have been altered accordingly. In case of a static source update they will have to be updated again manually.

re1 commented 4 years ago

Regarding tests, it might be a good idea to make a list of query parameters for testing specific common name sources. For static sources, they can be looked up in the database. The following scientific names are found in multiple web services:

Scientific name	Source IDs
`Eriophorum`	1, 2, 3, 8

re1 / jacq-javaee

[names] Match OpenUp! functionality #1