Open re1 opened 4 years ago
`/commonNames/` from OpenUp! is currently mapped to `/names/common/`. Documentation suggests using `/commonNames/` while the type argument is always set to `/name/common/`. The OpenRefine Reconciliation Service API describes the type argument as

> specifying the types of result e.g., person, product, ... The actual format of each type depends on the service (e.g., "Q515" as a Wikidata type)

For consistency the route will be set to `/commonNames/` if no further reasons are given. A valid use case might be a shared path for the names service under `/names/`, leading to `/names/common/` for common names and `/names/scientific/` for scientific names.
Another option is to implement both paths by using regular expressions in the `@Path` annotation (https://stackoverflow.com/a/17002237/7826291).
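A minimal sketch of the regular-expression idea, using only `java.util.regex` so it runs without a JAX-RS container. The class name `CommonNamesRoutes`, the method `isCommonNamesRoute` and the exact pattern are assumptions for illustration; in a resource class the same alternation would go into a path template such as `@Path("/{route: commonNames|names/common}")`, which is shown here only as a comment.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks whether one pattern covers both candidate routes. */
final class CommonNamesRoutes {
    // Alternation that could be placed in a JAX-RS template, e.g.
    // @Path("/{route: commonNames|names/common}")  (assumed usage, not taken from the project)
    static final String ROUTE_REGEX = "commonNames|names/common";

    // Leading slash is required, trailing slash is optional.
    private static final Pattern ROUTE = Pattern.compile("/(?:" + ROUTE_REGEX + ")/?");

    static boolean isCommonNamesRoute(String path) {
        return ROUTE.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] paths = {"/commonNames/", "/names/common/", "/names/scientific/"};
        for (String path : paths) {
            System.out.println(path + " -> " + isCommonNamesRoute(path));
        }
    }
}
```

With this pattern both `/commonNames/` and `/names/common/` resolve to the same handler, while `/names/scientific/` stays free for a separate resource.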
OpenUp! throws a BadRequest error if the type parameter is not `/name/common`. This does not help the user but might be closer to the original OpenRefine specs. Might need feedback.
Sources might be deprecated or unreachable. Web services are prioritized because they are easier to implement and usually more up to date.
JACQ internal common names are to be included as a data source. They are currently available from http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php using legacy code. This code will likely be migrated later.
The data source uses JSON-RPC and receives POST requests in the format of
{
  "id": 1,
  "method": "getMatchesService",
  "params": [
    "vienna",
    "Cynodon dactylon",
    {
      "includeCommonNames": true
    }
  ]
}
The result for this particular request looks like this (only one common name is included here):
{
  "id": 1,
  "result": {
    "error": "",
    "result": [
      {
        "searchtext": "Cynodon dactylon",
        "searchtextNearmatch": "",
        "rowsChecked": 33900,
        "type": "multi",
        "database": "freud",
        "includeCommonNames": true,
        "searchresult": [
          {
            "genus": "Cynodon",
            "distance": "0",
            "ratio": 1,
            "taxon": "Cynodon Rich. (Poaceae)",
            "ID": "12747",
            "taxonID": "21179",
            "family": "Poaceae",
            "species": [
              {
                "name": "dactylon",
                "distance": 0,
                "ratio": 1,
                "taxon": "Cynodon dactylon (L.) Pers.",
                "taxonID": "1753",
                "family": "Poaceae",
                "syn": "",
                "synID": 0,
                "commonNames": [
                  {
                    "id": "11127",
                    "name": "مرغ",
                    "language": "fas",
                    "geography": "Islamic Republic of Iran, Iran (independent political entity: country, state, region,...), (, Iran,IR, , 00)",
                    "period": "recent",
                    "reference": "Mozaffarian, V. (2007): 1-671; index."
                  }
                ]
              }
            ]
          }
        ]
      }
    ]
  },
  "error": null
}
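The request shown above could be assembled in Java roughly as follows. This is a sketch under assumptions: the class `TaxamatchRequest` and its methods are hypothetical, the `Content-Type` header is a guess at what the legacy endpoint accepts, and the meaning of the first positional parameter (`"vienna"`) is not documented here, so it is passed through unchanged as `source`. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for the JSON-RPC call described above (Java 11+). */
final class TaxamatchRequest {
    static final String ENDPOINT =
            "http://131.130.131.9/taxamatch/jsonRPC/json_rpc_taxamatchMdld.php";

    // Minimal JSON string escaping: backslashes and double quotes only.
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces the same structure as the example request, without whitespace.
    static String body(int id, String source, String scientificName) {
        return "{\"id\":" + id
                + ",\"method\":\"getMatchesService\""
                + ",\"params\":[\"" + escape(source) + "\",\""
                + escape(scientificName) + "\",{\"includeCommonNames\":true}]}";
    }

    // Builds the POST request; sending it is left to an HttpClient.
    static HttpRequest build(int id, String source, String scientificName) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .header("Content-Type", "application/json") // assumed header
                .POST(HttpRequest.BodyPublishers.ofString(body(id, source, scientificName)))
                .build();
    }

    public static void main(String[] args) {
        System.out.println(body(1, "vienna", "Cynodon dactylon"));
    }
}
```

In a real implementation a JSON library would normally replace the hand-written string building and would also be used to read `commonNames` from the nested `species` entries of the response.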
Meertens KNAW (Pland) returns results either as a web page or as a serialized PHP array. Libraries to deserialize PHP in Java are rare and mostly outdated. The most popular seems to be Pherialize, which has not been updated in 6 years. Alternatively, the parser could be written from scratch.
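To give an idea of what writing the parser from scratch would involve, here is a minimal sketch covering a subset of PHP's `serialize()` format: `N`, `b`, `i`, `d`, `s` and `a`. The class `PhpUnserializer` is hypothetical and not based on Pherialize. Objects, references and special floats such as `INF` are not handled, and malformed input is not validated. String lengths in the format are byte counts, so the sketch works on UTF-8 bytes rather than characters.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical minimal parser for a subset of PHP's serialize() output. */
final class PhpUnserializer {
    private final byte[] data;
    private int pos = 0;

    private PhpUnserializer(byte[] data) {
        this.data = data;
    }

    static Object parse(String serialized) {
        return new PhpUnserializer(serialized.getBytes(StandardCharsets.UTF_8)).readValue();
    }

    private Object readValue() {
        char type = (char) data[pos];
        pos += 2; // skip the type tag and the following ':' (or ';' for N)
        switch (type) {
            case 'N': return null;
            case 'b': return readUntil(';').equals("1");
            case 'i': return Long.parseLong(readUntil(';'));
            case 'd': return Double.parseDouble(readUntil(';'));
            case 's': return readString();
            case 'a': return readArray();
            default: throw new IllegalArgumentException("Unsupported type tag: " + type);
        }
    }

    // Reads ASCII text up to the delimiter and moves past the delimiter.
    private String readUntil(char delimiter) {
        int start = pos;
        while (data[pos] != delimiter) {
            pos++;
        }
        String token = new String(data, start, pos - start, StandardCharsets.US_ASCII);
        pos++;
        return token;
    }

    // Format: s:<byte length>:"<bytes>";
    private String readString() {
        int length = Integer.parseInt(readUntil(':'));
        pos++; // opening quote
        String value = new String(data, pos, length, StandardCharsets.UTF_8);
        pos += length + 2; // closing quote and ';'
        return value;
    }

    // Format: a:<size>:{<key><value>...}  PHP arrays become insertion-ordered maps.
    private Map<Object, Object> readArray() {
        int size = Integer.parseInt(readUntil(':'));
        pos++; // '{'
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            Object key = readValue();
            map.put(key, readValue());
        }
        pos++; // '}'
        return map;
    }
}
```

A call such as `PhpUnserializer.parse("a:1:{s:4:\"name\";s:3:\"abc\";}")` returns a map with the key `name`. Integer keys come back as `Long` values.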
Both Artsdatabanken and Dyntaxa provide SOAP endpoints, which are hard to use and appear to be poorly documented as well. Using WSDL files to generate code is recommended but adds a huge amount of complexity to an otherwise simple task.
Sources without an endpoint are to be served exclusively from the cache, as stated in #15.
Dyntaxa endpoint was implemented as follows in OpenUp!: https://github.com/wkollernhm/openup/blob/master/protected/components/Sources/DyntaxaSe.php
Many static source tables already used in OpenUp! do not have a unique combination of columns to use for identification. In order to use those tables for JPA mapping, an `id` column should be added:
alter table tbl_source_{table_name}
add id int not null auto_increment primary key;
The text in braces, `{table_name}`, is only a placeholder for the name of the source table without its prefix (`tbl_source_`).
This change is required for the following OpenUp! tables:
tbl_source_azerbaijan
tbl_source_czech_jiri_bezo1
tbl_source_czech_jiri_roztoci
tbl_source_czech_jiri_vacnatci
tbl_source_hungarian_peregovits
tbl_source_linnaeus_projects
tbl_source_ukrainian_kobiv
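Since the same statement has to be repeated for every table in this list, it could be generated instead of typed by hand. The following is only a sketch: the class `SourceTableMigration` and its methods are hypothetical, and the generated statements are printed rather than executed against a database.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical generator for the ALTER TABLE statements described above. */
final class SourceTableMigration {
    // Tables taken from the list above.
    static final List<String> TABLES = List.of(
            "tbl_source_azerbaijan",
            "tbl_source_czech_jiri_bezo1",
            "tbl_source_czech_jiri_roztoci",
            "tbl_source_czech_jiri_vacnatci",
            "tbl_source_hungarian_peregovits",
            "tbl_source_linnaeus_projects",
            "tbl_source_ukrainian_kobiv");

    static String alterStatement(String table) {
        // Guard against anything that is not a plain source table name.
        if (!table.matches("tbl_source_[a-z0-9_]+")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        return "alter table " + table + " add id int not null auto_increment primary key;";
    }

    static List<String> allStatements() {
        return TABLES.stream()
                .map(SourceTableMigration::alterStatement)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        allStatements().forEach(System.out::println);
    }
}
```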
Although these changes should not lead to data loss, it might be a good idea to back up the OpenUp! database before using its sources and caches directly.
mysqldump --user={user} --password={password} --host {host} {database} > {database}.sql
2020-05-27: The remote tables have been altered accordingly. If a static source is updated, its table will have to be altered again manually.
Regarding tests, it might be a good idea to make a list of query parameters for testing specific common name sources. For static sources they can be looked up in the database. The following scientific names are found in multiple Web services:
| Scientific name | Source IDs |
|---|---|
| Eriophorum | 1, 2, 3, 8 |
defaultTypes