ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa

Add regexes for `database_list` #144

Open zachary-foster opened 6 years ago

zachary-foster commented 6 years ago
sckott commented 6 years ago

can you explain a bit more?

zachary-foster commented 6 years ago

Currently, database_list just has placeholders for the regexes:

> database_list
$ncbi
<database> ncbi
  url: http://www.ncbi.nlm.nih.gov/taxonomy
  description: NCBI Taxonomy Database
  id regex: .*

$gbif
<database> gbif
  url: http://www.gbif.org/developer/species
  description: GBIF Taxonomic Backbone
  id regex: .*

$bold
<database> bold
  url: http://www.boldsystems.org
  description: Barcode of Life
  id regex: .*

$col
<database> col
  url: http://www.catalogueoflife.org
  description: Catalogue of Life
  id regex: .*

$eol
<database> eol
  url: http://eol.org
  description: Encyclopedia of Life
  id regex: .*

$nbn
<database> nbn
  url: https://nbn.org.uk
  description: UK National Biodiversity Network
  id regex: .*

$tps
<database> tps
  url: http://www.tropicos.org/
  description: Tropicos
  id regex: .*

$itis
<database> itis
  url: http://www.itis.gov
  description: Integrated Taxonomic Information System
  id regex: .*

We don't use database_list currently, but we probably will once some kind of taxon ID validity checking is added. Then, when a user does something like taxon_id(id = "not valid", database = "ncbi"), they would get an error.
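
A minimal sketch of what regex-based ID checking could look like, assuming a lookup table of patterns keyed by database name. The `id_regexes` vector, the `check_taxon_id()` helper, and the patterns themselves are hypothetical illustrations, not part of taxa:

```r
# Hypothetical patterns keyed by database name. These regexes are
# illustrative assumptions, not the ones taxa would ship with.
id_regexes <- c(
  ncbi = "^[0-9]+$",
  itis = "^[0-9]+$"
)

# Hypothetical helper: raise an error if any ID fails the pattern
# registered for its database.
check_taxon_id <- function(id, database) {
  if (!database %in% names(id_regexes)) {
    stop("Unknown database: ", database, call. = FALSE)
  }
  bad <- !grepl(id_regexes[[database]], id)
  if (any(bad)) {
    stop("Invalid ", database, " ID(s): ",
         paste(id[bad], collapse = ", "), call. = FALSE)
  }
  invisible(id)
}

check_taxon_id("9606", "ncbi")  # matches the pattern, returns invisibly

# A non-matching ID raises an error; capture its message instead of stopping.
msg <- tryCatch(check_taxon_id("not valid", "ncbi"),
                error = function(e) conditionMessage(e))
print(msg)
```

With the current `.*` placeholders every ID would pass, which is why real patterns are needed before a check like this is useful.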

sckott commented 6 years ago

ah right, forgot about those

zachary-foster commented 6 years ago

What do you think about only using the taxon_database class for that list, and having the user pick a database name from the list instead of storing taxon_database objects? If they are using a custom database, they can either not specify a taxon database, or add their own taxon_database object to a local version of the list.

It seems like up to three taxon_database objects per taxon object (name, id, rank) would be a bit heavy on RAM, especially since 99% of the time they will all be the same database. In other instances where either a vector or object could be used, we decided (#139) to only allow objects for consistency. In this case, perhaps only allowing vectors might make more sense?
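
A rough way to see the difference, using plain lists as stand-ins for taxon_database objects (the real class may be laid out differently). Note that `object.size()` does not detect elements shared between objects, so the figure for the list of objects should be read as an upper bound:

```r
n <- 1000

# Option 1: store only the database name for each entry.
as_names <- rep("ncbi", n)

# Option 2: store a separate database record for each entry
# (plain lists standing in for taxon_database objects).
as_objects <- lapply(seq_len(n), function(i) {
  list(
    name = "ncbi",
    url = "http://www.ncbi.nlm.nih.gov/taxonomy",
    description = "NCBI Taxonomy Database",
    id_regex = ".*"
  )
})

# Compare the estimated sizes of the two representations.
print(object.size(as_names), units = "Kb")
print(object.size(as_objects), units = "Kb")
```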

sckott commented 6 years ago

Yeah, it does make sense to try to reduce memory usage. Agree that in most cases name, id, and rank will probably all be from the same database.

zachary-foster commented 6 years ago

ok, so only store vectors for database in taxon_rank, taxon_id, and taxon_name objects then?

sckott commented 6 years ago

I think so, but the accessor methods/functions would still construct a database object, though?
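
One way the idea discussed here could be sketched: the stored object keeps only the database name, and an accessor looks up the full record on demand. Everything below (`db_registry`, `new_taxon_id()`, `get_database()`, the `sketch_taxon_id` class) is hypothetical and uses plain lists rather than the actual taxa classes:

```r
# Hypothetical registry keyed by database name, standing in for database_list.
db_registry <- list(
  ncbi = list(url = "http://www.ncbi.nlm.nih.gov/taxonomy",
              description = "NCBI Taxonomy Database",
              id_regex = ".*"),
  gbif = list(url = "http://www.gbif.org/developer/species",
              description = "GBIF Taxonomic Backbone",
              id_regex = ".*")
)

# Hypothetical constructor: only the database *name* is stored.
new_taxon_id <- function(id, database) {
  stopifnot(database %in% names(db_registry))
  structure(list(id = id, database = database), class = "sketch_taxon_id")
}

# Hypothetical accessor: builds the full database record when asked for it.
get_database <- function(x) {
  c(list(name = x$database), db_registry[[x$database]])
}

x <- new_taxon_id("9606", "ncbi")
x$database                   # the stored value is just the name
get_database(x)$description  # the full record is assembled on access
```

The trade-off in this sketch is a small lookup cost on each access in exchange for not carrying a full database record in every name, id, and rank object.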