ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa
Other
48 stars 12 forks source link

Add setters/accessors to basic classes #139

Open zachary-foster opened 6 years ago

zachary-foster commented 6 years ago

For

In

This would have prevented #138 and will make some other things easier.

zachary-foster commented 6 years ago

This seems like a good use of active bindings

zachary-foster commented 6 years ago

Hey @sckott, what do you think the accessors should return? Currently, I am using active bindings. I can think of 3 options (using the Taxon class for these examples):

  1. A character vector. for example my_taxon_obj$name returns "My tax name". This is what I think most users will expect and use most often, but it does not make it obvious how to manipulate the TaxonName object (e.g. the database field).
  2. Another object. for example my_taxon_obj$name returns a TaxonName object. This makes the most sense in the context of object-oriented programming, but the user would have to wrap the result in as.character to get the vector that will be used most often.
  3. Whatever is stored, either a character vector or an object. This is the easiest to implement, but will make other functions that use the accessors have to check which was returned, decreasing the value of the accessors.
sckott commented 6 years ago

i think it makes sense to store in a consistent way.

as what's returned, I lean towards No. 2, an object, but I haven't played with either to see what it feels like. Do you have a preference?

zachary-foster commented 6 years ago

i think it makes sense to store in a consistent way.

Yea, that would simplify things. The original reason for making vectors and objects interchangeable was to reduce memory usage in large data sets. For taxon objects:

> object.size(taxon("my tsdfsdffssdf"))
320 bytes
> object.size("my tsdfsdffssdf")
104 bytes

Even so, I am starting to lean to think the simplicity might be worth it.

Do you have a preference?

I think 2 would be good as long as there is a good way of getting the vector information for taxa objects, because I think thats what will be used most often.

I am experimenting with storing the objects with the my_ prefix and making active bindings to access/set vectors:

> (x <- taxon(
+     name = taxon_name("Poa annua"),
+     rank = taxon_rank("species"),
+     id = taxon_id(93036)
+ ))
<Taxon>
  name: Poa annua
  rank: species
  id: 93036
  authority: none
> x$name
[1] "Poa annua"
> x$rank
[1] "species"
> x$id
[1] "93036"
> x$my_name
<TaxonName> Poa annua
  database: none
> x$my_rank
<TaxonRank> species
  database: none
> x$my_id
<TaxonId> 93036
  database: none
> x$id <- "99999"
> x$my_id
<TaxonId> 99999
  database: none

Heres taxa:

> catus <- taxon(
+     name = taxon_name("catus"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9685)
+ )
> tigris <- taxon(
+     name = taxon_name("tigris"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9696)
+ )
> sapiens <- taxon(
+     name = taxon_name("sapiens"),
+     rank = taxon_rank("species"),
+     id = taxon_id(9606)
+ )
> (x <- taxa(catus, tigris, sapiens))
<taxa> 
  no. taxa:  3 
  catus / species / 9685 
  tigris / species / 9696 
  sapiens / species / 9606 
> x$names
[1] "catus"   "tigris"  "sapiens"
> x$names <- c("a", "b", "c")
> x
<taxa> 
  no. taxa:  3 
  a / species / 9685 
  b / species / 9696 
  c / species / 9606 
> x$values
[[1]]
<Taxon>
  name: a
  rank: species
  id: 9685
  authority: none

[[2]]
<Taxon>
  name: b
  rank: species
  id: 9696
  authority: none

[[3]]
<Taxon>
  name: c
  rank: species
  id: 9606
  authority: none

One drawback of using R6 instead of S3 is the object is not directly iterable (e.g. lapply(x, func) vs lapply(x$values, func)).

I am rather indecisive at the moment. I want the following to be possible with whatever system we end up going with:

What do you think about:

sckott commented 6 years ago

That plan sounds good @zachary-foster - i'll have a look at the changes as used in taxize dev branch to see if it creates any issues

sckott commented 5 years ago

@zachary-foster any idea of when this will get some attention?

zachary-foster commented 5 years ago

Hi Scott, yea, I will work on it in the next few days.

sckott commented 5 years ago

awesome, thanks

zachary-foster commented 5 years ago

Hi @sckott, I pushed some updates to the "eval" branch that have the new getter/setter/constructor code for TaxonName, TaxonRank, TaxonId, TaxonDatabase, Taxon, and Taxa. They all have S3 and R6 getter/setters. The S3 are wrappers for the R6. The S3 setters/constructors pass by value, like most R code, and the R6 setters/constructors pass by reference (if what they are passing is an R6 object).

Taxa also has all the standard indexing funcs like [<- and such.

The documentation has not been updated yet. Look at the new tests to see what can be done.

I am thinking about making Taxonomy inherit Taxa, so Taxonomy and Taxmap would get all of Taxa getters/setters.

Does all this seems reasonable? Any changes you would like to see?

sckott commented 5 years ago

Awesome, thanks. Will have a look soon, especially with an eye to see if taxize will require anything else

sckott commented 5 years ago

@zachary-foster will do next monday, leaving on vacation