ropensci / taxa

taxonomic classes for R
https://docs.ropensci.org/taxa

Relationship with `taxlist` #130

Open arendsee opened 6 years ago

arendsee commented 6 years ago

Just curious, what is the relationship between taxa and kamapu/taxlist? The focus of the two packages seems somewhat different. Perhaps we could say taxa is focused on beta taxonomy and taxlist on alpha taxonomy. Is there any potential for connecting the two packages (e.g., using taxlist inside taxa or adding functions to convert between the two systems)?

I'm currently not involved in either project (apart from a role as a public reviewer), but I may need a library for handling taxonomy in the near future and I don't like making choices.

sckott commented 6 years ago

Good question @arendsee

I still have not thought much about how the two compare. A few thoughts after having a quick look:

@zachary-foster thoughts?

zachary-foster commented 6 years ago

what is the relationship between taxa and kamapu/taxlist?

None at the moment. I have never heard of it before, but there is a lot I have never heard of so that does not mean much.

Perhaps we could say taxa is focused on beta taxonomy and taxlist on alpha taxonomy.

I might see what you mean. The taxmap and taxonomy classes taxa implements are focused on a whole "taxonomy" with or without explicit rank information. The hierarchy and taxon classes of taxa are more like the taxlist classes: they are independent taxa. The taxlist object seems to be able to store hierarchical data, but associated with one taxon (all the varieties of a fern species and its genus are associated with "fern"), which is a different way of thinking about it than I am used to. It's like the unit is "organism" rather than "taxon", so an organism (e.g. fern) can have taxonomic attributes that may be hierarchical. Whereas in taxa, specific taxa might have associations with organism data (e.g. the genus Asplenium could be associated with the common name "fern", and so is the species obliquum by implication, since it is a subtaxon of Asplenium).

Is there any potential for connecting the two packages (e.g., using taxlist inside taxa or adding functions to convert between the two systems)?

Hmm, not sure. From reading the vignette, the class concepts don't really mesh well (organism with taxonomic data vs taxa with data). The low-level classes in taxa are lower level than the taxlist class, and taxa is more object-oriented than taxlist from what I can tell, so there is not much of a reason to use taxlist in taxa or vice versa (considering taxlist is S4).

I could write a converter between the two if there is demand for it.

What do you think @kamapu?

kamapu commented 6 years ago

Well, it was not long ago that I realized the existence of taxa, and I was a bit disappointed, since I was planning to submit taxlist to rOpenSci. My impression is that both taxlist and taxa aim at the same task but use different approaches. This makes taxlist not eligible for rOpenSci, as I understand it.

To the general discussion I add some bullets in defense of taxlist:

@arendsee I'm sorry, I'm not really answering your questions, but I have not had the opportunity to compare both packages in detail. @zachary-foster Thank you for including me in the discussion. @sckott I won't be mad about getting a function that converts taxa objects into taxlist and vice versa, though I have not completely recovered from the shock.

zachary-foster commented 6 years ago

Well, it was not long ago that I realized the existence of taxa, and I was a bit disappointed, since I was planning to submit taxlist to rOpenSci. My impression is that both taxlist and taxa aim at the same task but use different approaches. This makes taxlist not eligible for rOpenSci, as I understand it.

I am sorry to hear that. That kind of thing is always frustrating. It's a shame we did not talk before our packages were pretty much mature, otherwise we might have merged our efforts and not done redundant work. Perhaps @sckott has more to add regarding rOpenSci policy.

It is not true that taxlist is handling organisms with taxonomic attributes...

Sorry for misunderstanding. I looked into vegtable to get an idea of how taxlist is used and found the following taxlist in it:

> head(Kenya_veg@species@taxonNames)
   TaxonUsageID LETTERCODE            SHORTNAME            TaxonName NATIVENAME           AuthorName SYNONYM TaxonConceptID
4             3    ABUTMAU Abutilon mauritianum Abutilon mauritianum       <NA>       (Jacq.) Medik.   FALSE              3
5         50361    ABUTMAU       Pavonia patens       Pavonia patens       <NA>     (Andrews) Chiov.    TRUE              3
6             4    ACACDRE Acacia drepanolobium Acacia drepanolobium       <NA> Harms ex Y. Sjöstedt   FALSE              4
7             5    ACACELA       Acacia elatior       Acacia elatior       <NA>               Brenan   FALSE              5
10            8    ACACMEL     Acacia mellifera     Acacia mellifera       <NA>        (Vahl) Benth.   FALSE              8
11            9    ACACPOL   Acacia polyacantha   Acacia polyacantha       <NA>               Willd.   FALSE              9
> head(Kenya_veg@species@taxonRelations)
   TaxonConceptID AcceptedName Basionym Parent Level ViewID
4               3            3       NA     NA    NA      1
6               4            4       NA     NA    NA      1
7               5            5       NA     NA    NA      1
10              8            8       NA     NA    NA      1
11              9            9       NA     NA    NA      1
12             10           10       NA     NA    NA      1
> head(Kenya_veg@species@taxonViews)
        ViewID   Author Year Title Published
sp_list      1 Easplist   NA    NA        NA
> head(Kenya_veg@species@taxonTraits)
   TaxonConceptID    GENUS      FAMILY
3               3 Abutilon   Malvaceae
4               4   Acacia Leguminosae
5               5   Acacia Leguminosae
8               8   Acacia Leguminosae
9               9   Acacia Leguminosae
10             10   Acacia Leguminosae

I thought S4 was more object-oriented than S3. So why do I read the opposite opinion here?

We are actually using R6 and the S3 is just a thin surface layer to make things familiar to more people. For example, our filter_taxa function can be called like filter_taxa(obj, ...) or like obj$filter_taxa(...) (The R6 way). Perhaps I should have said "modular" rather than "object-oriented" since both are object-oriented. That might not be true either; just based on my understanding so far, which is limited.
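
The two calling styles described above can be illustrated with a minimal base R sketch. This is not the taxa implementation: a plain environment stands in for an R6 object so the example has no dependencies, and `new_toy_taxonomy` and its `keep` argument are hypothetical names chosen for illustration.

```r
# Toy constructor: an environment carrying data and a method.
# It stands in for an R6 object; all names here are hypothetical.
new_toy_taxonomy <- function(taxon_names) {
  self <- new.env()
  self$taxon_names <- taxon_names
  # Method style: return a new object holding only the taxa where `keep` is TRUE.
  self$filter_taxa <- function(keep) {
    new_toy_taxonomy(self$taxon_names[keep])
  }
  class(self) <- "toy_taxonomy"
  self
}

# Thin S3 layer: a generic whose method just forwards to the object's own method.
filter_taxa <- function(obj, ...) UseMethod("filter_taxa")
filter_taxa.toy_taxonomy <- function(obj, ...) obj$filter_taxa(...)

obj  <- new_toy_taxonomy(c("Asplenium", "Acacia", "Abutilon"))
keep <- c(TRUE, FALSE, TRUE)

via_function <- filter_taxa(obj, keep)  # functional (S3) call
via_method   <- obj$filter_taxa(keep)   # method ($) call

# Both routes run the same code, so the results agree.
identical(via_function$taxon_names, via_method$taxon_names)
```

The point of the design is that the S3 function contains no logic of its own, so users can pick whichever call style feels familiar.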

It is also important to document the source for circumscription of taxa (taxon views in slot taxonViews).

Interesting. So this slot documents who said that a grouping of taxa belongs together? Are contradicting views possible, i.e. can the same dataset be classified by multiple trees in one object? Is this different from assigning an "authority" on a coarse taxonomic rank like family?

It looks like the taxlist class is most similar to our taxmap class (assuming Kenya_veg@species is a good example of taxlist) except that:

kamapu commented 6 years ago

The data set Kenya_veg is a bit outdated. To be honest, it is a bad example of a consolidated database, though on the other hand it is the common case for databases imported from Turboveg. It would be better to look at Easplist in taxlist.

Are the slots "taxonNames" "taxonRelations" "taxonViews" "taxonTraits" always present, if empty?

Yes by definition of the class. Though "taxonViews" and "taxonTraits" may be empty (data frames with no rows). Check the prototype using new("taxlist")

Are others possible?

There is inheritance, meaning that you can define a new class inheriting the taxlist properties but adding new slots. I have not yet tested this option.
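
The two answers above (slots that always exist but may be empty, and extension by inheritance) can be sketched with a stand-in S4 class. `toy_taxlist` and `toy_taxlist_plus` are hypothetical classes that only mimic the four slot names mentioned in the thread; they are not the real taxlist definition, and the explicit empty prototype is an assumption of this sketch.

```r
library(methods)

# Hypothetical stand-in for taxlist: four data-frame slots,
# each defaulting to an empty data frame.
setClass("toy_taxlist",
  slots = c(taxonNames = "data.frame", taxonRelations = "data.frame",
            taxonViews = "data.frame", taxonTraits = "data.frame"),
  prototype = list(taxonNames = data.frame(), taxonRelations = data.frame(),
                   taxonViews = data.frame(), taxonTraits = data.frame()))

# The prototype object has every slot, each with zero rows.
empty <- new("toy_taxlist")
slotNames(empty)
nrow(empty@taxonTraits)

# Inheritance: a child class keeps the four slots and adds one of its own.
setClass("toy_taxlist_plus", contains = "toy_taxlist",
  slots = c(notes = "character"))

child <- new("toy_taxlist_plus", notes = "imported from Turboveg")
is(child, "toy_taxlist")
```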

What is the difference between "TaxonUsageID" and "TaxonConceptID"?

The first is the ID of the taxon usage names and the second is the ID of the taxon. So the accepted name and the respective synonyms for a taxon will each have their own "TaxonUsageID" but share the same "TaxonConceptID".
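
As a small illustration of that distinction, the sketch below rebuilds two rows from the Kenya_veg output quoted earlier in the thread as a plain data frame (only a subset of the columns, not a real taxlist object) and groups the usage names by concept.

```r
# Two usage names taken from the Kenya_veg output quoted earlier in the thread.
taxon_names <- data.frame(
  TaxonUsageID   = c(3, 50361),
  TaxonName      = c("Abutilon mauritianum", "Pavonia patens"),
  SYNONYM        = c(FALSE, TRUE),
  TaxonConceptID = c(3, 3),
  stringsAsFactors = FALSE
)

# Each name has its own usage ID, but both point to the same concept.
names_by_concept <- split(taxon_names$TaxonName, taxon_names$TaxonConceptID)
names_by_concept[["3"]]

# The accepted name of the concept is the row not flagged as a synonym.
accepted <- taxon_names$TaxonName[!taxon_names$SYNONYM]
accepted
```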

It looks like "taxonRelations" can define a tree structure using the "TaxonConceptID" and "Parent" (which I assume stores TaxonConceptIDs)?

Yes, this column points to TaxonConceptID. BUT there is also the column "Level", which may be a factor variable (classes ordered bottom-up). The levels are custom-defined.
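
To make the Parent/Level idea concrete, here is a minimal base R sketch. The table is invented toy data (three made-up concept IDs), and `lineage` is a hypothetical helper, not a taxlist function.

```r
# Toy relations table (invented data): Parent refers to TaxonConceptID,
# and Level is an ordered factor with custom levels listed bottom-up.
taxon_relations <- data.frame(
  TaxonConceptID = c(1, 2, 3),
  Parent         = c(NA, 1, 2),
  Level = factor(c("family", "genus", "species"),
                 levels = c("species", "genus", "family"), ordered = TRUE)
)

# Hypothetical helper: follow Parent links upward until a root (NA) is reached.
lineage <- function(id, relations) {
  path <- id
  parent <- relations$Parent[match(id, relations$TaxonConceptID)]
  while (!is.na(parent)) {
    path <- c(path, parent)
    parent <- relations$Parent[match(parent, relations$TaxonConceptID)]
  }
  path
}

path <- lineage(3, taxon_relations)
path  # concept IDs from the species up to its family

# Because Level is ordered bottom-up, ranks can be compared directly.
taxon_relations$Level[3] < taxon_relations$Level[1]
```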

Does "taxonTraits" always store rank info, or are the columns arbitrary?

If the taxonomic information is already contained in "taxonRelations", there is no need to include it in "taxonTraits". BUT if you would like to produce some statistics regarding taxonomy, especially when working in vegtable (e.g. number of species for different families within a plot observation), you may need to transfer this information to traits by using the function tax2traits (see the help for this function).
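
The kind of transfer described above can be sketched in base R. This is not how tax2traits is implemented and does not use its interface: the single combined table (names kept next to relations for brevity), the concept IDs, and the helper `ancestor_at` are all assumptions made for illustration, with genus and family names borrowed from the output quoted earlier.

```r
# Simplified toy table (invented IDs): names and relations in one data frame.
toy <- data.frame(
  TaxonConceptID = 1:6,
  Name   = c("Malvaceae", "Abutilon", "Abutilon mauritianum",
             "Leguminosae", "Acacia", "Acacia mellifera"),
  Parent = c(NA, 1, 2, NA, 4, 5),
  Level  = c("family", "genus", "species", "family", "genus", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: climb the Parent column until a taxon at `level` is found.
ancestor_at <- function(id, level, tab) {
  while (!is.na(id)) {
    row <- match(id, tab$TaxonConceptID)
    if (tab$Level[row] == level) return(tab$Name[row])
    id <- tab$Parent[row]
  }
  NA_character_
}

# Copy genus and family of each species into a traits-like table.
species <- toy[toy$Level == "species", ]
traits <- data.frame(
  TaxonConceptID = species$TaxonConceptID,
  GENUS  = vapply(species$TaxonConceptID, ancestor_at, character(1),
                  level = "genus", tab = toy),
  FAMILY = vapply(species$TaxonConceptID, ancestor_at, character(1),
                  level = "family", tab = toy),
  stringsAsFactors = FALSE
)

# With ranks stored as traits, per-family statistics become a simple tabulation.
table(traits$FAMILY)
```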

Are contradicting views possible?

Not yet but it is in the TODO list.

Just a last comment regarding taxon views. We took this idea from the work of Jansen and Dengler (2010) and the publications cited there. There is an example about why the taxon view for a combination matters.

One manuscript about taxlist is under review. I can share it with you once we get some news from the journal.

zachary-foster commented 6 years ago

Thanks for all the clarifications! That helps me understand taxlist much better.

One manuscript about taxlist is under review. I can share it with you once we get some news from the journal.

Cool! I would like to see it. We are actually submitting a paper on taxa in a few days to F1000, so it should be available there soon.

Aside: Whenever I tell people I am working on a standard for taxonomic data in R, I think of this:

[image: xkcd comic]

sckott commented 6 years ago

Love that xkcd


wrt taxlist submission: We do have some pkgs in ropensci that are somewhat overlapping, but usually not as close as taxa and taxlist are. Our editorial board would have to discuss it.

kamapu commented 6 years ago

I'll be out of office for around 2 months, but after that we should think about compatibility between taxa and taxlist, especially regarding functions to convert data from one object class to the other.

zachary-foster commented 6 years ago

Sounds good @kamapu! From what I have seen, it should not be too difficult.

kamapu commented 6 years ago

Well, I haven't really worked on export functions up to now, but I am still considering it: perhaps I should play a bit with taxa. In the meantime the article is online and some new features are available at kamapu/taxlist. Would either of you be willing to contribute?

zachary-foster commented 6 years ago

Hello @kamapu, sorry for the delay.

I have not worked on it either, but it's on my mental todo list. Which package do you think these conversion functions should be in, taxa or taxlist, or should one conversion be in each one (e.g. taxa has as.taxmap and taxlist has as.taxlist or vice versa)?

Either way is fine with me and I can help write the conversion functions.

Thanks for the link to the article. I will read it. Our article is also now online in case you are interested: https://f1000research.com/articles/7-272/v1

kamapu commented 6 years ago

Hello @zachary-foster: Great to see your message and the new publication. Regarding the function, I would prefer the last option: an import function in each package. Since I only have experience working alone on GitHub, it will be interesting to see how collaborative projects run.

kamapu commented 4 years ago

Dear @arendsee, we are discussing in #233 (submission to rOpenSci) writing some information for users about the differences between taxa and taxlist. So I would like to kindly ask which package you decided to use in the end, and why.