ropensci / software-review

rOpenSci Software Peer Review.

taxlist, a package for structuring taxonomic lists and related information #233

Closed kamapu closed 3 years ago

kamapu commented 6 years ago

Summary

The taxlist package structures taxonomic information into S4 objects and implements methods for the manipulation of contained information. Such objects may or may not contain information on synonymy, taxonomic ranks, parent-child relations, taxon views (references used to establish relation between taxon usage names and taxon concepts), and taxon (functional) traits.

Package: taxlist
Version: 0.1.5
Encoding: UTF-8
Date: 2018-06-18
Title: Handling Taxonomic Lists
Authors@R:
    person("Miguel", "Alvarez", email="kamapu78@gmail.com", role=c("aut", "cre"))
Depends:
    R(>= 3.0.0),
    stats,
    utils
Imports:
    foreign,
    grDevices,
    methods,
    taxize,
    stringdist,
    vegdata
Suggests:
    ape,
    devtools,
    knitr,
    stringi,
    taxa,
    rmarkdown
LazyData: true
Description: Handling taxonomic lists through objects of class 'taxlist'.
    This package provides functions to import species lists from 'Turboveg'
    (<https://www.synbiosys.alterra.nl/turboveg>) and the possibility to create
    backups from resulting R-objects.
    Also quick displays are implemented as summary-methods.
License: GPL (>= 2)
URL:
    https://cran.r-project.org/package=taxlist,
    https://github.com/kamapu/taxlist
BugReports: https://github.com/kamapu/taxlist/issues
Collate:
    'NULLing.R' 'auxiliary_functions.R' 'deprecated-functions.R' 'dissect_name.R'
    'taxlist-class.R' 'clean.R' 'as.list.R' 'taxon_views.R' 'add_view.R'
    'taxon_names.R' 'taxon_relations.R' 'taxon_traits.R'
    'levels.R' 'add_concept.R' 'update_concept.R' 'add_synonym.R'
    'accepted_name.R' 'synonyms.R' 'basionym.R' 'update_name.R' 'delete_name.R'
    'replace_view.R' 'get_children.R'
    'change_concept.R' 'Extract.R' 'subset.R'
    'merge_taxa.R' 'backup_object.R' 'load_last.R' 'summary.R'
    'df2taxlist.R' 'tv2taxlist.R' 'tnrs.R' 'tax2traits.R' 'match_names.R' 'print_name.R'
    'StartMessage.R'
VignetteBuilder: knitr

https://github.com/kamapu/taxlist

Reproducibility, because this package makes taxonomic information available in a quasi-standard format and tests inconsistencies on the content of taxonomic lists.

In general to taxonomists and biodiversity scientists, in particular to vegetation ecologists (taxlist objects are implemented in the package vegtable).

While its functionality may overlap the package taxa, the package taxlist attempts to be flexible in the degree of completeness of data (incompleteness is very frequent in vegetation-plot databases), it is meant to be integrated in objects containing diversity information (as in the mentioned package vegtable) and to import data from local storage (spreadsheets, Turboveg data sets and even PostgreSQL tables by using vegtable2).

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

@arendsee @zachary-foster @sckott

maelle commented 6 years ago

Thanks for your submission @kamapu, we're discussing and will get back to you soon.

maelle commented 6 years ago

:wave: again @kamapu, it's a bit of a tricky situation and we're going to need more information about your package before making a decision.

The general question is: how would an R user with a taxonomy problem at hand know how to choose one of the two packages?

Your answer could be a table as in this pre-submission inquiry or a decision tree. I know it's a bit of work that might not even warrant onboarding, but it'd be something useful to have in the docs of your package (and probably of taxa), since the existence of both packages, even outside the rOpenSci umbrella, means users might have to make a decision.

Thanks for your patience!

maelle commented 5 years ago

:wave: @kamapu! Any update or answer on the above? Thank you!

maelle commented 5 years ago

👋 @kamapu! Any update or answer on the above? Thank you!

kamapu commented 5 years ago

Dear @maelle, there are no real updates on this issue. At the moment I am in the field (Kenya) for a longer period. It will, though, be difficult for me to really document the differences between taxlist and taxa. In favour of taxlist I can only say that this package is implemented in vegtable, which is meant as a container for vegetation-plot databases in R. In fact, taxlist development runs in parallel to vegtable, but they are kept in separate packages due to the complexity of taxonomic structures and the potential integration of taxonomic lists in other object classes.

Now my question is: Can I submit two packages in bulk to rOpenSci, or should I leave taxlist and try to submit vegtable alone?

Best regards!

maelle commented 5 years ago

One (pre-)submission per package is best.

Have a productive time in the field!

kamapu commented 5 years ago

Dear @maelle:

I went through most of the functions in the manual of taxa to get a better impression of the differences between this package and taxlist. I have to admit that it has been hard for me to understand how taxa deals with taxonomic information, since its programming style is at a higher level than my primitive skills. Thus, the following comments should be interpreted as impressions rather than truth.

Basically, both packages fulfil similar tasks but use different approaches: taxa uses a function-oriented approach (R6), while taxlist uses a data-oriented one (S4). Both packages attempt to provide an object class containing taxonomic information, but taxlist focuses strongly on potential applications in vegetation-plot databases. taxlist allows diverse degrees of completeness, from a plain list of taxonomic entities up to data sets including taxonomic ranks, synonyms, ecological traits (or further taxon attributes), and taxon views.

Contributions of taxlist

Further aspects are probably not implemented in taxa and can be considered the contribution of taxlist to rOpenSci:

Features in Development

Additional features in development, which may not yet be included in taxa, are:

Final Remarks

I hope the previous comments properly support taxlist as a package that is complementary to taxa rather than redundant, and make it eligible for rOpenSci. Regardless of the final decision, a way to make data sets exchangeable between those two applications should be strongly recommended.

zachary-foster commented 5 years ago

Thanks for looking through the docs/code @kamapu! Here are some thoughts:

While taxa imports information from several online databases, taxlist offers alternatives for importing data from local data sets.

taxa can read local datasets using parse_tax_data, but it does not have any parsers for specific local formats, since we wanted to keep everything generic.

The consistency of information included in taxonomic lists is "cross-checked" by validity in S4 objects.

Yep, taxa uses R6 classes, which are not strongly typed, so users can break the objects if they set fields manually instead of using our dplyr-like manipulation functions.
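To illustrate what "cross-checked by validity in S4 objects" can mean in practice, here is a minimal sketch using only the methods package. The class `ToyTaxa`, its slots and the consistency rule are hypothetical and are not the actual taxlist class definition; they only show how an S4 validity function can reject an inconsistent object at construction time.

```r
library(methods)

# Hypothetical toy class, NOT the real 'taxlist' class.
setClass(
  "ToyTaxa",
  representation(
    usage_names = "data.frame", # one row per usage name
    concepts = "data.frame"     # one row per taxon concept
  ),
  validity = function(object) {
    # Assumed rule: every usage name must point to an existing concept.
    orphan <- !object@usage_names$concept_id %in% object@concepts$concept_id
    if (any(orphan)) {
      return(paste(
        "names without a concept:",
        paste(object@usage_names$name[orphan], collapse = ", ")
      ))
    }
    TRUE
  }
)

concepts <- data.frame(concept_id = 1:2)

# A consistent object is created without complaint.
good <- new(
  "ToyTaxa",
  usage_names = data.frame(
    name = c("Cyperus papyrus", "Typha latifolia"),
    concept_id = 1:2
  ),
  concepts = concepts
)

# The validity function runs inside new(), so an inconsistent object
# raises an error; here the message is captured instead of stopping.
bad <- tryCatch(
  new(
    "ToyTaxa",
    usage_names = data.frame(name = "Carex sp.", concept_id = 3L),
    concepts = concepts
  ),
  error = function(e) conditionMessage(e)
)
```

Note that validity is checked by `new()` and by explicit calls to `validObject()`, not on every direct slot assignment, so a package relying on this pattern would typically re-validate inside its own modification functions.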

I don't see any mention of synonymy in taxa, which is a crucial feature in taxlist.

There is no such feature explicitly, and we don't have plans for one at the moment.

A plotting function displaying relations between taxon as dendrograms

taxa uses print_tree for this

A function exporting the content of taxlist objects into the Veg-X standard

I have no plans to support import or export of specific file formats in taxa, since it is primarily a data manipulation standard.

sckott commented 5 years ago

thanks @kamapu for the reply - we're discussing now

sckott commented 5 years ago

@kamapu We've decided to proceed. We do feel it is important, though, to have functions to convert between the two packages' major data structures/classes, in both packages if possible. Your editor @maelle will follow up with more information.

kamapu commented 5 years ago

@sckott This is great news! I will put some effort into producing the required functions, though I may need the support of @zachary-foster.

maelle commented 5 years ago

:wave: @kamapu! I think it'd be great to add the conversion functions before the onboarding process. I'll put this submission on hold, but in the meantime, feel free to ask any questions.

kamapu commented 5 years ago

OK, I hope there is no hurry (I'm already busy with the closure of 2018). Don't be surprised if I use the gap between Christmas and New Year for it :anguished:

maelle commented 5 years ago

No problem, I might ping you once in a while but that's fine. Please do join the Slack as soon as you want (I sent you an invite).

maelle commented 5 years ago

:wave: @kamapu! Happy New Year! Any update?

kamapu commented 5 years ago

@maelle Happy New Year! I thought the day would come: I had no time, but I have been thinking about it every day. Now I'll start the discussion with @zachary-foster.

kamapu commented 5 years ago

The discussion can be followed here

maelle commented 5 years ago

:wave: @kamapu, any update? Note that I'll be unavailable from sometime between the beginning and the middle of June until mid-October, so your editor would be @noamross after that point.

kamapu commented 5 years ago

Hello @maelle and @noamross. There is one recent step here. We are working on a function for converting taxlist objects into taxmap.

kamapu commented 5 years ago

👋 @noamross After a while, we managed to write functions exporting taxlist objects to taxmap and importing them back. That is to say, objects can move from taxlist to taxa and back (see here). Thus, the submission can proceed.

noamross commented 5 years ago

Thank you @kamapu for the update! Before assigning reviewers, we still need the package to have a test suite, with test coverage reporting of at least 75%.

kamapu commented 4 years ago

Dear @noamross, do you mean I should follow these instructions? I'm a self-made programmer and now I'm feeling like someone, who is driving without driving license

noamross commented 4 years ago

I'm a self-made programmer and now I'm feeling like someone, who is driving without driving license

As are most of us! Yes, that's where to start for writing tests.

maelle commented 4 years ago

:wave: @kamapu! Any progress on the tests? Any help needed? 🙂

kamapu commented 4 years ago

Uff! Time is not on my side. I'll give it a try and come back to you.

kamapu commented 4 years ago

@maelle I managed to implement the test suite. At the moment taxlist has the requested 75% code coverage. I'll try to improve it, but I thought we could proceed with the submission process. I hope this was the hardest part (everything was new for me).

maelle commented 4 years ago

:wave: @kamapu! Great news! Congrats on adding the tests, I imagine it wasn't easy but you've learnt something very useful. :smile_cat:

Editor checks:


Editor comments

── 1. Error: (unknown) (@test-taxlist2taxmap.R
attempt to apply non-function
1: taxlist2taxmap(data1) at testthat/test-taxlist2taxmap.R:9
2: taxlist2taxmap(data1)

══ testthat results  ═════════════════════════
[ OK: 91 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 1 ]
1. Error: (unknown) (@test-taxlist2taxmap.R#9) 
 ── GP taxlist ─────────────────────────────────────────────────────

 It is good practice to

 ✖ not use "Depends" in DESCRIPTION, as it can
 cause name clashes, and poor interaction with other
 packages. Use "Imports" instead.

 ✖ avoid long code lines, it is bad for
 readability. Also, many people prefer editor windows
 that are about 80 characters wide. Try make your lines
 shorter than 80 characters

 R/accepted_name.R:54:1
 R/add_synonym.R:16:1
 R/backup_object.R:32:1
 R/basionym.R:48:1
 R/clean.R:43:1
 ... and 36 more lines

 ✖ avoid sapply(), it is not type safe. It might
 return a vector, or a list, depending on the input
 data. Consider using vapply() instead.

 R/dissect_name.R:8:9
 R/dissect_name.R:10:9

 ✖ avoid 1:length(...), 1:nrow(...),
 1:ncol(...), 1:NROW(...) and 1:NCOL(...) expressions.
 They are error prone and result 1:0 if the expression
 on the right hand side is zero. Use seq_len() or
 seq_along() instead.

 R/add_concept.R:73:42
 R/df2taxlist.R:76:36
 R/df2taxlist.R:76:64
 R/load_last.R:35:17
 R/match_names.R:20:27
 ... and 7 more lines

 ✖ not import packages as a whole, as this can
 cause name clashes between the imported packages.
 Instead, import only the specific functions you need.
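Two of the goodpractice notes above, the ones about sapply() and 1:length(...), are easiest to see on empty input. The following is a small base R sketch; the vectors and the helper `count_iterations()` are made up for illustration and do not come from taxlist.

```r
# sapply() picks its return type from the data it receives.
words <- c("Cyperus", "papyrus")
from_words <- sapply(words, nchar)        # an integer vector
from_empty <- sapply(character(0), nchar) # a list, because there is nothing to simplify

# vapply() returns the declared type, even for empty input.
safe_empty <- vapply(character(0), nchar, integer(1))

# Hypothetical helper that counts how often a loop body would run.
count_iterations <- function(index) {
  n <- 0L
  for (i in index) n <- n + 1L
  n
}

empty <- character(0)
runs_colon <- count_iterations(1:length(empty))  # 1:0 is c(1, 0), so the loop runs twice
runs_seq <- count_iterations(seq_along(empty))   # seq_along() gives integer(0), so it never runs
```

With non-empty input both variants behave the same, which is why these problems tend to surface only once a function meets an empty data set.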

A few things to fix, do not hesitate to ask for help via this thread/discuss.ropensci.org!

Reviewers: @mcsiple @levisc8 Due date: 2020-06-22

maelle commented 4 years ago

Any update, @kamapu? 🙂

kamapu commented 4 years ago

Still thinking about the best way to prepare a proper answer.

kamapu commented 4 years ago

Dear @maelle, you are right, I have learnt a lot and gained some new skills for programming packages.

Although I still have to do the comparison with taxa in the README, I respond herewith to the rest of your comments in advance.

  • Is taxlist aimed at being mainly developer facing i.e. used as a dependency of other packages?

No and yes. The package taxlist handles taxonomic information within objects defined by the package vegtable, the latter handling biodiversity records. Nevertheless, users need to know the objects and functions defined in taxlist to exploit the capabilities of both packages.

  • I'm getting an error when running the tests.

This is due to the function taxlist2taxmap(), which depends on the current version of taxa on GitHub but does not work with the latest version on CRAN. I asked @zachary-foster to update the CRAN version (see here). As soon as taxa gets updated, this error should disappear.

  • goodpractice output

All issues have been solved with the exception of the second one ("avoid long code lines, it is bad for readability"). I also work with an 80-character-wide editor and usually break code for readability. The problem is caused by strings included in stop() and warning() statements, which I prefer not to break. Since those lines are usually within functions and therefore indented, the width limit is exceeded. I hope this issue can be tolerated; otherwise I'll need to suppress the indentation in those specific lines.

Required changes have been committed to the master branch.

Pending tasks are therefore:

maelle commented 4 years ago

Thank you for your work and answer!

This is due to the function taxlist2taxmap(), which depends on the current version of taxa on GitHub but does not work with the latest version on CRAN.

In the meantime, please use a Remotes field in DESCRIPTION; cf. https://remotes.r-lib.org/articles/dependencies.html#github

maelle commented 4 years ago

:wave: @kamapu! Any update? Happy New Year. :slightly_smiling_face:

kamapu commented 4 years ago

Happy New Year @maelle! I am answering late because I'm still working on it and was waiting to have good news for you:

1) Long code lines were solved by dropping some indentation and using paste() for overly long messages.
2) Coverage has also improved a bit, but I am still struggling with some issues in this regard (see questions at the end).
3) Since I have to submit a new version of taxlist to CRAN, I had to remove the functions taxlist2taxmap() and taxmap2taxlist() from master and will reinsert them once I get news about a new CRAN version of taxa that is compatible with those functions.
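As a sketch of the first point, a long error message can be assembled from short pieces so that no source line exceeds 80 characters while the user still sees a single unbroken message. The function `check_rank()` and its wording are hypothetical and not taken from taxlist.

```r
# Hypothetical validator, used only to show how a long message is split.
check_rank <- function(rank, allowed = c("species", "genus", "family")) {
  if (!rank %in% allowed) {
    # Each piece stays short in the source; paste0() joins them into
    # one string without inserting line breaks.
    stop(paste0(
      "Rank '", rank, "' is not among the allowed levels (",
      paste(allowed, collapse = ", "), "). ",
      "Add it to the levels of the object before assigning it."
    ), call. = FALSE)
  }
  invisible(TRUE)
}

# Capture the message instead of stopping, to inspect the result.
msg <- tryCatch(check_rank("form"), error = function(e) conditionMessage(e))
```

Since stop() and warning() themselves paste their arguments together without a separator, passing the pieces directly as separate arguments is an equivalent alternative.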

I will try to finish the README.md file and come back to you. In the meantime, some questions:

1) I don't have any clue about how to test functions that write files, specifically in the case of backup_object(), which writes an .rda file. I was not able to find help on the Internet (some of the advice was not comprehensible given my background).
2) The same is valid for functions printing to the console, for instance taxlist::summary().

Any hints for me?

maelle commented 4 years ago

Thanks for the updates!

I am sorry, I've just noticed you do not use roxygen2 for producing documentation, which is a requirement before review. I haven't done the conversion from Rd to roxygen2 myself, but you might find the Rd2roxygen package useful. When using roxygen2 you won't have to update NAMESPACE by hand. See e.g. the chapter about man/ from the R Packages book and the chapter about NAMESPACE. I am really sorry for not seeing that earlier, but that change will be for the best.
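For orientation, a roxygen2-documented function looks roughly like the sketch below. The function `count_names()` is hypothetical and not part of taxlist; the tags are standard roxygen2 tags. Running roxygen2 over a package containing such a file generates the corresponding man/ page plus the `export()` and `importFrom()` entries in NAMESPACE, which also covers the goodpractice note about importing only specific functions.

```r
#' Count usage names per taxon concept
#'
#' Hypothetical helper, shown only to illustrate roxygen2 comments.
#'
#' @param concept_id A vector of concept identifiers, one entry per
#'   usage name.
#'
#' @return A named integer vector with the number of usage names for
#'   each concept.
#'
#' @importFrom stats setNames
#' @export
count_names <- function(concept_id) {
  counts <- table(concept_id)
  setNames(as.integer(counts), names(counts))
}

# Two names for concept 1, one name for concept 2.
name_counts <- count_names(c(1, 1, 2))
```

The `#'` lines are ordinary comments to R itself, so the file still sources and runs unchanged.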

Reg your questions

  1. Not exactly sure, you could save one RDA and in the test compare the new RDA you produce to the existing one?
  2. I think you need to use expect_output()
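A minimal sketch of the second hint, assuming a function that prints to the console. `print_overview()` is a hypothetical stand-in and not the taxlist summary method; the testthat part only runs if testthat is installed.

```r
# Hypothetical stand-in for a method that prints to the console.
print_overview <- function(taxon_names) {
  cat("number of names:", length(taxon_names), "\n")
  invisible(taxon_names)
}

# Base R: capture.output() returns the printed lines as a character vector.
printed <- capture.output(print_overview(c("Cyperus", "Typha", "Carex")))

# testthat: expect_output() matches the printed text against a regular
# expression and throws an error if it does not match.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::expect_output(
    print_overview(c("Cyperus", "Typha", "Carex")),
    "number of names: 3"
  )
}
```

In a package, the `expect_output()` call would normally sit inside a `test_that()` block in a file under tests/testthat/.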

maelle commented 4 years ago

I'd even recommend using Markdown formatting for roxygen2 so you might need to run roxygen2md after Rd2roxygen. :-) (that part is not a requirement, I just think it's easier to use Markdown formatting later, but you can disagree).

kamapu commented 4 years ago

Thank you for your prompt answer. I just wanted to tell you about the latest version uploaded to master, to get this work off my plate, but I see it is a never-ending story...

I may have overlooked the use of roxygen in the source. I have read about it but never gave it a try, so again, something new for me and more time to get there...

Regarding the test of the writing function, where will the file get written? In the active working directory, meaning the main folder of the package source? In a "virtual" (temporary) working directory?

maelle commented 4 years ago

You won't regret learning about roxygen2, for this package and future packages. :wink: It's fine if it takes time, I can imagine you're busy.

The reference file would be in the tests/ folder, and the one created during the test in a temporary directory. You can use the same code as in this helper file, which creates a temporary directory and deletes it once the test is run. The check=TRUE argument of tempdir() was introduced in R 3.5.0.
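The temporary-directory idea can be sketched in base R as follows. `write_backup()` is a hypothetical stand-in for a function such as backup_object(), and this is not the code from the helper file mentioned above. As a variation on the suggestion to compare against a stored reference file, the sketch reloads the written file and compares the objects in memory.

```r
# Hypothetical stand-in for a function that writes an object to an .rda file.
write_backup <- function(object, file) {
  save(object, file = file)
  invisible(file)
}

original <- data.frame(
  id = 1:2,
  name = c("Cyperus papyrus", "Typha latifolia")
)

# Write into a fresh sub-directory of the session's temporary directory,
# so nothing lands in the package source folder.
test_dir <- tempfile("backup-test-")
dir.create(test_dir)
backup_file <- write_backup(original, file.path(test_dir, "backup.rda"))

# Load into a separate environment so the workspace is not overwritten.
loaded <- new.env()
load(backup_file, envir = loaded)
round_trip_ok <- identical(loaded$object, original)

# Remove the directory again once the check is done.
unlink(test_dir, recursive = TRUE)
```

Comparing the reloaded object rather than the file bytes avoids spurious failures if the serialization details differ between R versions.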

kamapu commented 4 years ago

Dear @maelle I managed to "roxygenize" the package. You are right, it makes a lot of sense to implement it, though it has been also very helpful to have the experience doing documentation by hand. What is next? :wink:

maelle commented 4 years ago

Awesome, congrats! I too first learned to write docs by hand but am not sure I wouldn't have preferred to be shown roxygen2 first 😁

A remaining point was "Insert comparison with taxa in README", have you made progress on that?

kamapu commented 4 years ago

I tried, but using a different approach than the one suggested:

1) The first paragraph in the chapter Similar Packages summarizes the differences between taxlist and taxa with a link to the detailed discussion.
2) I also inserted two chapters called Rmarkdown Integration and Descriptive Statistics that do not directly mention taxa but attempt to highlight special features of taxlist, which may not be considered in any other package dealing with taxonomic information.

Please let me know if this is OK or if some additions are required.

I also remind you that the functions written to transform taxlist objects into taxmap and vice versa were removed from the master branch, since they do not work with the current CRAN version of taxa. I will reinsert those functions once taxa is updated accordingly.

maelle commented 4 years ago

Thank you. I'm not sure it's sufficient to answer the question "how would an R user with a taxonomy problem at hand know how to choose one of the two packages?", especially as it'd demand a potential user to read and digest information from a GitHub issue thread. We were especially hoping both packages would have the same summary/info in their README. @sckott @zachary-foster could you please comment on that + on when taxa will be updated on CRAN? Thanks all.

Furthermore, @kamapu, to help my understanding and my reviewer search when time comes, feel free to point me to use cases by other people. :-)

zachary-foster commented 4 years ago

"how would an R user with a taxonomy problem at hand know how to choose one of the two packages?" ... We were especially hoping both packages would have the same summary/info in their README.

I am not really sure, to be honest. It's been a bit since I tried out taxlist. I might have to revisit it to answer that. @kamapu can correct me if I am wrong, but perhaps the main difference is that taxa is targeted towards developers who want to make an R package that uses taxonomic data but do not want to write classes and manipulation functions from scratch. taxa is like tibble + dplyr for taxonomic data. Non-developer users would interact with the classes and functions defined by taxa, but would do so in the context of another package, much like tibbles are used in packages besides tibble and users of tibbles might not even know which package defines them.

on when taxa will be updated on CRAN?

I will try to get an update out this week

noamross commented 4 years ago

⚠️⚠️⚠️⚠️⚠️

In the interest of reducing load on reviewers and editors as we manage the COVID-19 crisis, rOpenSci is temporarily pausing new submissions for software peer review for 30 days (and possibly longer). Please check back here again after 17 April for updates.

In this period new submissions will not be handled, nor new reviewers assigned. Reviews and responses to reviews will be handled on a 'best effort' basis, but no follow-up reminders will be sent.

Other rOpenSci community activities continue. We express our continued great appreciation for the work of our authors and reviewers. Stay healthy and take care of one another.

The rOpenSci Editorial Board

⚠️⚠️⚠️⚠️⚠️

kamapu commented 4 years ago

@maelle I managed to produce a new version of taxlist, which should address all tasks requested for its submission to rOpenSci.

On the latter point, I wrote an itemized list in the README, chapter Similar Packages. Note that since I am not really working much with taxa, I'm not able to provide a neutral comparison between the two packages. The same is valid for the developers of taxa in the other direction. Thus I rather provide examples of applications where users may prefer or have to use taxlist instead of taxa.

maelle commented 4 years ago

Thanks @kamapu, it looks great, but I'll hold off looking for reviewers now as we extended the pause mentioned earlier until at least May the 7th. Thanks for your understanding!

kamapu commented 4 years ago

@maelle Should I "freeze" the master branch or can I still make changes?

maelle commented 4 years ago

You can still make changes, just don't decrease the code coverage now that it's so good 😉 We'll update threads when we are back to normal operation.

annakrystalli commented 4 years ago

⚠️⚠️⚠️⚠️⚠️ In the interest of reducing load on reviewers and editors as we manage the
COVID-19 crisis, rOpenSci new submissions for software peer review are paused.

In this period new submissions will not be handled, nor new reviewers assigned.
Reviews and responses to reviews will be handled on a 'best effort' basis, but
no follow-up reminders will be sent. Other rOpenSci community activities continue.

Please check back here again after 25 May when we will be announcing plans to slowly start back up.

We express our continued great
appreciation for the work of our authors and reviewers. Stay healthy and take
care of one another.

The rOpenSci Editorial Board ⚠️⚠️⚠️⚠️⚠️

maelle commented 4 years ago

@kamapu we're back! Anything I should know (features planned or other things you want to tackle before review) before I start looking for reviewers?

kamapu commented 4 years ago

Welcome back! This is good news and a sign that a piece of life is returning to normality. I have some problems with CRAN because of the encoding of the example data (encoding is becoming my worst nightmare). I may solve it by the end of the week and produce a new release.