ropensci / software-review

rOpenSci Software Peer Review.
287 stars 104 forks source link

taxlist, a package for structuring taxonomic lists and related information #233

Closed kamapu closed 3 years ago

kamapu commented 6 years ago

Summary

The taxlist package structures taxonomic information into S4 objects and implements methods for the manipulation of contained information. Such objects may or may not contain information on synonymy, taxonomic ranks, parent-child relations, taxon views (references used to establish relation between taxon usage names and taxon concepts), and taxon (functional) traits.

Package: taxlist
Version: 0.1.5
Encoding: UTF-8
Date: 2018-06-18
Title: Handling Taxonomic Lists
Authors@R:
    person("Miguel", "Alvarez", email="kamapu78@gmail.com", role=c("aut", "cre"))
Depends:
    R(>= 3.0.0),
    stats,
    utils
Imports:
    foreign,
    grDevices,
    methods,
    taxize,
    stringdist,
    vegdata
Suggests:
    ape,
    devtools,
    knitr,
    stringi,
    taxa,
    rmarkdown
LazyData: true
Description: Handling taxonomic lists through objects of class 'taxlist'.
    This package provides functions to import species lists from 'Turboveg'
    (<https://www.synbiosys.alterra.nl/turboveg>) and the possibility to create
    backups from resulting R-objects.
    Also quick displays are implemented as summary-methods.
License: GPL (>= 2)
URL:
    https://cran.r-project.org/package=taxlist,
    https://github.com/kamapu/taxlist
BugReports: https://github.com/kamapu/taxlist/issues
Collate:
    'NULLing.R''auxiliary_functions.R''deprecated-functions.R''dissect_name.R'
    'taxlist-class.R''clean.R''as.list.R''taxon_views.R''add_view.R'
    'taxon_names.R''taxon_relations.R''taxon_traits.R'
    'levels.R''add_concept.R''update_concept.R''add_synonym.R'
    'accepted_name.R''synonyms.R''basionym.R''update_name.R''delete_name.R'
    'replace_view.R''get_children.R'
    'change_concept.R''Extract.R''subset.R'
    'merge_taxa.R''backup_object.R''load_last.R''summary.R'
    'df2taxlist.R''tv2taxlist.R''tnrs.R''tax2traits.R''match_names.R''print_name.R'
    'StartMessage.R'
VignetteBuilder: knitr

https://github.com/kamapu/taxlist

Reproducibility, because this package makes taxonomic information available in a quasi-standard format and tests inconsistencies on the content of taxonomic lists.

In general to taxonomists and biodiversity scientists, in particular to vegetation ecologists (taxlist objects are implemented in the package vegtable).

While its functionality may overlap the package taxa, the package taxlist attempts to be flexible in the degree of completeness of data (incompleteness is very frequent in vegetation-plot databases), it is meant to be integrated in objects containing diversity information (as in the mentioned package vegtable) and to import data from local storage (spreadsheets, Turboveg data sets and even PostgreSQL tables by using vegtable2).

Requirements

Confirm each of the following by checking the box. This package:

Publication options

Detail

@arendsee @zachary-foster @sckott

maelle commented 4 years ago

Ok, I'll wait until then! You can ask questions on rOpenSci forum if needed. Good luck! (and yeah encoding is hard!)

kamapu commented 4 years ago

The issue with encoding was solved in the last commit, @maelle Now we can proceed.

maelle commented 4 years ago

Thanks!

Is https://github.com/kamapu/taxlist/blob/master/TODO.md out-of-date? If so could you please remove it to not distract reviewers?

maelle commented 4 years ago

I'm now looking for reviewers.

Please add a peer-review badge to the package README: [![](https://badges.ropensci.org/233_status.svg)](https://github.com/ropensci/software-review/issues/233)

maelle commented 4 years ago

Reviewers assigned! Thanks a lot @mcsiple & @levisc8 for agreeing to review. :smile_cat: Your reviews are due on 2020-06-22.

As a reminder here are links to the reviewer guide and review template.

levisc8 commented 4 years ago

Hi @maelle, @kamapu, and @zachary-foster, hope you're all doing well! Thanks for this excellent package and the opportunity to review it! This is my first rOpenSci review, so please let me know if I missed anything.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Final approval (post-review)

Estimated hours spent reviewing: 9


Review Comments

The package authors have provided an excellent relational model for taxonomy. With 12k downloads and nearly 1k/month, it is already in use by the both author and quite a few others, and has already generated great value to the community. In the interest of full disclosure, I don't do much taxonomy work, so I may not be its target audience. Nonetheless, I found it pretty straightforward to use, and hope to have an opportunity to do so for my own work soon!

Checks + Tests + goodpractice::gp()

devtools::check() yields no errors, warnings, or notes, but the unit tests fail when run with devtools::test(). It appears to be an issue with the usage of file.path(path.package(..., package = 'taxlist')) calls in the unit tests. These return the file path to the source package in test, and the file path to the built package in check. In turn, I think this represents the difference between load_all() used internally in test and build used internally in check.

In every case except for test-load_last.R, you can replace the file.path(path.package(...)) with system.file('dir_within_pkg', 'file_name', package = 'taxlist') and the tests will work both interactively and in check. This approach doesn't work for the load_last test because the function takes a directory path as opposed to a complete file path, and I'm not sure how to go about getting that without some nasty gsubing, or adding another dependency like fs. One approach could be to generate a temporary directory, save an object in there, and then test load_last all within the test script itself. I'm not sure if this is best practice or not, and would request some guidance from others on that front.

I'm happy to create a pull request with the system.file() replacements if you'd like!

goodpractice::gp() yielded the following:

  ✖ write unit tests for all functions, and all package code in general. 94% of code lines are
    covered by test cases.

    R/backup_object.R:73:NA
    R/backup_object.R:75:NA
    R/backup_object.R:76:NA
    R/backup_object.R:81:NA
    R/backup_object.R:82:NA
    ... and 56 more lines

  ✖ fix this R CMD check NOTE: Note: found 137 marked UTF-8 strings

Documentation

The package contains a vignette and a README on the repository, and both provide a high level overview how one would use it. I think most potential users could get started right away. Every exported function is documented and has examples of how to use them.

I think a more extensive vignette with examples for more of the exported functions would be helpful. taxlist has a lot of exported functions, and the bulk of them are well documented and demonstrated. However, some have few details on how one might use some of them in an actual workflow. For example, the replace_*, insert_rows, and update_* functions aren't referenced at all in the vignette or Readme. The examples in the documentation are helpful to some degree, but for the first two, don't necessarily demonstrate how one would use them in a taxlist workflow.

A couple small notes:

  1. for functions with more than 1 or 2 arguments, I would also suggest naming all of the function arguments in the examples section. It's a small thing, but scrolling up and down between the Usage and Examples section to figure out what each argument corresponds to in the examples can be frustrating. It appears that this is done for the functions that are most frequently used (and referenced in the vignette), but not for others (e.g. replace_*, insert_rows).

  2. I'd suggest moving the code to access the vignette from the vignette itself to the Readme and/or the startup message (but check the rOpenSci documentation guidelines first, the latter may not best practice actually).

I would expect the vignette enhancements could take some time, so I view that as more of a long term issue than something that must be corrected for this review.

Functions

These generally work as I would expect them to. My primary critique of the code is the usage of with() in the source code for the functions and methods. This is generally recommended against in programming contexts (see that function's documentation and section 6 of Thomas Lumley's Guide to NSE). In this package, it introduces an unexpected behavior in the $ method, which I explain below. I was not able to generate other instances where the with(...) would fail silently for the other methods it appears in, but I haven't tested this exhaustively either.

I think the $ method a cool idea for quickly accessing some slots, and could help users that aren't familiar with the S4 @ usage to dive right in. On the other hand, it is somewhat confusing to more experienced users who won't expect to see a $ call on an S4 object, particularly if they're reviewing code written by someone else that makes use of this package. If you decide to keep it, then I'd recommend making its behavior more consistent and having it throw an informative error that tells the user exactly which slots it can be used on. For example:

>data(Easplist)

# In Easplist@taxonNames
>Easplist$TaxonName
Error in get(name) : object 'TaxonName' not found

# Isn't present in any Easplist slots, but is in NULLing.R
>Easplist$score
NULL

# Is present in Easplist@taxonNames and NULLing.R, 
# but not taxonTraits or taxonRelations

>Easplist$TaxonUsageID
NULL

# Is present in Easplist@taxonNames, 
# but not taxonTraits,  taxonRelations or NULLing.R

>Easplist$AuthorName
Error in get(name): object 'AuthorName' not found

It looks like the method is finding the object TaxonUsageID in the taxlist namespace which is set in NULLing.R. This is because the inherit argument in get(name) defaults to TRUE, and it isn't altered in the method's source code. Easplist$acceptedname and Easplist$score also yield NULL, while Easplist$TaxonConceptID is found in the environment generated by with, and so no further searching is performed by get. These are the only names generated in NULLing.R. However, if one created a new column in one of the tables, then tried to access it, we get the classic:


Easplist@taxonNames$mean <- runif(nrow(Easplist@taxonNames))

> Easplist$mean
object of type 'closure' is not subsettable 

I'd suggest adding something like this to the beginning of the $ method source code:


pos_nms <- c(names(x@taxonTraits),
             names(x@taxonRelations))

if(! name %in% pos_nms && name %in% c(names(other_tables))) {

  stop("can only access 'taxonTraits' or 'taxonRelations' slots with '$'")

}

Note that none of the above is a critique of using with() in examples/vignettes/Readme. Those are interactive cases, and so I don't see any major issues there, except that it may confuse some users who are not familiar with the function itself.

Finally, it would be nice to have a more standardized function naming system across the package. For example, some are named replace_*, and others named update_*, and as best I can tell, they have the same higher level purpose (i.e. modifying an object w/ new information). The rOpenSci package guidelines suggest an object_verb naming scheme (e.g. something like tl_update_concept(), tl_add_concept, tl_update_idx(), tl_insert_rows(), etc.).

Minor comments

Excellent work @kamapu and @zachary-foster! Let me know if I can help with any of the points above, or if I missed any specific points in the review guide!

maelle commented 4 years ago

Thanks a ton for your thorough review @levisc8! :rocket: :pray:

A few notes from me

kamapu commented 4 years ago

Wow! I was away of GitHub for a while (also thanks the computer in my home-office that I had to install many times). Thanks @levisc8 for the review. I assume, I can start answering and implementing suggestions, isn't it @maelle ?

maelle commented 4 years ago

It might be better to wait until the second review (by @mcsiple) is in?

:wave: @mcsiple this is a friendly reminder that your review is due on 2020-06-22 :smile_cat:

mcsiple commented 4 years ago

Hi @maelle, @kamapu, and @zachary-foster, here's my review! Please let me know if you have any questions about anything I wrote.

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

For packages co-submitting to JOSS

The package contains a paper.md matching JOSS's requirements with:

  • [ ] A short summary describing the high-level functionality of the software
  • [ ] Authors: A list of authors with their affiliations
  • [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
  • [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).

Functionality

Local install

Local install worked fine; I had some classic issues with updating {rlang} and {ps} and a couple of other packages but once I had all the dependencies there weren't any problems installing locally. (I say "classic" issues because I have often had problems when {rlang} needs to be updated. Not sure why this is.)

Install from GitHub

Install from GitHub works smoothly.

Good practices checklist

r goodpractice::gp() returns three good practices that are missing from {taxlist}:

I am not sure what exactly is happening with any of these three but it sounds like an issue with PDF creation, which has come up before (see "Checks on package source").

Metadata files

Final approval (post-review)

Estimated hours spent reviewing: 6


Key review checks

Review Comments

Thank you for the opportunity to review this package. It seems to be used very widely already, so I know it is a useful tool. Like @levisc8 , I don't do a lot of taxonomy. I tend to work with existing taxonomic classifications for things (and use taxonomy more as a unique ID to get other information like stock status), but I tried to review the package as thoroughly and usefully as I could, but I urge the package authors to interpret my comments with that lens, and not to struggle to incorporate certain changes if you know they won't improve the functionality for everyday users of the package, or if it seems that I have missed the point on that topic.

Package checks & tests

Checks

0 errors and 1 warning returned from r devtools::check() : "The warning is "'qpdf' is needed for checks on size reduction of PDFs" (probably not important, but just so you have it!)

Tests

I ran into a couple errors and warnings in this section, which I see @levisc8 has suggested some useful solutions for. I think they're just coming from how the directories are set up for the checks and the examples.

devtools::check(pkg_dir) returns 4 failures and 2 warnings.

I got 4 failures:

I got 3 warnings from devtools::check():

Documentation

Statement of need

My main question/confusing is that it was hard for me to figure out who the target user was, and how this user group overlaps (or doesn't) with users of {taxa}. I thought at first that taxlist might be a helper package for people who are building their own taxonomy packages, a replacement for {taxa}, or perhaps just provide a tool for classification in S4. The DESCRIPTION file suggests that it is specifically for users who are getting taxonomic data from Turboveg. Whatever you anticipate the dominant use for taxlist will be, that should be at the top of the README and the vignettes. I found the paper introducing the package to give a nice succinct statement of need; that one could potentially just be tweaked a little and added to the README.

If people using this package will most likely be modifying/importing existing taxonomy lists (i.e., from Turboveg), I think there should be an example or two in the vignettes that show users how this is done. This seems to be a primary functionality.

Help files

I worked through the example in the vignette. Here are a few comments:

Tiny edits I noticed as I was going along

None of these little copy edits will affect anything about the clarity of the documentation but I noticed them as I went along and thought I would document them anyway.

Session Info


R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] devtools_2.3.0 usethis_1.6.1  magrittr_1.5  

loaded via a namespace (and not attached):
 [1] prettyunits_1.1.1 ps_1.3.3          fansi_0.4.1       rprojroot_1.3-2   withr_2.2.0      
 [6] digest_0.6.25     crayon_1.3.4      assertthat_0.2.1  R6_2.4.1          backports_1.1.7  
[11] rlang_0.4.6       cli_2.0.2         remotes_2.1.1     rstudioapi_0.11   fs_1.4.1         
[16] testthat_2.3.2    callr_3.4.3       ellipsis_0.3.1    desc_1.2.0        tools_4.0.0      
[21] glue_1.4.1        pkgload_1.1.0     processx_3.4.2    compiler_4.0.0    pkgbuild_1.0.8   
[26] sessioninfo_1.1.1 memoise_1.1.0    
```r 
maelle commented 4 years ago

@mcsiple thanks a lot for your thorough review! :sparkles:

@kamapu Now both reviews are in!

maelle commented 4 years ago

@mcsiple could you please add an estimate of the time you spent reviewing near "Estimated hours spent reviewing:"? thanks!

mcsiple commented 4 years ago

@maelle oops-- now it has been added. Thanks!

kamapu commented 4 years ago

OK, thank you to @mcsiple and @levisc8 for your valuable comments to taxlist. I guess, I should now start editing the package in order to answer to your questions. Isn't it, @maelle ?

maelle commented 4 years ago

@kamapu yes, indeed. You can continue the conversation here but at the end of your edits we expect a comment that responds to both reviews. (see the second to last point in https://devguide.ropensci.org/guide-for-authors.html)

kamapu commented 4 years ago

Thanks to the reviewers for your valuable comments and the resulting improvements. I tried my best to provide responses to all of your comments and I hope the applied modifications will be satisfactory. Note that I left unchecked boxes in the cases where my response deviates from the suggestions of the reviewers or in those cases where I didn't managed to provide a solution to an issue.

Response to @levisc8

Contribution

Community guidelines

Checks + Tests + goodpractice::gp()

Documentation

A couple small notes:

Functions

Minor comments

Response to @mcsiple

Contribution

A statement of need

Community guidelines

Good practices checklist

Automated tests

Help files

Tiny edits

Further improvements

maelle commented 4 years ago

It is ok, and asking questions is also ok, just remember to ping us in separate comments (because I'm not sure we'd get notified if you ping us from an edited comment?)

kamapu commented 4 years ago

@levisc8 I have some comments regarding your review, which I would like to clarify before editing my response:

  1. Regarding devtools::check(). Everything is running OK in my case. May this problem be related with the fact that data have a different relative path in the source as in the installed package? In any case, if your offer of solving it in the source is still open, then go ahead.

  2. In goodpractice::gp(), the message fix this R CMD check NOTE: Note: found 137 marked UTF-8 strings. I don't really know, how to solve this issue or if it is an issue at all. I already had solved similar problems reported by CRAN in https://github.com/kamapu/taxlist/commit/3b2269ddb05ec01baebb7afec566b2c1d47a8665

  3. In the case of data-raw/0_check.R, this is the script I use to automatically build and check taxlist before committing changes to GitHub (honestly, I'm sort of proud about it). I know, it may be confusing for contributors but I have to keep it in the source. Is there a better place and/or a better name that you may suggest for this script?

kamapu commented 4 years ago

I forgot to mention it, @levisc8 : I work with eclipse + StatET (not RStudio), thus I need the data-raw/0_check.R to carry out builds and tests.

kamapu commented 4 years ago

Dear @levisc8 In your review, under Functionality, there are two unchecked boxes but without comments:

  • [ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
  • [ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.

I'm wondering if you may have forgotten to check them (in that case, I just ignore them) or if there are issues to be solved on that regard.

kamapu commented 4 years ago

... this questions is for all reviewers + editor: @levisc8 @mcsiple @maelle

goodpractice::gp() will complain that:

  1. The coverage is not 100%.
  2. Issues with UTF-8 strings, although the encoding is declared in the description and there is no NOTE on this regard when using R CMD check in a Latin-1 environment, as required by CRAN.

Am I allowed to ignore these issues in the review? I mean, I can redact a brief response but no changes in code will be required.

kamapu commented 4 years ago

@mcsiple : In your outputs of goodpractice::gp(), the LaTeX error can be related to your local LaTeX distribution or your installed version of rmarkdown. Perhaps it can be solved by updating your LaTeX distribution and/or rmarkdown. The later from the GitHub repository???

levisc8 commented 4 years ago

Hi @kamapu , Thanks for getting to this so quickly! I'll respond in order:

  1. Re devtools::check(): I'll try to get a pull request together by the middle of next week. The issue I had was not with check itself, but running devtools::test from a command prompt. The tests do not work interactively, and the line highlighted in @maelle 's comment here is exactly the fix I was suggesting. I believe that altering the tests that use that (with the exception of the one I mentioned in my review) will resolve that issue.

    • I'm not sure how huge of an issue this is since I don't think very many people run devtools::test() on other people's source packages frequently (if ever). I did think it was worth pointing out though, as the tests do not technically pass on my local machine (which is why I didn't check the box in my review).
  2. It appears to come from some of the example data sets - you could try something like this to set it right. I think that check gets fussy if there are any non-ASCII strings in a package, but my understanding of locales and encodings is very limited, so I don't know why it would get fussy about this.

    • As for the test coverage - I don't know what rOpenSci's requirements are, but I don't expect 100% coverage. 94% is pretty darn good :)
  3. It is a cool script! I don't mean to say "get rid of the script". You don't need to move it either, as the data-raw directory is in your .rbuildignore, and so will never find its way into a built package. I just meant that I don't think you need those packages in the Suggests field of your DESCRIPTION file. The reason for this is that some users may want to install all dependencies, but not want to install unneeded packages for taxlist's functionality (particularly stringi and devtools, which can take some time to install from source). Sorry for the confusion! Does that make sense?

    • I have no problems w/ checking if I remove devtools. stringi, and goodpractice from the Suggests field, so I don't think it will cause any issues w/ CRAN or other OS's. It just makes taxlist a bit leaner is all.
  4. Packaging guidelines - I think I left that un-checked because of the website, but that is clarified now so I've marked it complete :)

kamapu commented 4 years ago

@levisc8 I was just reading your comments again:

1. Regarding `devtools::check()`. Everything is running OK in my case. May this problem be related with the fact that data have a different relative path in the source as in the installed package? In any case, if your offer of solving it in the source is still open, then go ahead.

I guess, I'm able to do the proposed change and test it (using devtools::test()).

kamapu commented 4 years ago

@levisc8 Just a comment on your comment:

  1. It appears to come from some of the example data sets - you could try something like this to set it right...

The encoding issues have been the biggest nightmare not only when programming but also in real life (see here). I pretend to assume, that the issue was already solved before and goodpractice::gp() is missing some settings to tolerate UTF-8 encodings.

kamapu commented 4 years ago

@levisc8

you can replace the file.path(path.package(...)) with system.file('dir_within_pkg', 'file_name', package = 'taxlist')

This solves the problems in devtools::test() but causes errors in devtools::check_built()

maelle commented 4 years ago

Hello @kamapu

kamapu commented 4 years ago

Hello @maelle

  1. All right.

  2. Unfortunately I got the same output at r-hub:

  • checking data for non-ASCII characters ... NOTE Note: found 137 marked UTF-8 strings
  1. A bit of both: I'm been working for long time with eclipse. The biggest advantages over RStudio are as well the representation of objects in "Object browser" and the possibility of work with a broad spectrum of applications (R, Web Development, LaTeX, PostgreSQL, etc.) in just one workbench. Otherwise I use RStudio to introduce R in our courses.
maelle commented 4 years ago

It's unfortunate but means the problem needs to be fixed (or well justified), because you'd get it on CRAN check platform that has this encoding setting. Would you mind opening a question about that at https://discuss.ropensci.org/?

kamapu commented 4 years ago

Hello all: @maelle @levisc8 @mcsiple

I see, I have to dedicate some time with the example data, again. I though, I have solved it in the last release at CRAN but I just realized, the test is still providing the NOTE about UTF-8 Strings.

I assume, R CMD check is not giving any warning because of my local settings (I'm working in Linux).

Perhaps some of you could give me some hints:

  1. The data is originally stored in PostgreSQL, also using UTF-8 encoding. I export the data as csv-file in data-raw/Easplist, also using UTF-8. And then I create the taxlist object in an R-Session. Perhaps something can be adjusted in this workflow to get rid of the warnings.

  2. I was assuming that using Sys.setenv(LANG="en_US.iso88591") I was testing the package in the CRAN settings but it is evidently not working.

kamapu commented 4 years ago

I always advice our students to not confuse ERRORS with WARNINGS. And now I see me confusing WARNINGS with NOTES. Are the later usually tolerated?

maelle commented 4 years ago

they could but they need to be justified, and their mere existence delays the CRAN review process, that's why it's better to get rid of them.

kamapu commented 4 years ago

@maelle I see your point, although I don't really know why UTF-8 is a problem. Anyway, the issue is that the vector including author names is prone to have "special characters", actually even taxon usage names (for instance the genus Isöetes and its species). In taxlist the combination between taxon usage names and authority (columns TaxonName and AuthorName in slot taxonNames) is a sort of identity for a name (no duplicated allowed).

I'm willing to solve the problem but I will need support on this regard, especially because I wish, the package can still be of any use disregarding the local settings of the users. I also wish to reach THE solution, because the issue is emerging again, and again, ... :sob: :sob: :sob:

maelle commented 4 years ago

To clarify, I am not saying it's a bad NOTE but I think it'd be easier if you opened a thread at https://discuss.ropensci.org where more community members could chime in with ideas for making the NOTE disappear or writing a good comment for CRAN (who'd be the ones to be convinced the NOTE is fine), from their own experience.

I do understand encoding pain, my name is Ma?#$lle I mean Maëlle. :wink:

kamapu commented 4 years ago

@maelle Thanks for clarification. I sort of lost the path (cause I'm doing many things in parallel). I'll do as suggested but not right now...

Perhaps it was a premonition but long time ago I decided to skip the accent from Álvarez when self-referencing...

kamapu commented 4 years ago

Dear @maelle @mcsiple @levisc8

I just finished the response to your reviews. Note that there are few discrepancies and unsolved issues. I hope, the improvements will be satisfactory for this submission. Thank you again!

The edited review is here

maelle commented 4 years ago

Thanks @kamapu, by unsolved issues you don't mean something where a bit more time would help?

@mcsiple @levisc8 are you ok with the response?

kamapu commented 4 years ago

@maelle "unsolved issues" actually mean, it is not depending on me anymore...

levisc8 commented 4 years ago

Before I get to this again, just want to clarify - do the unresolved issues refer to the UTF-8 strings in the example data set? Or something else?

kamapu commented 4 years ago

In general I consider as unresolved: 1) UTF-8 Note by goodpractice::pg() and rhub::check(platform="debian-clang-devel"). 2) I think, @mcsiple got several error or warning messages, in part related to LaTeX, which I'm supposing to be relate with her local settings but it should still be clarified.

Of course, I can still receive and try suggestions.

KevCaz commented 4 years ago

Hi @kamapu, As I was interested in your issue, I had a look at it. Looks like the UTF-8 note you got comes from your dataset (Easplist.dra) (see https://github.com/dankelley/oce/issues/1211)

R> tools:::.check_package_datasets(".")                         
Note: found 137 marked UTF-8 strings

Something that is not checked by default by devtools::check() (I'm guessing here) but is checked if you do the combo R CMD build/R CMD check.

Hope this could be useful.

kamapu commented 4 years ago

Dear all, sorry for annoying you with the issue of encodings. While I am sort of developing a pathological aversion for this aspect of programming, for people dealing with taxonomy (including myself), the use of names and their authorities is quite crucial in the management of information, thus considered in taxlist as a component of the identity of a taxon concept (the accepted name of the taxon) and the taxon usage id (alternative names used to refer to a taxon), and this is the matter of my headaches: global context = special characters...

The possible solutions explored until now, just to get rid of a note, are not satisfying me in a 100%.

Now I have a request for you: The next is a block of code including the outputs in my console (I'm working in Linux with UTF-8 locale). My question is: Do you get the exact output as in my case?

library(stringi)
library(taxlist)

Names <- Easplist@taxonNames$AuthorName[c(5299, 5021, 5019)]
Names
#> [1] "Borsch, Kai Müll. & Eb.Fisch." "Ruiz & Pav."                  
#> [3] "Ség."

stri_enc_mark(Names)
#> [1] "UTF-8" "ASCII" "UTF-8"

iconv(Names, "utf8", "ascii")
#> [1] NA            "Ruiz & Pav." NA           

stri_enc_toascii(Names)
#> [1] "Borsch, Kai M\032ll. & Eb.Fisch." "Ruiz & Pav."                     
#> [3] "S\032g."                         

stri_trans_general(Names, "latin-ascii")
#> [1] "Borsch, Kai Mull. & Eb.Fisch." "Ruiz & Pav."                  
#> [3] "Seg."   

See also this discussion.

maelle commented 4 years ago

For reference link to Discourse thread https://discuss.ropensci.org/t/note-on-utf-8-strings-by-goodpractice-gp/2165

levisc8 commented 3 years ago

@kamapu regarding your last question, I do get the exact same output as you do. Furthermore, stri_trans_general(Names, "latin-ascii") %>% str_enc_mark() returns 'ascii' 'ascii' 'ascii'.

levisc8 commented 3 years ago

Review round 2:

I'm generally fine with the new form of the package. I just have some minor comments at this point.

Unresolved issues

  1. UTF-8 strings: I don't have strong feelings about this, as it is pretty far beyond my area of expertise. If you can fix it with the code blurb from above, excellent!

  2. The pdflatex errors could be because pdflatex is not in the PATH variable (see here). I don't think there's much you can do about this from your side though.

    • I think the qpdf warning is similar. Basically, just need to install qpdf and add it to the PATH variable to check smoothly.

Rest of the response

  1. Documentation is looking good.

  2. package API: I understand the point regarding backward compatibility. I also don't think it's unreasonable to make breaking changes if they result in substantial improvements to the user experience. Incrementing by a major version number (e.g. 0.2 -> 1.0) should signal to most users that they should expect major changes. If users really don't want to change their code, they can install archived versions, either from Github or from CRAN. This is something I'd defer to rOpenSci editors on though, as I don't really believe it is a deal breaker or anything.

    • Re: aggregate/count_taxa: good point - I hadn't thought about that! On the other hand, there is still an rm.na argument for a couple methods. This could change to na.rm for consistency w/ the base/stats packages. Personally, I would forget that it is rm.na every single time I tried to use the package. Again, I appreciate the point about back-compatibility, but this doesn't need to be a breaking change right away - you can retain the old form and deprecate it for the next few minor versions as well.
mcsiple commented 3 years ago

Hi @kamapu , @maelle et al.,

My apologies for the delay in responding to these new revisions and response. I have itemized them below mostly for my own mental organization (they match the structure of @kamapu 's response, I can keep track of what I have checked).

Note/caveat: I normally run all the checks on two different operating systems to provide better coverage in my ran everything on my other computer. This time I did my entire initial review on a Windows OS and am now checking everything again on my Mac. This could be a "good" thing because I've covered more ground, but IMO it's also suboptimal because some of the changes I've seen could potentially be caused by different software/packages instead of changes to the package. I think this is mostly the case with the LaTeX issues, but wanted to include the disclaimer.

Statement of need

This edit is great and satisfies my question(s) about the context and uses for the package.

Community guidelines

Newly added contributing guidelines look good.

Good practices checklist

UTF-8 string issues

It looks like this one is still being hashed out to some extent... I hope you find a good resolution for it. It sounds from the ropensci.org post like maybe this will remain an issue for people with a setup that doesn't have a font that renders the non-ascii characters. Maybe a note in the vignette about whether/how this would affect functionality would be good, or better yet a custom check or error message upon install... I am sure other package developers will have this issue so you're doing a service to the community by trying to figure it out!

LaTeX issue

This one is my fault; I normally run all the checks on two different operating systems to provide better coverage in my ran everything on my other computer. No LaTeX errors in goodpractice::gp() this time, so consider this part resolved.

Automated tests

devtools::check() : I found no errors with R CMD check this time.

Help files

Thanks - let me know if you have questions/comments about any of the above.

maelle commented 3 years ago

:wave: @kamapu, would you soon be able to respond the reviewers' new comments? Thank you.

kamapu commented 3 years ago

Sorry for the break (many duties and a new family member arrived). I'm ready to go!

kamapu commented 3 years ago

Response to Reviews in Round 2

Dear @maelle @mcsiple @levisc8 I'm sorry that despite the progress on the review I had to stay out of programming for a while. Now I'm back and like to thank for all valuable comments again.

I will than refer to the comments of reviewers at round 2.

Response to @levisc8

Unresolved Issues

Rest

Response to @mcsiple

Two comments to this review, otherwise I agree with the rest.

maelle commented 3 years ago

Thanks @kamapu and congrats again on the new family member!

@mcsiple @levisc8 are you ok with the response?