Closed kamapu closed 3 years ago
Ok, I'll wait until then! You can ask questions on rOpenSci forum if needed. Good luck! (and yeah encoding is hard!)
The issue with encoding was solved in the last commit, @maelle Now we can proceed.
Thanks!
Is https://github.com/kamapu/taxlist/blob/master/TODO.md out-of-date? If so could you please remove it to not distract reviewers?
I'm now looking for reviewers.
Please add a peer-review badge to the package README: [![](https://badges.ropensci.org/233_status.svg)](https://github.com/ropensci/software-review/issues/233)
Reviewers assigned! Thanks a lot @mcsiple & @levisc8 for agreeing to review. :smile_cat: Your reviews are due on 2020-06-22.
As a reminder here are links to the reviewer guide and review template.
Hi @maelle, @kamapu, and @zachary-foster, hope you're all doing well! Thanks for this excellent package and the opportunity to review it! This is my first rOpenSci review, so please let me know if I missed anything.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Estimated hours spent reviewing: 9
The package authors have provided an excellent relational model for taxonomy. With 12k downloads and nearly 1k/month, it is already in use by both the author and quite a few others, and has already generated great value for the community. In the interest of full disclosure, I don't do much taxonomy work, so I may not be its target audience. Nonetheless, I found it pretty straightforward to use, and hope to have an opportunity to do so for my own work soon!
`devtools::check()` yields no errors, warnings, or notes, but the unit tests fail when run with `devtools::test()`. It appears to be an issue with the usage of `file.path(path.package(..., package = 'taxlist'))` calls in the unit tests. These return the file path to the source package in `test`, and the file path to the built package in `check`. In turn, I think this represents the difference between `load_all()` used internally in `test` and `build` used internally in `check`.
In every case except for `test-load_last.R`, you can replace the `file.path(path.package(...))` with `system.file('dir_within_pkg', 'file_name', package = 'taxlist')` and the tests will work both interactively and in `check`. This approach doesn't work for the `load_last` test because the function takes a directory path as opposed to a complete file path, and I'm not sure how to go about getting that without some nasty `gsub`ing, or adding another dependency like `fs`. One approach could be to generate a temporary directory, save an object in there, and then test `load_last` all within the test script itself. I'm not sure if this is best practice or not, and would request some guidance from others on that front.
I'm happy to create a pull request with the `system.file()` replacements if you'd like!
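The two path-handling ideas above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: `stats` stands in for `taxlist` so the code runs without the package installed, the fixture directory and file name are made up, and the call to the function under test is left as a comment because its exact arguments are not shown in this thread.

```r
# system.file() resolves a path inside an installed package and returns ""
# when the file does not exist. "stats" is a stand-in for 'taxlist' here.
desc_path <- system.file("DESCRIPTION", package = "stats")
stopifnot(nzchar(desc_path), file.exists(desc_path))

# A missing file gives an empty string instead of an error,
# so a test should check the result before using it.
missing_path <- system.file("no_such_dir", "no_such_file.csv", package = "stats")

# Temporary-directory fixture for a function that expects a directory path.
# Directory and file names are hypothetical.
tmp_dir <- file.path(tempdir(), "load_last_fixture")
dir.create(tmp_dir, showWarnings = FALSE)
obj <- data.frame(id = 1:3)
save(obj, file = file.path(tmp_dir, "obj_2020-06-01.rda"))

# ... call the function under test on tmp_dir here ...
saved_files <- list.files(tmp_dir, pattern = "\\.rda$")

# Clean up so the test leaves nothing behind.
unlink(tmp_dir, recursive = TRUE)
```

Building the fixture inside the test keeps it independent of where the package sources or the built package happen to live.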
`goodpractice::gp()` yielded the following:
✖ write unit tests for all functions, and all package code in general. 94% of code lines are
covered by test cases.
R/backup_object.R:73:NA
R/backup_object.R:75:NA
R/backup_object.R:76:NA
R/backup_object.R:81:NA
R/backup_object.R:82:NA
... and 56 more lines
✖ fix this R CMD check NOTE: Note: found 137 marked UTF-8 strings
The package contains a vignette and a README on the repository, and both provide a high-level overview of how one would use it. I think most potential users could get started right away. Every exported function is documented and has examples of how to use it.
I think a more extensive vignette with examples for more of the exported functions would be helpful. `taxlist` has a lot of exported functions, and the bulk of them are well documented and demonstrated. However, some have few details on how one might use them in an actual workflow. For example, the `replace_*`, `insert_rows`, and `update_*` functions aren't referenced at all in the vignette or Readme. The examples in the documentation are helpful to some degree, but for the first two, don't necessarily demonstrate how one would use them in a `taxlist` workflow.
A couple small notes:
for functions with more than 1 or 2 arguments, I would also suggest naming all of the function arguments in the examples section. It's a small thing, but scrolling up and down between the `Usage` and `Examples` sections to figure out what each argument corresponds to in the examples can be frustrating. It appears that this is done for the functions that are most frequently used (and referenced in the vignette), but not for others (e.g. `replace_*`, `insert_rows`).
I'd suggest moving the code to access the vignette from the vignette itself to the Readme and/or the startup message (but check the rOpenSci documentation guidelines first, the latter may not be best practice actually).
I would expect the vignette enhancements could take some time, so I view that as more of a long term issue than something that must be corrected for this review.
These generally work as I would expect them to. My primary critique of the code is the usage of `with()` in the source code for the functions and methods. This is generally recommended against in programming contexts (see that function's documentation and section 6 of Thomas Lumley's Guide to NSE). In this package, it introduces an unexpected behavior in the `$` method, which I explain below. I was not able to generate other instances where the `with(...)` would fail silently for the other methods it appears in, but I haven't tested this exhaustively either.
I think the `$` method is a cool idea for quickly accessing some slots, and could help users who aren't familiar with the S4 `@` usage to dive right in. On the other hand, it is somewhat confusing to more experienced users who won't expect to see a `$` call on an S4 object, particularly if they're reviewing code written by someone else that makes use of this package. If you decide to keep it, then I'd recommend making its behavior more consistent and having it throw an informative error that tells the user exactly which slots it can be used on. For example:
>data(Easplist)
# In Easplist@taxonNames
>Easplist$TaxonName
Error in get(name) : object 'TaxonName' not found
# Isn't present in any Easplist slots, but is in NULLing.R
>Easplist$score
NULL
# Is present in Easplist@taxonNames and NULLing.R,
# but not taxonTraits or taxonRelations
>Easplist$TaxonUsageID
NULL
# Is present in Easplist@taxonNames,
# but not taxonTraits, taxonRelations or NULLing.R
>Easplist$AuthorName
Error in get(name): object 'AuthorName' not found
It looks like the method is finding the object `TaxonUsageID` in the `taxlist` namespace which is set in `NULLing.R`. This is because the `inherits` argument in `get(name)` defaults to `TRUE`, and it isn't altered in the method's source code. `Easplist$acceptedname` and `Easplist$score` also yield `NULL`, while `Easplist$TaxonConceptID` is found in the environment generated by `with`, and so no further searching is performed by `get`. These are the only names generated in `NULLing.R`. However, if one created a new column in one of the tables, then tried to access it, we get the classic:
Easplist@taxonNames$mean <- runif(nrow(Easplist@taxonNames))
> Easplist$mean
object of type 'closure' is not subsettable
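The lookup behaviour described above can be reproduced outside the package with a few lines of base R. This is a minimal sketch, not the package's code: `tbl`, `lookup_with()` and `lookup_strict()` are hypothetical names, and the toy data frame stands in for a slot of a `taxlist` object.

```r
# Toy table standing in for one slot of a taxlist object.
tbl <- data.frame(TaxonConceptID = 1:3)

# Mimics the pattern under discussion: get() inside with() searches the
# data first and then keeps walking up the enclosing environments.
lookup_with <- function(name) with(tbl, get(name))

# A stricter variant: search only the columns, never the enclosing scopes.
lookup_strict <- function(name) {
  get(name, envir = list2env(as.list(tbl)), inherits = FALSE)
}

found_column <- lookup_with("TaxonConceptID")  # the column
found_other <- lookup_with("mean")             # falls through to base::mean

strict_result <- tryCatch(lookup_strict("mean"), error = function(e) e)
```

With `inherits = FALSE` the missing column raises an error straight away instead of silently returning an unrelated object from another environment.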
I'd suggest adding something like this to the beginning of the `$` method source code:
pos_nms <- c(names(x@taxonTraits),
names(x@taxonRelations))
if(! name %in% pos_nms && name %in% c(names(other_tables))) {
stop("can only access 'taxonTraits' or 'taxonRelations' slots with '$'")
}
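To make the intent of that check concrete, here is a self-contained sketch of a `$` method that only looks in two slots and fails loudly otherwise. It is an illustration on a toy class, not the implementation in `taxlist`: the class name `toy_taxlist`, the example columns and the wording of the error message are all assumptions.

```r
library(methods)

# Toy stand-in for the real class, with only the two slots `$` should reach.
setClass("toy_taxlist",
  representation(taxonRelations = "data.frame", taxonTraits = "data.frame"))

setMethod("$", "toy_taxlist", function(x, name) {
  # Look in taxonTraits first, then in taxonRelations.
  if (name %in% names(x@taxonTraits)) {
    return(x@taxonTraits[[name]])
  }
  if (name %in% names(x@taxonRelations)) {
    return(x@taxonRelations[[name]])
  }
  # Anything else is an explicit error rather than a lookup elsewhere.
  stop("'", name, "' is not a column of 'taxonTraits' or 'taxonRelations'",
    call. = FALSE)
})

obj <- new("toy_taxlist",
  taxonRelations = data.frame(TaxonConceptID = 1:2,
    Level = c("species", "genus")),
  taxonTraits = data.frame(TaxonConceptID = 1:2, height = c(0.5, 1.2)))

heights <- obj$height
err_msg <- tryCatch(obj$mean, error = function(e) conditionMessage(e))
```

Because the method indexes the slots directly with `[[`, it never consults the package namespace or the search path, so names such as `mean` cannot leak in.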
`$` indicates that `$<-` method is implemented, but I can't find the source code for it and I get the following error when I try to use: "no method for assigning subsets of this S4 class". Does this mean it is not yet implemented?
Note that none of the above is a critique of using `with()` in examples/vignettes/Readme. Those are interactive cases, and so I don't see any major issues there, except that it may confuse some users who are not familiar with the function itself.
Finally, it would be nice to have a more standardized function naming system across the package. For example, some are named `replace_*`, and others named `update_*`, and as best I can tell, they have the same higher level purpose (i.e. modifying an object w/ new information). The rOpenSci package guidelines suggest an `object_verb` naming scheme (e.g. something like `tl_update_concept()`, `tl_add_concept`, `tl_update_idx()`, `tl_insert_rows()`, etc.).
rOpenSci guidelines suggest not including "This is package_vX.Y.Z" in the startup message.
More of a wish, but a `print.taxlist` or `show.taxlist` method would be nice, even if it's identical to the current `summary` method.
Could the `rm.na`/`include_na` arguments in `count_taxa` switch to `na.rm` for consistency w/ other `base`/`stats` functions?
Couldn't find contributing/community guidelines in the repository.
I hope this isn't annoying, but I would suggest using a code format that is more consistent with the norms in the R community (e.g. Google R style guide, or the tidyverse style guide). This will enable other contributors to more easily read and understand the source code, and reduce the difficulty in making contributions. Specifically, I'd suggest
spacing after all commas, and after opening [
when the first argument is empty (i.e. subsetting columns).
single statements on a single line except where it is impossible to keep them under 80 characters wide.
indenting that matches opening brackets/parentheses in function calls.
Description lists the following packages in Suggests, but I don't see them used in the vignettes, Readme, or Code:
stringi
goodpractice
It looks like these were included in the `data-raw/0_check.R` file. I don't think they need to be included in the package though, as they are already `.Rbuildignore`'d since they're in `data-raw/` and so shouldn't cause any problems with `check` and/or CRAN. I had no issues when re-running `check` with them removed.
Consider making a package website. My experience w/ using `pkgdown` for this purpose has been excellent.
Excellent work @kamapu and @zachary-foster! Let me know if I can help with any of the points above, or if I missed any specific points in the review guide!
Thanks a ton for your thorough review @levisc8! :rocket: :pray:
A few notes from me
No need to set up a pkgdown website now since there'll be one built for the package once it is approved. Unless @kamapu wants feedback on the navbar and reference index grouping for instance. We're however thinking about this, I'm not saying it's a bad comment, on the contrary. :wink:
Regarding the paths in tests, in case it is relevant (sorry if I'm missing subtleties). In rtimicropem I put such data in inst/exdata and access it from tests.
Just adding the link to the dev guide section about CONTRIBUTING.md
Wow! I was away from GitHub for a while (also thanks to the computer in my home office, which I had to reinstall many times). Thanks @levisc8 for the review. I assume I can start answering and implementing suggestions, can't I, @maelle?
It might be better to wait until the second review (by @mcsiple) is in?
:wave: @mcsiple this is a friendly reminder that your review is due on 2020-06-22 :smile_cat:
Hi @maelle, @kamapu, and @zachary-foster, here's my review! Please let me know if you have any questions about anything I wrote.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
I did not find any Contributing Guidelines in the README or DESCRIPTION
For packages co-submitting to JOSS
- [ ] The package has an obvious research application according to JOSS's definition
The package contains a `paper.md` matching JOSS's requirements with:
- [ ] A short summary describing the high-level functionality of the software
- [ ] Authors: A list of authors with their affiliations
- [ ] A statement of need clearly stating problems the software is designed to solve and its target audience.
- [ ] References: with DOIs for all those that have one (e.g. papers, datasets, software).
Local install worked fine; I had some classic issues with updating {rlang} and {ps} and a couple of other packages but once I had all the dependencies there weren't any problems installing locally. (I say "classic" issues because I have often had problems when {rlang} needs to be updated. Not sure why this is.)
Install from GitHub works smoothly.
`goodpractice::gp()` returns three good practices that are missing from {taxlist}:
tools::texi2pdf()
I am not sure what exactly is happening with any of these three but it sounds like an issue with PDF creation, which has come up before (see "Checks on package source").
[X] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. Some of the tests failed when I ran test(pkg_dir) (See "Package tests & checks" section)
[X] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Estimated hours spent reviewing: 6
Thank you for the opportunity to review this package. It seems to be used very widely already, so I know it is a useful tool. Like @levisc8, I don't do a lot of taxonomy. I tend to work with existing taxonomic classifications for things (and use taxonomy more as a unique ID to get other information like stock status). I tried to review the package as thoroughly and usefully as I could, but I urge the package authors to interpret my comments with that lens, and not to struggle to incorporate certain changes if you know they won't improve the functionality for everyday users of the package, or if it seems that I have missed the point on that topic.
0 errors and 1 warning returned from `devtools::check()`: the warning is "'qpdf' is needed for checks on size reduction of PDFs" (probably not important, but just so you have it!)
I ran into a couple errors and warnings in this section, which I see @levisc8 has suggested some useful solutions for. I think they're just coming from how the directories are set up for the checks and the examples.
`devtools::check(pkg_dir)` returns 4 failures and 2 warnings.
I got 4 failures:
I got 3 warnings from `devtools::check()`:
`test()` is looking in taxlist/cyperus/ for names.csv instead of taxlist/inst/cyperus. I think this directory is on line 3 of test-df2.taxlist.R. Like the R:4 error above, inst is missing from the directory.
My main question/confusion is that it was hard for me to figure out who the target user was, and how this user group overlaps (or doesn't) with users of {taxa}. I thought at first that taxlist might be a helper package for people who are building their own taxonomy packages, a replacement for {taxa}, or perhaps just a tool for classification in S4. The DESCRIPTION file suggests that it is specifically for users who are getting taxonomic data from Turboveg. Whatever you anticipate the dominant use for taxlist will be, that should be at the top of the README and the vignettes. I found that the paper introducing the package gives a nice succinct statement of need; that one could potentially just be tweaked a little and added to the README.
If people using this package will most likely be modifying/importing existing taxonomy lists (i.e., from Turboveg), I think there should be an example or two in the vignettes that show users how this is done. This seems to be a primary functionality.
I worked through the example in the vignette. Here are a few comments:
`Cross[,2:8]` not incorporated anywhere in the result of `Cross[,"TaxonName"]`? If so, could someone theoretically use taxlist to create a brand new taxonomy with its own organizational structure? I think this is the case but it would be good to make it more obvious.
None of these little copy edits will affect anything about the clarity of the documentation but I noticed them as I went along and thought I would document them anyway.
`taxon_relations()` is misspelled in the help file for taxon_relations (under 'Details')
`subset()` has been applied'
R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] devtools_2.3.0 usethis_1.6.1 magrittr_1.5
loaded via a namespace (and not attached):
[1] prettyunits_1.1.1 ps_1.3.3 fansi_0.4.1 rprojroot_1.3-2 withr_2.2.0
[6] digest_0.6.25 crayon_1.3.4 assertthat_0.2.1 R6_2.4.1 backports_1.1.7
[11] rlang_0.4.6 cli_2.0.2 remotes_2.1.1 rstudioapi_0.11 fs_1.4.1
[16] testthat_2.3.2 callr_3.4.3 ellipsis_0.3.1 desc_1.2.0 tools_4.0.0
[21] glue_1.4.1 pkgload_1.1.0 processx_3.4.2 compiler_4.0.0 pkgbuild_1.0.8
[26] sessioninfo_1.1.1 memoise_1.1.0
@mcsiple thanks a lot for your thorough review! :sparkles:
@kamapu Now both reviews are in!
@mcsiple could you please add an estimate of the time you spent reviewing near "Estimated hours spent reviewing:"? thanks!
@maelle oops-- now it has been added. Thanks!
OK, thank you to @mcsiple and @levisc8 for your valuable comments on `taxlist`. I guess I should now start editing the package in order to respond to your questions. Is that right, @maelle?
@kamapu yes, indeed. You can continue the conversation here but at the end of your edits we expect a comment that responds to both reviews. (see the second to last point in https://devguide.ropensci.org/guide-for-authors.html)
Thanks to the reviewers for your valuable comments and the resulting improvements. I tried my best to provide responses to all of your comments and I hope the applied modifications will be satisfactory. Note that I left boxes unchecked in the cases where my response deviates from the suggestions of the reviewers or where I didn't manage to provide a solution to an issue.
`URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
Response: Added in https://github.com/kamapu/taxlist/commit/b1b84bcf6b227fbefe51b194515b96c806dc70e0
[x] Errors in `devtools::test()`
Response: This issue is solved by @levisc8 in https://github.com/kamapu/taxlist/commit/869e83e100e2913c8951584bdaa4735cb9f88b41
[ ] `goodpractice::gp()` issues with UTF-8 strings Response: I don't yet have a final response about this issue. I already started discussions at https://github.com/MangoTheCat/goodpractice/issues/141 and https://discuss.ropensci.org/t/note-on-utf-8-strings-by-goodpractice-gp/2165.
[x] Improving documentation on auxiliary functions
Response: The functions `replace_*()` and `insert_rows()` were originally designed as internal, auxiliary functions in order to make coding a bit more efficient and comprehensible in the case of data manipulation. Along the way, we realized that these functions are useful when manipulating data related to functional traits and record properties in the package `vegtable`. While their use is too specific to include them in the vignette, I tried to improve the help documentation in https://github.com/kamapu/taxlist/commit/f7341af008ad30d258f1f726040f8935650ad608.
[x] Improving documentation on `update_*()` functions
Response: While these functions are not handled in the vignette, which is at the moment a quick start in the use of the package, I inserted missing examples in the help files in https://github.com/kamapu/taxlist/commit/e6e8c8dd1ff6a9ce65665f9cb4f38d03dd3960dc.
[x] Naming arguments in examples Response: Added in https://github.com/kamapu/taxlist/commit/da1a5661f960b0ea70699f52393df479ed4ddc39 as suggested by the reviewer. All arguments are named when functions are using settings for more than 2 arguments in the examples.
[x] Command calling vignette Response: I prefer to keep the command line in the vignette itself, since a copy of it may be accessible from a package homepage (see https://kamapu.github.io/rpkg/taxlist/index.html). Nevertheless, I also included the command in README at https://github.com/kamapu/taxlist/commit/896b90abe89cea3d456942eb7fc553352fa6880e
[x] Diverse vignette enhancements Response: I will attempt to implement as much as possible from the comments done to the vignette.
[x] Use of `with()`
Response: The use of `with()` was purged from code and vignette in https://github.com/kamapu/taxlist/commit/295ed9507271555437dcd3be9e1d82eeaba04555
[x] Behaviour of `$`
Response: This method was defined for quick access to functional traits in `taxlist` objects and to simplify commands using such information for statistics. The suggested error message when the value of argument `name` is not a column of slot `taxonTraits` or `taxonRelations` is implemented in https://github.com/kamapu/taxlist/commit/d35e7e07ae5473ecbfe5a8e621029ae31ff98cf1. Additionally, the documentation of the function has been improved and a new validity rule prevents homonymous columns in slots `taxonRelations` and `taxonTraits` (except for TaxonConceptID).
[ ] Functions naming: It would be nice to have a more standardized function naming system across the package. For example, some are named `replace_*`, and others named `update_*`, and as best I can tell, they have the same higher level purpose (i.e. modifying an object w/ new information). The rOpenSci package guidelines suggest an `object_verb` naming scheme (e.g. something like `tl_update_concept()`, `tl_add_concept`, `tl_update_idx()`, `tl_insert_rows()`, etc.).
Response: I agree with all points mentioned in this regard. Unfortunately, some guidelines were not considered during the development of `taxlist`; at least I avoided capital letters when naming functions. Since the package is already on CRAN and in use, I would like to avoid such a drastic change in the package, which would surely imply deprecating a lot of functions.
[x] Skip start message Response: Start message is deleted in https://github.com/kamapu/taxlist/commit/a8b2c6a979e7fd02a8e8b62b920109fd96f72c83
[x] Show and print methods
Response: Show and print methods were defined in https://github.com/kamapu/taxlist/commit/af5ed50bbae93576dd9db43e9a18814fd4bbe0f5 and are identical to `summary(taxlist-obj)`.
[ ] argument `na.rm` instead of `include_na` (see function `count_taxa()`).
Response: I would like to preserve this argument as it is, since I guess that the change could only be handled by adding a redundant argument to the function in order to keep backward compatibility in the package. On the other hand, I disagree with @levisc8 in part: usually the argument `na.rm` removes NAs to avoid error messages in statistical functions. The function `count_taxa()` is based on `aggregate()`, which has the argument `na.action`. This argument can be set as `na.omit` or `na.fail`. Thus none of them have the option to include `NA` values in the result.
[x] I couldn't find contributing/community guidelines in the repository. Response: Added in https://github.com/kamapu/taxlist/commit/b1b84bcf6b227fbefe51b194515b96c806dc70e0
[x] Coding style Response: The suggested changes may be hard to screen and replace, thus they will be implemented partially when updating the package. At least spaces between square brackets are inserted in https://github.com/kamapu/taxlist/commit/807fe5ed87a198d3973b0e8546bab0154a907fc4
[x] Packages `stringi` and `goodpractice` in Suggests, DESCRIPTION
Response: Those packages and others were removed in https://github.com/kamapu/taxlist/commit/158a590c46c00bd431e4fdc96b2394c7e8acb14f
[x] Package website Response: A package site is published at https://kamapu.github.io/rpkg/taxlist/index.html
`taxlist` and `taxa`.
[x] `goodpractice::gp()` issues with UTF-8 strings
Response: The same as in review by @levisc8 . I already started discussions at https://github.com/MangoTheCat/goodpractice/issues/141 and https://discuss.ropensci.org/t/note-on-utf-8-strings-by-goodpractice-gp/2165.
[ ] `goodpractice::gp()` issues with LaTeX
Response: I assume that issues with LaTeX may depend on the locally installed distribution or the installed version of `rmarkdown` or `knitr`.
`devtools::check()`
Response: While this was the specific case of @mcsiple's review, I was assuming that most of them may be due to local settings and probably outdated packages. The same could apply to the previous LaTeX conflicts. Probably, some of the issues have been solved in https://github.com/kamapu/taxlist/commit/869e83e100e2913c8951584bdaa4735cb9f88b41 by @levisc8
[x] Vignette and creation of objects: It would be useful to have some information about how to create a taxlist that contains more than just species names. For example, if Cross is a dataframe with a "Genus" or a "Family" column (or both!), how might one build a taxlist from it? How can a user deal with missing values or mixed info (e.g., a Genus name accidentally included in the wrong column)? This may be especially relevant for old taxon lists like the one shown in the example.
Response: To date I will claim that everything is possible with `taxlist` :wink: In fact, I inserted in the vignette a last example applying this kind of object to syntaxonomic lists, since I'm also working with phytosociological classifications. By the way, zoological lists may also be feasible to implement in `taxlist`, although this is not my area of expertise and therefore I don't offer any examples at the moment (perhaps a possible contribution from your side?). Both the syntaxonomic example in the vignette and the ferns example in the README build a `taxlist` object step by step. I added a chapter on direct conversion from a data frame to a `taxlist` object in https://github.com/kamapu/taxlist/commit/70c9109f279a4c368ad0d2cc9f7de783d9b9abb2. Note that some modifications were required for function `df2taxlist()`.
[x] The file Cross? Is the information in `Cross[,2:8]` not incorporated anywhere in the result of `Cross[,"TaxonName"]`? If so, could someone theoretically use taxlist to create a brand new taxonomy with its own organizational structure? I think this is the case but it would be good to make it more obvious.
Response: This is answered in the previous comment.
[x] The Codes
Response: The codes are the identifiers of the records in one of the three main slots, taxonNames (TaxonUsageID), taxonRelations (TaxonConceptID), and taxonViews (ViewID). All of them are only allowed to be integers in `taxlist`. These IDs can be provided at import or in a subset, or generated automatically at import or when adding new elements to an existing object.
[x] Typos and improvements in vignette Response: Suggested corrections were accepted in https://github.com/kamapu/taxlist/commit/a6396a4e5d5baa81792a4572fe4fe7afb105fb03
[x] Misspelled function taxon_relations<-
Response: Solved in https://github.com/kamapu/taxlist/commit/2d518e606caa987a32ec8fde72c1d59bfb237616
`taxlist` of masking `base::levels()` in https://github.com/kamapu/taxlist/commit/b7379242ad516eefb481686df25d861d73ef9ac5. The `taxlist` method has been adapted to the original definition of the function.
It is ok, and asking questions is also ok, just remember to ping us in separate comments (because I'm not sure we'd get notified if you ping us from an edited comment?)
@levisc8 I have some comments regarding your review, which I would like to clarify before editing my response:
Regarding `devtools::check()`. Everything is running OK in my case. Could this problem be related to the fact that the data have a different relative path in the source than in the installed package? In any case, if your offer of solving it in the source is still open, then go ahead.
In `goodpractice::gp()`, the message is "fix this R CMD check NOTE: Note: found 137 marked UTF-8 strings". I don't really know how to solve this issue or if it is an issue at all. I had already solved similar problems reported by CRAN in https://github.com/kamapu/taxlist/commit/3b2269ddb05ec01baebb7afec566b2c1d47a8665
In the case of `data-raw/0_check.R`, this is the script I use to automatically build and check `taxlist` before committing changes to GitHub (honestly, I'm sort of proud of it). I know it may be confusing for contributors but I have to keep it in the source. Is there a better place and/or a better name that you may suggest for this script?
I forgot to mention it, @levisc8: I work with Eclipse + StatET (not RStudio), thus I need the `data-raw/0_check.R` to carry out builds and tests.
Dear @levisc8 In your review, under Functionality, there are two unchecked boxes but without comments:
- [ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine.
- [ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines.
I'm wondering if you may have forgotten to check them (in that case, I just ignore them) or if there are issues to be solved on that regard.
... this question is for all reviewers + editor: @levisc8 @mcsiple @maelle
`goodpractice::gp()` will complain that:
Am I allowed to ignore these issues in the review? I mean, I can write a brief response but no changes in code will be required.
@mcsiple: In your outputs of `goodpractice::gp()`, the LaTeX error may be related to your local LaTeX distribution or your installed version of `rmarkdown`.
Perhaps it can be solved by updating your LaTeX distribution and/or `rmarkdown`. The latter from the GitHub repository???
Hi @kamapu , Thanks for getting to this so quickly! I'll respond in order:
Re `devtools::check()`: I'll try to get a pull request together by the middle of next week. The issue I had was not with `check` itself, but running `devtools::test` from a command prompt. The tests do not work interactively, and the line highlighted in @maelle's comment here is exactly the fix I was suggesting. I believe that altering the tests that use that (with the exception of the one I mentioned in my review) will resolve that issue.
`devtools::test()` on other people's source packages frequently (if ever). I did think it was worth pointing out though, as the tests do not technically pass on my local machine (which is why I didn't check the box in my review). It appears to come from some of the example data sets - you could try something like this to set it right. I think that `check` gets fussy if there are any non-ASCII strings in a package, but my understanding of locales and encodings is very limited, so I don't know why it would get fussy about this.
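For anyone who wants to see which values trigger that NOTE, a rough base R sketch for locating strings marked as UTF-8 in a data frame could look like the following. The data frame and its contents are invented for illustration and are not taken from the package's data sets.

```r
# Invented example data: one plain ASCII name and one with a non-ASCII letter.
# The \u escape produces a string that R marks as UTF-8.
species <- data.frame(
  TaxonName = c("Cyperus", "Sch\u00f6noplectus"),
  AuthorName = c("L.", "Palla"),
  stringsAsFactors = FALSE
)

# Flag character columns holding at least one string marked as UTF-8.
has_utf8 <- vapply(
  species,
  function(col) is.character(col) && any(Encoding(col) == "UTF-8"),
  logical(1)
)

# Count the marked strings across all character columns.
n_marked <- sum(vapply(
  species[vapply(species, is.character, logical(1))],
  function(col) sum(Encoding(col) == "UTF-8"),
  numeric(1)
))
```

Running something like this over each object in `data/` shows where the marked strings live, which helps when deciding whether they are legitimate (for example author names with diacritics) or can be replaced.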
It is a cool script! I don't mean to say "get rid of the script". You don't need to move it either, as the `data-raw` directory is in your `.Rbuildignore`, and so will never find its way into a built package. I just meant that I don't think you need those packages in the `Suggests` field of your DESCRIPTION file. The reason for this is that some users may want to install all dependencies, but not want to install packages that aren't needed for `taxlist`'s functionality (particularly stringi and devtools, which can take some time to install from source). Sorry for the confusion! Does that make sense?
Packaging guidelines - I think I left that un-checked because of the website, but that is clarified now so I've marked it complete :)
@levisc8 I was just reading your comments again:
1. Regarding `devtools::check()`. Everything is running OK in my case. Could this problem be related to the fact that the data have a different relative path in the source than in the installed package? In any case, if your offer to solve it in the source is still open, then go ahead.
I guess I'm able to make the proposed change and test it (using devtools::test()).
@levisc8 Just a comment on your comment:
- It appears to come from some of the example data sets - you could try something like this to set it right...
The encoding issues have been the biggest nightmare, not only when programming but also in real life (see here).
I would assume that the issue was already solved before and that goodpractice::gp() is missing some settings to tolerate UTF-8 encodings.
@levisc8
you can replace the file.path(path.package(...)) with system.file('dir_within_pkg', 'file_name', package = 'taxlist')
This solves the problems in devtools::test() but causes errors in devtools::check_built()
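For readers comparing the two approaches, here is a minimal sketch of the difference. It assumes the base package and its DESCRIPTION file as stand-ins so that the snippet runs anywhere; in taxlist the directory and file names would be the package's own, and nothing here reproduces the actual test code.

```r
# Stand-in package: "base" is always loaded, so both calls work in any session.
# In a real test these would point at files shipped in the package's inst/.
pkg <- "base"

# Hand-built path: only valid when the package sits where path.package() says
manual_path <- file.path(path.package(pkg), "DESCRIPTION")

# Suggested replacement: let R resolve the location of the installed file
resolved_path <- system.file("DESCRIPTION", package = pkg)

# system.file() returns "" for a file that does not exist
# (or raises an error when called with mustWork = TRUE)
missing_path <- system.file("no-such-file.csv", package = pkg)

file.exists(manual_path)
file.exists(resolved_path)
nzchar(missing_path)
```

Because system.file() silently returns an empty string for a missing file, passing mustWork = TRUE inside tests may make path problems easier to spot.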
Hello @kamapu
Regarding test coverage we don't expect more than 75% coverage https://devguide.ropensci.org/building.html#testing
If the encoding issue is a problem with goodpractice, it might be worth opening an issue in goodpractice repo itself? You could also run a check on e.g. an R-hub platform with a different encoding setting https://blog.r-hub.io/2019/04/25/r-devel-linux-x86-64-debian-clang/#r-devel-linux-x86_64-debian-clang just to make sure you don't get any check NOTE/WARNING there.
Just curious why you use Eclipse, historical reasons or some advantages? No answer required, this is a bit off-topic.
Hello @maelle
All right.
Unfortunately I got the same output at r-hub:
- checking data for non-ASCII characters ... NOTE Note: found 137 marked UTF-8 strings
It's unfortunate but means the problem needs to be fixed (or well justified), because you'd get it on CRAN check platform that has this encoding setting. Would you mind opening a question about that at https://discuss.ropensci.org/?
Hello all: @maelle @levisc8 @mcsiple
I see, I have to dedicate some time to the example data again. I thought I had solved it in the last release at CRAN, but I just realized the test is still producing the NOTE about UTF-8 strings.
I assume, R CMD check is not giving any warning because of my local settings (I'm working in Linux).
Perhaps some of you could give me some hints:
The data is originally stored in PostgreSQL, also using UTF-8 encoding. I export the data as a CSV file in data-raw/Easplist, also using UTF-8. And then I create the taxlist object in an R session. Perhaps something can be adjusted in this workflow to get rid of the warnings.
I was assuming that by using Sys.setenv(LANG="en_US.iso88591") I was testing the package under the CRAN settings, but it is evidently not working.
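One possible explanation, offered as an assumption rather than a diagnosis: the locale of a running R session is fixed at startup, so changing the LANG environment variable afterwards does not by itself switch it. The sketch below only inspects and changes LC_CTYPE in the current session; it does not reproduce the CRAN check platform.

```r
# Locale actually in use by this session
before <- Sys.getlocale("LC_CTYPE")
l10n_info()[["UTF-8"]]  # whether the session considers itself UTF-8

# Changing the environment variable alone leaves the session locale untouched
old_lang <- Sys.getenv("LANG")
Sys.setenv(LANG = "en_US.iso88591")
after_setenv <- Sys.getlocale("LC_CTYPE")
identical(before, after_setenv)

# Sys.setlocale() changes it directly; "C" is available on every platform
Sys.setlocale("LC_CTYPE", "C")
after_setlocale <- Sys.getlocale("LC_CTYPE")

# Restore the original state
if (nzchar(old_lang)) Sys.setenv(LANG = old_lang) else Sys.unsetenv("LANG")
invisible(Sys.setlocale("LC_CTYPE", before))
```

Setting the variable before R starts (for example in the shell that launches R CMD check) is the other way to have it take effect.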
I always advise our students not to confuse ERRORS with WARNINGS. And now I see myself confusing WARNINGS with NOTES. Are the latter usually tolerated?
They could be, but they need to be justified, and their mere existence delays the CRAN review process; that's why it's better to get rid of them.
@maelle I see your point, although I don't really know why UTF-8 is a problem. Anyway, the issue is that the vector including author names is prone to have "special characters", and so are even taxon usage names (for instance the genus Isoëtes and its species). In taxlist the combination of taxon usage name and authority (columns TaxonName and AuthorName in slot taxonNames) is a sort of identity for a name (no duplicates allowed).
I'm willing to solve the problem but I will need support in this regard, especially because I wish the package to remain useful regardless of the local settings of the users. I also wish to reach THE solution, because the issue is emerging again, and again, ... :sob: :sob: :sob:
To clarify, I am not saying it's a bad NOTE but I think it'd be easier if you opened a thread at https://discuss.ropensci.org where more community members could chime in with ideas for making the NOTE disappear or writing a good comment for CRAN (who'd be the ones to be convinced the NOTE is fine), from their own experience.
I do understand encoding pain, my name is Ma?#$lle I mean Maëlle. :wink:
@maelle Thanks for clarification. I sort of lost the path (cause I'm doing many things in parallel). I'll do as suggested but not right now...
Perhaps it was a premonition, but a long time ago I decided to drop the accent from Álvarez when self-referencing...
Dear @maelle @mcsiple @levisc8
I just finished the response to your reviews. Note that there are a few discrepancies and unsolved issues. I hope the improvements will be satisfactory for this submission. Thank you again!
The edited review is here
Thanks @kamapu, by unsolved issues you don't mean something where a bit more time would help?
@mcsiple @levisc8 are you ok with the response?
@maelle "unsolved issues" actually means that it no longer depends on me...
Before I get to this again, just want to clarify - do the unresolved issues refer to the UTF-8 strings in the example data set? Or something else?
In general I consider as unresolved:
1) The UTF-8 NOTE raised by goodpractice::gp() and rhub::check(platform="debian-clang-devel").
2) I think @mcsiple got several error or warning messages, in part related to LaTeX, which I suppose are related to her local settings, but this should still be clarified.
Of course, I can still receive and try suggestions.
Hi @kamapu,
As I was interested in your issue, I had a look at it. It looks like the UTF-8 note you got comes from your dataset (Easplist.rda) (see https://github.com/dankelley/oce/issues/1211)
R> tools:::.check_package_datasets(".")
Note: found 137 marked UTF-8 strings
Something that is not checked by default by devtools::check() (I'm guessing here) but is checked if you do the combo R CMD build / R CMD check.
Hope this could be useful.
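To see which strings in a character vector would count towards such a note, a rough base-R sketch follows. The `authors` vector is a hypothetical stand-in for a column such as Easplist@taxonNames$AuthorName, and the check only approximates what R CMD check does internally.

```r
# Hypothetical stand-in for a column of author names
authors <- c("Borsch, Kai M\u00fcll. & Eb.Fisch.", "Ruiz & Pav.", "S\u00e9g.")

# Declared encoding mark of each string; pure ASCII strings stay "unknown"
marks <- Encoding(authors)
marks

# Number of strings carrying a UTF-8 mark
n_marked <- sum(marks == "UTF-8")
n_marked

# Independent check: does a string contain any code point above 127?
has_non_ascii <- vapply(
  authors,
  function(s) any(utf8ToInt(s) > 127L),
  logical(1),
  USE.NAMES = FALSE
)
authors[has_non_ascii]
```

Running the same idea over every character column of a data set shows where the marked strings come from before deciding whether to transliterate them or justify the NOTE.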
Dear all, sorry for annoying you with the issue of encodings. While I am sort of developing a pathological aversion to this aspect of programming, for people dealing with taxonomy (including myself) the use of names and their authorities is crucial in the management of information. It is therefore treated in taxlist as a component of the identity of a taxon concept (the accepted name of the taxon) and of the taxon usage id (alternative names used to refer to a taxon), and this is the source of my headaches: global context = special characters...
The possible solutions explored until now, just to get rid of a note, do not satisfy me 100%.
Now I have a request for you: below is a block of code including the outputs in my console (I'm working in Linux with a UTF-8 locale). My question is: do you get exactly the same output as I do?
library(stringi)
library(taxlist)
Names <- Easplist@taxonNames$AuthorName[c(5299, 5021, 5019)]
Names
#> [1] "Borsch, Kai Müll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Ség."
stri_enc_mark(Names)
#> [1] "UTF-8" "ASCII" "UTF-8"
iconv(Names, "utf8", "ascii")
#> [1] NA "Ruiz & Pav." NA
stri_enc_toascii(Names)
#> [1] "Borsch, Kai M\032ll. & Eb.Fisch." "Ruiz & Pav."
#> [3] "S\032g."
stri_trans_general(Names, "latin-ascii")
#> [1] "Borsch, Kai Mull. & Eb.Fisch." "Ruiz & Pav."
#> [3] "Seg."
See also this discussion.
For reference link to Discourse thread https://discuss.ropensci.org/t/note-on-utf-8-strings-by-goodpractice-gp/2165
@kamapu regarding your last question, I do get the exact same output as you do. Furthermore, stri_trans_general(Names, "latin-ascii") %>% stri_enc_mark() returns 'ASCII' 'ASCII' 'ASCII'.
Review round 2:
I'm generally fine with the new form of the package. I just have some minor comments at this point.
UTF-8 strings: I don't have strong feelings about this, as it is pretty far beyond my area of expertise. If you can fix it with the code blurb from above, excellent!
The pdflatex errors could be because pdflatex is not in the PATH variable (see here). I don't think there's much you can do about this from your side though.
Documentation is looking good.
package API: I understand the point regarding backward compatibility. I also don't think it's unreasonable to make breaking changes if they result in substantial improvements to the user experience. Incrementing by a major version number (e.g. 0.2 -> 1.0) should signal to most users that they should expect major changes. If users really don't want to change their code, they can install archived versions, either from Github or from CRAN. This is something I'd defer to rOpenSci editors on though, as I don't really believe it is a deal breaker or anything.
aggregate/count_taxa: good point - I hadn't thought about that! On the other hand, there is still an rm.na argument for a couple methods. This could change to na.rm for consistency w/ the base/stats packages. Personally, I would forget that it is rm.na every single time I tried to use the package. Again, I appreciate the point about back-compatibility, but this doesn't need to be a breaking change right away - you can retain the old form and deprecate it for the next few minor versions as well.
Hi @kamapu , @maelle et al.,
My apologies for the delay in responding to these new revisions and response. I have itemized them below mostly for my own mental organization (they match the structure of @kamapu 's response, so I can keep track of what I have checked).
Note/caveat: I normally run all the checks on two different operating systems to provide better coverage. This time I did my entire initial review on a Windows OS and am now checking everything again on my Mac. This could be a "good" thing because I've covered more ground, but IMO it's also suboptimal because some of the changes I've seen could potentially be caused by different software/packages instead of changes to the package. I think this is mostly the case with the LaTeX issues, but wanted to include the disclaimer.
This edit is great and satisfies my question(s) about the context and uses for the package.
Newly added contributing guidelines look good.
It looks like this one is still being hashed out to some extent... I hope you find a good resolution for it. It sounds from the ropensci.org post like maybe this will remain an issue for people with a setup that doesn't have a font that renders the non-ascii characters. Maybe a note in the vignette about whether/how this would affect functionality would be good, or better yet a custom check or error message upon install... I am sure other package developers will have this issue so you're doing a service to the community by trying to figure it out!
This one is my fault; I have now run everything on my other computer. No LaTeX errors in goodpractice::gp() this time, so consider this part resolved.
devtools::check(): I found no errors with R CMD check this time.
Cross question: the changes to the vignette explain the structure of the dataset better - thank you.
Thanks - let me know if you have questions/comments about any of the above.
:wave: @kamapu, would you soon be able to respond to the reviewers' new comments? Thank you.
Sorry for the break (many duties and a new family member arrived). I'm ready to go!
Dear @maelle @mcsiple @levisc8 I'm sorry that, despite the progress on the review, I had to stay away from programming for a while. Now I'm back and would like to thank you again for all the valuable comments.
I will then refer to the reviewers' comments from round 2.
[x] UTF-8 strings Since the option is still open, I will take it. I would prefer to keep them as long as tolerated by CRAN.
[x] pdf-LaTeX errors It seems to be solved.
[x] Documentation is looking good. :+1:
[x] Change argument rm.na to na.rm in function count_taxa()
It was in fact a mistake on my side. Resolved in https://github.com/kamapu/taxlist/commit/91ce67d32d322ec9d8a27347f83a2a4bee2fee17
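For reference, the softer transition suggested in the review (keeping the old argument name for a few versions while warning about it) could be sketched as follows. The function `count_values()` is hypothetical and is not the taxlist implementation of count_taxa(); it only illustrates the deprecation pattern.

```r
# Hypothetical function illustrating a deprecated argument alias
count_values <- function(x, na.rm = FALSE, rm.na) {
  # Accept the old argument name, but tell the user it is going away
  if (!missing(rm.na)) {
    warning("'rm.na' is deprecated; use 'na.rm' instead.", call. = FALSE)
    na.rm <- rm.na
  }
  if (na.rm) x <- x[!is.na(x)]
  length(unique(x))
}

# New spelling
new_style <- count_values(c("a", "b", NA), na.rm = TRUE)

# Old spelling still works, with a deprecation warning
old_style <- suppressWarnings(count_values(c("a", "b", NA), rm.na = TRUE))

identical(new_style, old_style)
```

Once the warning has been in place for a few releases, the alias can be removed without surprising users.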
Two comments to this review, otherwise I agree with the rest.
[x] UTF-8
Yes, this will be following me for a while. Since taxlist is meant to be a recipient of information that is otherwise stored in a database application, the choice of encoding will probably depend on local settings and the encoding of the data source. It is interesting to experience that, while our task is to provide a way to harmonize disagreement on nomenclatures for biodiversity assessments, we stumble onto encoding issues :thinking:
[x] Taxon codes/identifiers
I added the explanations in the help document of taxlist-class in https://github.com/kamapu/taxlist/commit/eae874affbbbe082c8d3c2435977dfdc64967615
Thanks @kamapu and congrats again on the new family member!
@mcsiple @levisc8 are you ok with the response?
Summary
The taxlist package structures taxonomic information into S4 objects and implements methods for the manipulation of contained information. Such objects may or may not contain information on synonymy, taxonomic ranks, parent-child relations, taxon views (references used to establish relation between taxon usage names and taxon concepts), and taxon (functional) traits.
https://github.com/kamapu/taxlist
Reproducibility, because this package makes taxonomic information available in a quasi-standard format and tests inconsistencies on the content of taxonomic lists.
In general to taxonomists and biodiversity scientists, in particular to vegetation ecologists (taxlist objects are implemented in the package vegtable).
While its functionality may overlap the package taxa, the package taxlist attempts to be flexible in the degree of completeness of data (incompleteness is very frequent in vegetation-plot databases), it is meant to be integrated in objects containing diversity information (as in the mentioned package vegtable) and to import data from local storage (spreadsheets, Turboveg data sets and even PostgreSQL tables by using vegtable2).
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
Detail
[x] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings:
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
@arendsee @zachary-foster @sckott