Closed diazrenata closed 12 months ago
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type `@ropensci-review-bot help` for help.
:rocket:
Editor check started
:wave:
git hash: bbe4921e
Package License: MIT + file LICENSE
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
|type |package | ncalls|
|:----------|:---------|------:|
|internal |base | 124|
|internal |birdsize | 26|
|imports |magrittr | 40|
|imports |dplyr | 16|
|imports |stats | 12|
|imports |purrr | 1|
|imports |rlang | NA|
|suggests |covr | NA|
|suggests |ggplot2 | NA|
|suggests |knitr | NA|
|suggests |rmarkdown | NA|
|suggests |testthat | NA|
|linking_to |NA | NA|
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(
- base: list (30), as.character (28), as.numeric (17), mean (7), c (5), nrow (5), colnames (4), sum (4), which (4), for (2), length (2), log (2), tolower (2), all (1), any (1), data.frame (1), exp (1), F (1), matrix (1), names (1), ncol (1), return (1), sqrt (1), suppressMessages (1), unique (1)
- magrittr: %>% (40)
- birdsize: add_estimated_sds (2), clean_sp_size_data (2), get_sd_parameters (2), get_sp_mean_size (2), ind_draw (2), individual_metabolic_rate (2), is_unidentified (2), community_generate (1), community_summarize (1), filter_bbs_survey (1), find_nontarget_species (1), find_unidentified_species (1), generate_sd_table (1), identify_richness_designator (1), pop_generate (1), pop_summarize (1), species_define (1), species_estimate_sd (1), species_lookup (1)
- dplyr: mutate (5), filter (4), n (3), left_join (1), row_number (1), select (1), summarize (1)
- stats: sd (7), lm (2), formula (1), rnorm (1), var (1)
- purrr: pmap_dfr (1)
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
The package has:

- code in R (100% in 15 files) and
- 1 authors
- 6 vignettes
- 7 internal data files
- 5 imported packages
- 19 exported functions (median 13 lines of code)
- 19 non-exported functions in R (median 22 lines of code)

---

Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages

The following terminology is used:

- `loc` = "Lines of Code"
- `fn` = "function"
- `exp`/`not_exp` = exported / not exported

All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by [the `checks_to_markdown()` function](https://docs.ropensci.org/pkgcheck/reference/checks_to_markdown.html)

The final measure (`fn_call_network_size`) is the total number of calls between functions (in R), or more abstract relationships between code objects in other languages. Values are flagged as "noteworthy" when they lie in the upper or lower 5th percentile.

|measure                  | value| percentile|noteworthy |
|:------------------------|-----:|----------:|:----------|
|files_R                  |    15|       73.0|           |
|files_vignettes          |     6|       97.9|           |
|files_tests              |    11|       91.7|           |
|loc_R                    |   384|       39.4|           |
|loc_vignettes            |   398|       72.3|           |
|loc_tests                |   330|       66.1|           |
|num_vignettes            |     6|       98.7|TRUE       |
|data_size_total          | 50278|       79.8|           |
|data_size_median         |  5209|       74.7|           |
|n_fns_r                  |    38|       47.6|           |
|n_fns_r_exported         |    19|       65.9|           |
|n_fns_r_not_exported     |    19|       39.2|           |
|n_fns_per_file_r         |     3|       47.0|           |
|num_params_per_fn        |     1|        1.6|TRUE       |
|loc_per_fn_r             |    16|       51.4|           |
|loc_per_fn_r_exp         |    13|       30.5|           |
|loc_per_fn_r_not_exp     |    22|       66.9|           |
|rel_whitespace_R         |    36|       61.4|           |
|rel_whitespace_vignettes |    69|       90.5|           |
|rel_whitespace_tests     |    59|       84.6|           |
|doclines_per_fn_exp      |    20|       14.0|           |
|doclines_per_fn_not_exp  |     0|        0.0|TRUE       |
|fn_call_network_size     |    14|       39.0|           |

---
`goodpractice` and other checks

#### 3a. Continuous Integration Badges

(There do not appear to be any)

**GitHub Workflow Results**

|         id|name                       |conclusion |sha    | run_number|date       |
|----------:|:--------------------------|:----------|:------|----------:|:----------|
| 4263965118|check-coverage             |success    |999038 |         49|2023-02-24 |
| 4263965032|pages build and deployment |success    |999038 |         61|2023-02-24 |
| 4263965125|pkgcheck                   |success    |999038 |         56|2023-02-24 |
| 4263965120|pkgdown                    |success    |999038 |         59|2023-02-24 |
| 4263965126|R-CMD-check                |NA         |999038 |         90|2023-02-24 |

---

#### 3b. `goodpractice` results

#### `R CMD check` with [rcmdcheck](https://r-lib.github.io/rcmdcheck/)

rcmdcheck found no errors, warnings, or notes

#### Test coverage with [covr](https://covr.r-lib.org/)

Package coverage: 88.26

#### Cyclocomplexity with [cyclocomp](https://github.com/MangoTheCat/cyclocomp)

No functions have cyclocomplexity >= 15

#### Static code analyses with [lintr](https://github.com/jimhester/lintr)

[lintr](https://github.com/jimhester/lintr) found the following 241 potential issues:

message | number of times
--- | ---
Avoid 1:nrow(...) expressions, use seq_len. | 2
Avoid library() and require() calls in packages | 13
Lines should not be more than 80 characters. | 224
Use <-, not =, for assignment. | 2
|package  |version  |
|:--------|:--------|
|pkgstats |0.1.3    |
|pkgcheck |0.1.1.11 |
This package is in top shape and may be passed on to a handling editor
Thanks @diazrenata for your full submission!
Before I start the search for a handling editor can you please address these two minor yet important issues?
--
In the pre-submission I see:
One thing to note is that given the package contains external data, it would be appropriate to state the source in the DESCRIPTION file under authors using author type "dtc" which stands for "data contributor". -- @annakrystalli
But DESCRIPTION still shows `role = c("aut", "cre")` rather than `role = c("aut", "cre", "dtc")`.
Can you please make that change or argue against it?
--
On the package website I see a lot of documentation and examples, well done! However the landing page is README and I see very little there. Can you please add the most important bits to README to help editors, reviewers, and users understand the package quickly?
The author's guide is a bit vague about whether you should duplicate in README documentation you already have elsewhere, but previous reviews and packages in the wild suggest a self-contained README is as important to a package as an abstract is to an academic paper.
A good guide for what to include in a helpful README is this one: https://devguide.ropensci.org/building.html#readme
Hi Mauro, thank you for the quick response! In response to your points....
I agree that this is an important conversation, and I want to be sure to handle this appropriately. I included a little more context on this in my initial submission, which I am quoting again below to make it easier to find:
As part of that conversation, the question was raised of adding the authors of some of the datasets that this package draws on as "data contributors". I agree that this is an important consideration, and I wanted to go ahead and include a little more information here so we can make sure this is done in the most appropriate way.
This package uses two sources of "external" data: First, the sd_table dataset included in the package includes (cleaned and selected) data values hand-entered from the CRC Handbook of Avian Body Masses (Dunning 2008; https://doi.org/10.1201/9781420064452). Neither Dunning, nor the authors of the studies cited in the CRC Handbook, were involved in this project. In the current iteration, I've followed the approach I would use for a paper - that is, the package and package documentation cite Dunning liberally, but I have not listed any additional authors as "data contributors" because I generally wouldn't list folks as co-authors without their knowledge and consent. In this context, would you encourage listing Dunning as a contributor, and/or reaching out to open that conversation?
Second, this package is designed to interface with the North American Breeding Bird Survey data (https://www.sciencebase.gov/catalog/item/5d65256ae4b09b198a26c1d7, doi:10.5066/P9HE8XYJ), but I have taken care not to redistribute any actual data from the Breeding Bird Survey in the package itself. The demo_route_raw and demo_route_clean data tables in birdsize are synthetic datasets that mimic data from the Breeding Bird Survey. That is, they have the same column names as BBS data, and valid AOU (species identifying codes) values, but the actual data values are simulated. The bbs-data vignette directs users to instructions for accessing the BBS data, and demonstrates using the functions in birdsize on BBS-like data using the demo routes. Again, the package cites the Breeding Bird Survey liberally, but stops short of redistributing data so as to encourage users to access and cite the creators directly.
For both of these, again, I'm happy to explore whatever approaches to citing/crediting the original data creators seems most appropriate! I'd appreciate any thoughts or guidance in this area.
I have updated the README to be more comprehensive. Some of the text is borrowed from the Getting started vignette, and I am currently directing folks to the [community](https://diazrenata.github.io/birdsize/articles/community.html) vignette for worked examples. (If it would be preferable, and not too redundant, I can copy the contents of that vignette directly to the README).
I hope this helps provide context to get things started! Thank you again.
Thanks a lot @diazrenata for making your arguments about the data visible here. I'm more than happy with your careful consideration, and prefer for the handling editor or reviewers to follow up.
Also thanks for working on README. Anything that saves our reviewers a bit of time will free up their cognitive load to focus on the more interesting aspects of your work.
I'll start looking for a handling editor.
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
Thanks for your submission @diazrenata! I'll start looking for reviewers. A few comments in the meantime:
- `desc::desc_normalize()`: it will order DESCRIPTION fields in a standard way and order dependencies alphabetically.
- […] `starts_with()` helper.
- […] `----`, the comment will appear in the script outline on the right in RStudio IDE (if you use that IDE), helping code navigation. https://blog.r-hub.io/2023/01/26/code-comments-self-explaining-code/#use-comments-for-the-scripts-outline
- […] `birdsize:::` as testthat loads your package code. Example https://github.com/diazrenata/birdsize/blob/7469df457989a9016ecc3761b0dd125497a3be51/tests/testthat/test-02_included_data.R#L13

@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/577_status.svg)](https://github.com/ropensci/software-review/issues/577)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
@ropensci-review-bot add @mstrimas to reviewers
@mstrimas added to the reviewers list. Review due date is 2023-03-23. Thanks @mstrimas for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@mstrimas: If you haven't done so, please fill this form for us to update our reviewers records.
@ropensci-review-bot set due date for @mstrimas to 2023-04-04
Review due date for @mstrimas is now 04-April-2023
Hi @diazrenata, I'm looking forward to reviewing the package! I won't be able to look at it in detail until later in March, but I took a quick skim through the code and sent a small PR with a few quick fixes. Also, I thought I'd mention a few general things now ahead of my full review in April since they may take some time to address:
- […] `Code > Reflow Comment` in RStudio to help with this.
- […] `%>%` a lot within package functions. I don't think there's any strict rules against that, but I typically avoid it since it can make debugging trickier by producing a confusing call stack trace if there's an error. See this comment on the rOpenSci forum. The response notes that the native pipe `|>` avoids the issue, but I personally think just dropping pipes altogether can be cleaner in package functions. This is personal preference though, so if you really prefer keeping the pipes that's all good :) Of course, using pipes in examples, vignettes, readme, etc. is perfectly fine and I think aids readability.

Thanks @mstrimas!
Adding a reference about code comments https://blog.r-hub.io/2023/01/26/code-comments-self-explaining-code/
Thank you both for your time and attention! I'll incorporate these changes as quickly as I can, probably early next week!
@ropensci-review-bot add @qdread to reviewers
@qdread added to the reviewers list. Review due date is 2023-04-11. Thanks @qdread for accepting to review! Please refer to our reviewer guide.
rOpenSci’s community is our best asset. We aim for reviews to be open, non-adversarial, and focused on improving software quality. Be respectful and kind! See our reviewers guide and code of conduct for more.
@qdread: If you haven't done so, please fill this form for us to update our reviewers records.
I just finished up my review. This is a concise, well-documented package and was a pleasure to review. Note that the suggestions I made in an earlier comment on this issue, and in the PR I submitted, still apply and I haven't duplicated them below.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
- […] `community_generate` rather than `community_generate()`. If you add the `()`, `pkgdown` will recognize it as a function and generate a link to the documentation for that function on the website.
- […] `set.seed(22)` calls throughout the vignettes. I think it would be better to set the seed once at the top of each vignette rather than repeatedly setting it.
- […] `community` vignette, you confusingly have `demo_route_raw <- demo_route_raw`. I think what you may want here is `data(demo_route_raw)`. However, I think even that has fallen out of favor and it's best to just use the object directly (See "Warning" box at https://r-pkgs.org/data.html#sec-data-data).
- […] `try(species_define(genus = "taylor", species = "swift"))` example :)
- […] `birdsize:::species_estimate_sd`. If you're going to use this function in a vignette, it should be exported.
- […] `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).

Estimated hours spent reviewing: 7
This is a useful, well-documented package with a clear and well defined scope. The documentation and vignettes have plenty of examples that I think most users should easily be able to follow. In addition to suggestions I've made in an earlier comment, I have a few suggestions that I think would improve the package documentation and align the source code better with best practices.
Working through the five vignettes, I did wonder if the vignette ordering and organization should be re-worked. For me, I found the `population` vignette to be the most useful for understanding what the package is actually doing under the hood. After working through this vignette, the `community` vignette made more sense to me. The species vignette covers an intermediate-level topic, so probably belongs next, followed by `scaling` since it's a more advanced topic that many users won't require knowledge of. Both the README and the "Getting Started" vignette contain essentially no code. This feels like a missed opportunity to show some of the most basic functionality, e.g. simulate a population of a single species and plot a histogram and/or simulate a community. Finally, there is a fair amount of duplication in the vignettes, which makes me wonder if some of them could be combined into one. All this may just be personal preference though, and other users may find the current organization better, but something to think about.
This package relies heavily on BBS data in all examples and package functionality. Given its prominence, I think there could be more description of what the BBS is and the structure of the dataset. I initially assumed the `bbs-data` vignette would cover this, but it mostly duplicates what's already found in the `community` vignette without providing additional explanation of the BBS. I think at the very least a brief description of the route/stop structure, sampling design, and spatial/temporal coverage of the BBS is warranted. Also, I see the fields of `demo_route_raw` are described in the help for that dataset, but I think you should point users to that help file or directly include a description of the fields in the `bbs-data` vignette. I don't think you need to get into extensive detail since all of this is explained in other places, which you've referenced, but you should provide at least some explanation.
Defining species in function arguments could be clarified. In the arguments to `pop_generate()` and `species_define()`, species can be identified either by AOU code or scientific name. Given that both `genus` and `species` are required, it feels more intuitive to me to use a single argument (e.g. `scientific_name`). The documentation also doesn't make it clear that `species` is the species' epithet and not the scientific or common name. As it stands, it feels like you should be able to call `pop_generate(100, species = "Selasphorus calliope")` or even `pop_generate(100, species = "Calliope Hummingbird")`.
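To illustrate the single-argument idea, here is a minimal sketch of a helper that splits a binomial name into the genus and epithet. `split_scientific_name()` is a hypothetical name and is not part of birdsize; how the package would actually handle this is up to the author.

```r
# Hypothetical helper (not part of birdsize): split a binomial scientific name
# into its genus and species epithet.
split_scientific_name <- function(scientific_name) {
  stopifnot(is.character(scientific_name), length(scientific_name) == 1)
  # Trim surrounding whitespace, then split on runs of whitespace
  parts <- strsplit(trimws(scientific_name), "\\s+")[[1]]
  if (length(parts) != 2) {
    stop("`scientific_name` must look like 'Genus epithet', ",
         "e.g. 'Selasphorus calliope'.")
  }
  list(genus = parts[1], species = parts[2])
}

name_parts <- split_scientific_name("Selasphorus calliope")
name_parts$genus
name_parts$species
```

The two pieces could then be handed to the existing `genus` and `species` arguments, so a `scientific_name` argument would be a thin convenience layer and would not change the underlying lookup.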
I wonder if the `_summarize()` functions are necessary since they're essentially just calling `group_by() %>% summarize()`. Personally I'd prefer to do this directly myself with `dplyr` so I know exactly what's going on and the vignettes could demonstrate exactly how users should do it. However, I can imagine that some users aren't as comfortable with `dplyr` so these convenience functions could be useful.
The internal function `ind_draw()` seems dangerous to me since it has a while loop to get rid of negative sizes with potential to run for a very long time. This is especially true because there appear to be no checks to ensure the mean size is positive. For example the following will run indefinitely:
`pop_generate(1000, mean_size = -1000, sd_size = 0.001)`
At the very least, please add a check to ensure `mean_size` and `sd_size` are positive and some method to ensure the while loop won't run forever, e.g. after a certain number of iterations maybe it should stop and raise an error. Even better would be re-writing this function to use a less brute force method to ensure the sizes generated aren't negative. It's not immediately clear what that would look like, since simple solutions, like taking the absolute value, won't preserve the desired normal distribution.
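A minimal sketch of the two safeguards requested here: input checks up front and a cap on the number of redraw rounds. `draw_positive_sizes()` and its `max_iter` argument are hypothetical names for illustration, not the package's `ind_draw()`.

```r
# Hypothetical sketch (not the package's ind_draw()): redraw negative sizes,
# but validate the inputs first and give up after `max_iter` rounds.
draw_positive_sizes <- function(n, mean_size, sd_size, max_iter = 100) {
  if (!is.numeric(mean_size) || length(mean_size) != 1 ||
      is.na(mean_size) || mean_size <= 0) {
    stop("`mean_size` must be a single positive number.")
  }
  if (!is.numeric(sd_size) || length(sd_size) != 1 ||
      is.na(sd_size) || sd_size <= 0) {
    stop("`sd_size` must be a single positive number.")
  }

  sizes <- rnorm(n, mean = mean_size, sd = sd_size)
  iter <- 0
  while (any(sizes < 0)) {
    iter <- iter + 1
    if (iter > max_iter) {
      stop("Still drawing negative sizes after ", max_iter, " rounds.")
    }
    # Redraw only the negative values
    negative <- sizes < 0
    sizes[negative] <- rnorm(sum(negative), mean = mean_size, sd = sd_size)
  }
  sizes
}

set.seed(1)
sizes <- draw_positive_sizes(1000, mean_size = 10, sd_size = 5)
range(sizes)
```

With these checks, a call like the one above with `mean_size = -1000` fails immediately with an error instead of looping.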
Some additional, specific comments about the code:
- […] `R/data.R` rather than individual R files. Not a huge issue, but I think this would help anyone interested find the source for the documentation more easily. See https://r-pkgs.org/data.html#sec-documenting-data.
- […] `R/simulate_populations.R` is `pop_generate()`, so why not call that file `R/pop_generate.R` so it's more obvious what's in the file?
- `dplyr` has deprecated or superseded a lot of the "scoped" verbs, e.g. `group_by_at()`. Please replace these as appropriate to future proof your package. See https://dplyr.tidyverse.org/reference/scoped.html
- […] `if()` and in other places `if ()`. I recommend picking one method of formatting your code and sticking with it. See https://style.tidyverse.org/.
- […] `as.numeric(NA)` or `is.character(NA)` (e.g. https://github.com/diazrenata/birdsize/blob/main/R/species_define.R#L56). Note that there are NAs specifically for different data types, i.e. `NA_real_` and `NA_character_`. It seems cleaner to use these rather than casting `NA` to different data types.
- […] `if (A) {if (B) {...}}` I think it's cleaner to use `if (A && B) {...}`, but that may just be personal preference.

Thanks a lot @mstrimas for your thoughtful review!! :pray:
As a side note on source filenames, it's also nice to align them with test filenames so if you rename one, make sure you rename the other. See https://r-pkgs.org/testing-basics.html#create-a-test and `devtools::test_active_file()`.
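A few of the code-style points raised in the review and in the lintr report above can be shown in base R. This is only an illustrative sketch; `mean_size`, `sd_size`, and the `needs_sd` flags are made-up names and do not come from the package source.

```r
# Typed missing values have the same value and type as a cast logical NA
identical(NA_real_, as.numeric(NA))
identical(NA_character_, as.character(NA))

# Hypothetical inputs, used only to illustrate combining conditions
mean_size <- 3.2
sd_size <- NA_real_

# Nested form: if (A) { if (B) { ... } }
needs_sd_nested <- FALSE
if (!is.na(mean_size)) {
  if (is.na(sd_size)) {
    needs_sd_nested <- TRUE
  }
}

# Combined form: && short-circuits, so B is only evaluated when A is TRUE
needs_sd <- !is.na(mean_size) && is.na(sd_size)

# seq_len() is safe for zero-row data, unlike 1:nrow(...)
empty <- data.frame(x = numeric(0))
length(1:nrow(empty))         # 2, because 1:0 counts down to c(1, 0)
length(seq_len(nrow(empty)))  # 0
```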
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/577#issuecomment-1478615356 time 7
Logged review for mstrimas (hours: 7)
Thanks for giving me the opportunity to review this cool package! I like bird body size a lot so it was enjoyable to review this. I think the package overall is good in terms of having a well-defined objective, meeting that objective, and documenting how it's done. I do have some suggestions for improvement. If anything needs clarification, don't hesitate to get in touch with me! I also want to say that I agree with essentially everything in Matt's review, so I've tried not to be too redundant with what he already said.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
- […] `URL`, `BugReports` and `Maintainer` (which may be autogenerated via `Authors@R`).
- […] `.data$` from in front of the column names wherever it appears. The error that I got was related to `unzip()` in `test-10_direct_bbs_data.R`. The unzipping did not work so none of the downstream code worked. I think it would be ideal to have the unit tests run without warnings or errors.

Estimated hours spent reviewing: 5
I think the purpose of the package is well-defined. It does a few well-defined and related tasks, and does them well. It also has demonstrated use cases, which is nice. The vignettes were well-organized and do a good job of providing examples of all the functions. I've divided my comments into three categories: code style, documentation, and "science stuff."
- […] `data.R` file which is a little easier to navigate (though this is not too relevant to users as they won't ever see that, but it would be helpful for contributors)
- […] `dplyr::mutate()` could be rewritten from

      community <- community %>%
        dplyr::mutate(
          richnessSpecies = .data$aou,
          species_designator = "aou"
        )

  to

      community[['richnessSpecies']] <- community[['aou']]
      community[['species_designator']] <- "aou"

  I understand this would be a lot of work to implement because tidyverse is used in most of your functions. If you do not want to fully remove the tidyverse dependencies, the most urgent one to address would be to replace `dplyr::group_by_at()` with `dplyr::group_by(dplyr::across())`. `group_by_at()` has been deprecated and will likely be removed from future versions of dplyr. (I'm in agreement with Matt's advice on that point).
- […] `%>%` for the reasons he cited.
- […] `filter_bbs_survey()`, package data are loaded with `unidentified_species <- unidentified_species`. I am not sure that is the recommended way to internally use package data. I noticed Matt also brought this up so I would follow his advice there.
- […] `simulate_populations()`, the error checking routines that cause failure if things like mean and standard deviation aren't provided is one level down in the `ind_draw()` function. To me it would make more sense if the input is checked for errors right away instead of further down.
- `species_data_functions.R`: the argument `data` to `lm()` should be named.
- […] `community` does not make it clear exactly what can be learned from going through the vignette.
- […] `select()` and `pmap_df()` because those packages are not explicitly loaded in the vignette.
- […] `aou` but I think it would be more consistent to make it capitalized `AOU`.
- […] `filter_bbs_survey()` to take arguments so that the user could customize removing specific species groups. For instance it would be interesting if the user could remove only waterbirds and keep nocturnal birds, if they were so inclined.
- […] `simulate_populations.R` in `ind_draw()` at line 34:

      population <- rnorm(n = species_abundance, mean = species_mean, sd = species_sd)
      while (any(population < 0)) {
        population[which(population < 0)] <- rnorm(n = sum(population < 0), mean = species_mean, sd = species_sd)
      }

  This looks like you are doing rejection sampling to generate samples from a truncated normal distribution with lower truncation bound of 0. I think that is fine, but it should be made clear in the documentation that you are doing this. I might even recommend allowing the user to input a lower truncation bound instead of hard-coding it at `0` (the default could be `0` but you could allow the user to modify this). For example the user might want to ensure that all masses are greater than 2 grams (that is roughly the lowest value I got when I generated the body masses for `a_hundred_hummingbirds`). Not too many birds weigh less than the Calliope Hummingbird! :-) Actually, in general I don't think it is clear in your documentation that samples are drawn from a normal distribution. Yes, it is implied by the fact that the parameters are mean and standard deviation but I think it would be good to be explicit about it. I also wanted to address Matt's point that the `while` loop in the rejection sampling may run infinitely or nearly so if invalid input is provided such as negative body mass. It would be good for you to expand the error checking code to cause failure on body mass means that are not positive, avoiding the potential of an infinite (or almost infinite loop).
- […] `msm::tnorm()` and `truncnorm::truncnorm()` but you may not want to add a dependency to your package.

@qdread Thanks for expanding on my comments about the while loop! I hadn't heard of "rejection sampling" before. `truncnorm` seems quite lightweight, so maybe that's a good option. Or keep the while loop but put in some logic so it stops after some number of iterations. Also, checking the inputs so the mean isn't negative would help. I like the suggestion of having the minimum size being user defined. Anyway, I would follow whatever @qdread suggests on this since I don't really know anything about this topic.
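For reference, a dependency-free alternative to the rejection loop is inverse-CDF sampling from the truncated normal, with the lower bound exposed as an argument. The sketch below is only an illustration in base R; `draw_truncated_normal()` and its arguments are hypothetical names, and the approach can lose numerical accuracy when the bound sits far out in the upper tail.

```r
# Hypothetical sketch: sample from a normal distribution truncated below at
# `lower` by drawing uniforms on the retained part of the CDF and inverting.
draw_truncated_normal <- function(n, mean_size, sd_size, lower = 0) {
  stopifnot(is.numeric(mean_size), is.numeric(sd_size), is.numeric(lower),
            length(mean_size) == 1, length(sd_size) == 1, length(lower) == 1,
            sd_size > 0)
  # Probability mass below the truncation bound
  p_lower <- pnorm(lower, mean = mean_size, sd = sd_size)
  if (p_lower >= 1) {
    stop("The truncation bound leaves no probability mass to sample from.")
  }
  # Uniform draws restricted to (p_lower, 1), mapped back through the quantile
  # function, give normal draws restricted to values above `lower`
  u <- runif(n, min = p_lower, max = 1)
  qnorm(u, mean = mean_size, sd = sd_size)
}

set.seed(42)
masses <- draw_truncated_normal(1000, mean_size = 2.6, sd_size = 0.3, lower = 2)
min(masses)
```

The mean and standard deviation in the usage line are made-up values chosen to echo the 2 gram example, not estimates from the package data. Unlike the loop, this runs in a single pass regardless of how much of the distribution is cut off.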
Regarding the comment about `.data$` throwing warnings during testing, the PR I submitted should resolve this https://github.com/diazrenata/birdsize/pull/67.
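Both reviews single out `dplyr::group_by_at()` as the scoped verb to replace first. A minimal sketch of the suggested `group_by(across())` pattern follows; the data frame, its column names, and its values are made up for illustration and are not taken from birdsize.

```r
library(dplyr)

# Hypothetical simulated individuals; columns and values are illustrative only
sim <- data.frame(
  year = c(2001, 2001, 2002, 2002),
  aou  = c(1, 1, 1, 2),
  mass = c(2.5, 2.7, 2.6, 3.4)
)

# Grouping columns supplied as a character vector
group_vars <- c("year", "aou")

# Scoped verb the reviews recommend moving away from:
# sim %>% group_by_at(group_vars)

# Replacement using across() with a tidyselect helper
totals <- sim %>%
  group_by(across(all_of(group_vars))) %>%
  summarize(total_mass = sum(mass), .groups = "drop")

totals
```

`all_of()` makes it explicit that `group_vars` is an external character vector, and it errors if any of the named columns is missing.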
Thanks @qdread for your thoughtful review! :smile: :bird: You mean using tidyverse packages increases the number of upstream dependencies, not downstream, correct?
@ropensci-review-bot submit review https://github.com/ropensci/software-review/issues/577#issuecomment-1494333794 time 5
Logged review for qdread (hours: 5)
@maelle Yes, upstream :-)
I thought that as a salmon I was maybe confused about the flow direction. :grin:
:calendar: @qdread you have 2 days left before the due date for your review (2023-04-11).
@qdread please ignore the comment above, sorry (we're investigating the bug :sweat_smile: ).
@diazrenata: please post your response with `@ropensci-review-bot submit response <url to issue comment>` if you haven't done so already (this is an automatic reminder).
Here's the author guide for response. https://devguide.ropensci.org/authors-guide.html
@diazrenata any update? :smile_cat:
@maelle (and everyone!) Thank you very much for the feedback! I'm in the process of incorporating changes. I am dealing with some health issues at the moment and have slightly longer turnaround times than usual, for which I apologize!
@diazrenata no problem, thanks for the update, take care!
Note that if needed we can put the submission on hold https://devguide.ropensci.org/policies.html?q=hold#policiesreviewprocess
The author can choose to have their submission put on hold (editor applies the holding label). The holding status will be revisited every 3 months, and after one year the issue will be closed.
@maelle, thank you very much for pointing this out! I think putting this on hold for the next 3 months would be the ideal move here. I expect to be able to complete the revisions within those 3 months, so things should proceed smoothly from there!
@ropensci-review-bot put on hold
Submission on hold!
@diazrenata Done! Thank you and take care.
@maelle: Please review the holding status
@diazrenata just checking in :smile_cat:
@diazrenata any update? I hope you're ok! :smile_cat:
@diazrenata just checking in, I hope you're ok.
Hi @maelle, thank you for checking in! I apologize for the long hiatus. I am back online, and have just about completed revisions! I expect to resubmit them in the next week, and no later than the end of the month, if that is all right?
Ok, thank you, glad to read you're back online! Here's the author guidance (as a reminder): https://devdevguide.netlify.app/softwarereview_author#the-review-process
Date accepted: 2023-11-30
Submitting Author Name: Renata Diaz
Submitting Author Github Handle: @diazrenata
Repository: https://github.com/diazrenata/birdsize
Version submitted: 0.0.0.9000
Submission type: Standard
Editor: @maelle
Reviewers: @mstrimas, @qdread
Due date for @mstrimas: 2023-04-04
Due date for @qdread: 2023-04-11
Archive: TBD
Version accepted: TBD
Language: en
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
This package automates a workflow (seen in the wild e.g. here and here) of generating estimates of body size and basal metabolic rate, from the individual to ecosystem level, for birds. In my presubmission inquiry (#561, https://github.com/ropensci/software-review/issues/561), the editors determined this was best described as "field and lab reproducibility tools" because it's automating a workflow used by empirical ecologists to work with field data.
The target audience is ecologists/biodiversity scientists interested in studying the structure, function, and dynamics of bird populations and communities - specifically linking between abundances/population size and other dimensions of community function, like total biomass. Studying size-based and abundance-based properties of bird communities is key to understanding biodiversity and global change, but it is challenging for most ecologists because most survey methods do not collect size-related data. This package standardizes a computationally-intensive workaround for this challenge and makes it accessible to ecologists with relatively little computational training.
I have not encountered another package that accomplishes this.
N/A
https://github.com/ropensci/software-review/issues/561
The editor who responded to me was @annakrystalli.
As part of that conversation, the question was raised of adding the authors of some of the datasets that this package draws on as "data contributors". I agree that this is an important consideration, and I wanted to go ahead and include a little more information here so we can make sure this is done in the most appropriate way.
This package uses two sources of "external" data:
First, the `sd_table` dataset included in the package includes (cleaned and selected) data values hand-entered from the CRC Handbook of Avian Body Masses (Dunning 2008; https://doi.org/10.1201/9781420064452). Neither Dunning, nor the authors of the studies cited in the CRC Handbook, were involved in this project. In the current iteration, I've followed the approach I would use for a paper - that is, the package and package documentation cite Dunning liberally, but I have not listed any additional authors as "data contributors" because I generally wouldn't list folks as co-authors without their knowledge and consent. In this context, would you encourage listing Dunning as a contributor, and/or reaching out to open that conversation?
Second, this package is designed to interface with the North American Breeding Bird Survey data (https://www.sciencebase.gov/catalog/item/5d65256ae4b09b198a26c1d7, doi:10.5066/P9HE8XYJ), but I have taken care not to redistribute any actual data from the Breeding Bird Survey in the package itself. The `demo_route_raw` and `demo_route_clean` data tables in `birdsize` are synthetic datasets that mimic data from the Breeding Bird Survey. That is, they have the same column names as BBS data, and valid AOU (species identifying codes) values, but the actual data values are simulated. The `bbs-data` vignette directs users to instructions for accessing the BBS data, and demonstrates using the functions in `birdsize` on BBS-like data using the demo routes. Again, the package cites the Breeding Bird Survey liberally, but stops short of redistributing data so as to encourage users to access and cite the creators directly.
For both of these, again, I'm happy to explore whatever approaches to citing/crediting the original data creators seems most appropriate! I'd appreciate any thoughts or guidance in this area.
[…] `pkgcheck` items which your package is unable to pass.
N/A
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[ ] Do you intend for this package to go on CRAN? tbd
[ ] Do you intend for this package to go on Bioconductor?
[X] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- [x] The package is novel and will be of interest to the broad readership of the journal.
- [x] The manuscript describing the package is no longer than 3000 words.
- [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see [MEE's Policy on Publishing Code](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/journal-resources/policy-on-publishing-code.html))
- (*Scope: Do consider MEE's [Aims and Scope](http://besjournals.onlinelibrary.wiley.com/hub/journal/10.1111/(ISSN)2041-210X/aims-and-scope/read-full-aims-and-scope.html) for your manuscript. We make no guarantee that your manuscript will be within MEE scope.*)
- (*Although not required, we strongly recommend having a full manuscript prepared when you submit here.*)
- (*Please do not submit your package separately to Methods in Ecology and Evolution*)

Code of conduct