Closed wrightaprilm closed 5 years ago
thanks for your submission @wrightaprilm !
Doing editor checks right now. On trying to install I get
➜ treeStartR git:(master) Rscript -e 'library(devtools); document(); install()'
Updating treestartr documentation
Loading treestartr
Error: 'ants' is not an exported object from 'namespace:treestartr'
can you fix? then i'll continue with editor checks
I misunderstood some stuff about data packaging. But I think it's working now!
Quick question - a user noted a (both run time and unit) test I should add for their use case, and I have added it. Should I push that now, or wait?
Yes, go ahead! Let me know when that's pushed
Alright, pushed. Thanks!
thanks!, will have another look
Thanks for your submission @wrightaprilm !
Here's the output from goodpractice. If you haven't used goodpractice
it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started.
── GP treestartr ─────────
It is good practice to
✖ write unit tests for all functions, and all package code in general. 76% of code lines are covered by test cases.
R/absent_tippR.R:15:NA
R/absent_tippR.R:16:NA
R/absent_tippR.R:19:NA
R/absent_tippR.R:21:NA
R/absent_tippR.R:22:NA
... and 28 more lines
✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
shorter than 80 characters
R/taxon_testr.R:11:1
R/tree_df.R:20:1
✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.
R/make_absent_df.R:14:23
R/tree_df.R:24:21
✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you need.
✖ fix this R CMD check NOTE: absent_tippr: no visible global function definition for ‘plot’ dataf_parsr: no visible global function definition for
‘read.csv’ test_dataf: no visible global function definition for ‘assert_that’ Undefined global functions or variables: assert_that plot read.csv Consider
adding importFrom("graphics", "plot") importFrom("utils", "read.csv") to your NAMESPACE file.
✖ fix this R CMD check WARNING: Undocumented code objects: ‘absent_list’ ‘absent_tippr’ ‘dataf_parsr’ ‘genera_strippr’ ‘mrca_df’ ‘present_tippr’
‘rand_absent_tippr’ ‘text_placr’ ‘tree’ Undocumented data sets: ‘absent_list’ ‘mrca_df’ ‘tree’ All user-level objects in a package should have documentation
entries. See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
✖ fix this R CMD check WARNING: Data with usage in documentation object 'ants' but not in code: ants
Seeking reviewers now 🕐
Reviewers:
Reviewers:
@dwbapst @rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!
Sorry, Scott, I sent you some emails a few weeks ago, which I later got weird bounces back from - they were about what a review looks like. I've got a heap of QA issues, but not sure what format I should put these into, or what level of code analysis you want me to do? -Dave
On Wed, Sep 12, 2018 at 2:30 PM Scott Chamberlain notifications@github.com wrote:
@dwbapst https://github.com/dwbapst @rachelwarnock https://github.com/rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-420769770, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVL1X1ENDBW9RtK_ZdZDPSsQJxTqFSks5uaWDzgaJpZM4Vbw08 .
-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree
Can you send that email again to myrmecocystus@gmail.com
No problem - I'm sorry that I didn't investigate after the bounces, but I think they came just as classes were beginning... -Dave
On Wed, Sep 12, 2018 at 2:34 PM Scott Chamberlain notifications@github.com wrote:
Can you send that email again to myrmecocystus@gmail.com
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree
Hi Scott,
I’m so sorry about the delay. I’ll try to wrap my comments & review up by the end of next week.
Best wishes,
Rachel
On Wed, 12 Sep 2018 at 21:30, Scott Chamberlain notifications@github.com wrote:
@dwbapst https://github.com/dwbapst @rachelwarnock https://github.com/rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-420769770, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYzn0kgE644kHXESp5K7_d0rdAOjIVAks5uaWDzgaJpZM4Vbw08 .
-- Rachel.Warnock@gmail.com
Hi, I feel the same as Rachel. I'm new to rOpenSci reviewing and it just came at a bad time when I was teaching two graduate classes. In order to force myself to work on this, I'm going to work the draft live here in the issue thread.
The package includes all the following forms of documentation:
[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
[x] Function Documentation: for all exported functions in R help
[?] Examples for all exported functions in R Help that run successfully locally
absent_tippr
.[?] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
-Most of this is here, but I don't see anything on contributing to the project - I don't usually include a section on that in packages myself, so I am not sure what ropensci wants here.
[x] Installation: Installation succeeds as documented. -Yeah, it installed. I would point out though that R, silly enough, is a case-sensitive language, and so although the package is referred to at times as 'treeStartR', it can only be installed or loaded into the library as 'treestartr'. Please correct all case uses. (I realize that git, annoyingly, isn't case sensitive, so nothing can be done about the repo name.)
[x] Functionality: Any functional claims of the software been confirmed. -I would say the software ran on the provided datasets, but more noisly (with more random messages, warnings, etc) than are needed and to the detriment of the software.
[x] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. --There are tests, and they are reasonable tests, but they fail on my local machine.
tools::test("d://dave//workspace//treeStartR")
Loading treestartr
Testing treestartr
√ | OK F W S | Context
Error in x[[method]](...) : attempt to apply non-function
== Results ===========================================================================
OK: 0
Failed: 4
Warnings: 1
Skipped: 0
Actually... I am beginning to wonder if there is something broken with my local installation of testthat
. Hmm. Maybe disregard this comment for now if you can't reproduce it.
Anyway, this led me to running a local R CHECK --as-CRAN, where, oddly, the tests pass. It did produce some warnings (only one of which Scott notes above, that I'd like to comment on how to fix, as you indicate you would like to publish this on CRAN.
* checking top-level files ... NOTE
Non-standard files/directories found at top level:
'README.Rmd' 'README.html' 'bibliography.bib' 'deprecated' 'docs'
'images' 'paper' 'refs.bib'
These should be added to your .Rbuildignore file.
* checking R code for possible problems ... NOTE
absent_tippr: no visible global function definition for 'plot'
dataf_parsr: no visible global function definition for 'read.csv'
test_dataf: no visible global function definition for 'assert_that'
Undefined global functions or variables:
assert_that plot read.csv
Consider adding
importFrom("graphics", "plot")
importFrom("utils", "read.csv")
to your NAMESPACE file.
CRAN policy increasingly wants us, when we use functions readily available in packages included in a base installation (but not in the base
package) to directly list these as imports, and indicate that dependency in our DESCRIPTION file. Without looking though, I think the plot
it is referring to here is probably ape's plot.phylo
for plotting a phylogeny. I think best practice is that you should indicate this is ape::plot.phylo
when you use that function, even in examples (I haven't done this, but am trying to).
* checking for missing documentation entries ... WARNING
Undocumented code objects:
'absent_list' 'mrca_df' 'tree'
Undocumented data sets:
'absent_list' 'mrca_df' 'tree'
All user-level objects in a package should have documentation entries.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.
Obviously these need to be covered. I think they are objects in the bears
dataset - all objects within data files need to be listed as aliases, e.g. in roxygen2 you can do something like #' @aliases RaiaCopesRule ceratopsianTreeRaia cervidTreeRaia
to indicate all the objects and their associated help file.
* checking for code/documentation mismatches ... WARNING
Data with usage in documentation object 'ants' but not in code:
ants
You need an example that makes use of the ant data - I don't usually get this warning, but my interpretation is that CRAN doesn't like packages with example datasets that aren't used - there was an issue a few years ago of people using CRAN as basically a repository for datasets for classes/textbooks, which isn't its intended use (and meant very large packages, size-wise, with almost no code base).
Estimated hours spent reviewing:
Review Suggestions from rOpenSci:
Does the code comply with general principles in the Mozilla reviewing guide?
Does the package comply with the rOpenSci packaging guide?
Are there improvements that could be made to the code style?
Is there code duplication in the package that should be reduced?
Are there user interface improvements that could be made?
Are there performance improvements that could be made?
Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?
If you have your own relevant data/problem, work through it with the package. You may find rough edges and use-cases the author didn’t think about.
Checking the package’s logs on its continuous integration services (Travis-CI, Codecov, etc.)
Running devtools::check() and devtools::test() on the package to find any errors that may be missed on the author’s system.
Using the covr package to examine the extent of test coverage.
Using the goodpractice package (goodpractice::gp()) to identify likely sources of errors and style issues. Most exceptions will need to be justified by the author in the particular context of their package.
Using spelling::spell_check_package() (and spelling::spell_check_files("README.Rmd")) to find spelling errors.
So here's some random comments from playing with the package.
First, I strongly making a manual page named treeStartR or with that alias so that users who cannot remember a function name but can remember the package name can quickly access the documentation. This page also serves as a good starting point for users interested in the package, and you could put a lot of information from your readme there, such as directing them to the vignette.
absent_list <- genera_strippr(tree, bears)
This function expects a data.frame
and not any other sort of object - you should be explicit about this in the help file. It also expets that the variable containing the taxon names will be labeled taxon
and nothing else. This should also be laid out, possibly even allow users to vary it by adding an argument like `taxonNamesVariable = "taxon"' so the default behavior stays the same.
The manual page for this function refers to 'morphological & stratigraphic genera'. Its not clear to what the distinction is between these.
new_tree <- present_tippr(tree, absent_list)
The function present_tippr
is very noisy. Why is the message Tree tip names formatted correctly
repeated so many times? The message is not very informative to the user. The same issue occurs with function absent_tippr
.
More messages:
Adding tips with congeners on tree:Indarctos_punjabiensis
Adding tip via MRCA at 25
Adding tips with congeners on tree:Ursus_abstrusus
Adding tip via MRCA at 30
What are these number codes? Are they node numbers 25 and 30? I'd clarify the message to the console, or clarify the documentation, or both.
Another message, this time a warning:
Warning message:
In bind.tree(tree, tip, where = where, position = pp) :
one tree has no branch lengths, they have been ignored
I would suggest suppressing this expected, non-informative warning from bind.tree
by masking it with function suppressWarnings
. This also happens with functions absent_tippr
and text_placr
.
More useful information to report to the user, perhaps, would be the number of OTUs added and the number of OTUs not added, perhaps also giving user a taxon list of the still-absent OTUs, to use with the other taxon-placing methods.
Also, if only a single congener is on the tree, the the species gets added, but to the root node:
tree2<-drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
We can compare the two trees with:
layout(1:2)
plot(tree);plot(new_tree)
I don't this this behavior is desired, and even if it is (sometimes) desired in certain use-cases, it should be described in the manual and perhaps toggle-able with an argument.
new_tree <- absent_tippr(tree, absent_list)
It is not immediately clear when you use this function that the prompt is asking for me to type in the number of the node, rather than interactively click on the plot (as in some ape and phytools functions). You may want to clarify the prompt to users so they know they need to enter the number.
new_tree <- rand_absent_tippr(tree, absent_list)
Where would you like to place the tip?Zaragocyon_daamsi
Error in if (where <= length(tree$tip.label) && position == 0) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In bind.tree(tree, tip, where = where, position = pp) :
one tree has no branch lengths, they have been ignored
2: In absent_tippr(tree, absent_list) : NAs introduced by coercion
In the above code example, apparently you cannot make two species sister to each other, You can only place them in a polytomy together with a third taxon. This seems like it might be an unintentional design oversight.
If only a single relative is listed with text_placr
, you get something like
new_tree <- text_placr(tree, mrca_df)
Placing tip Kretzoiarctos_beatrix
via relatives Indarctos_arctoides
at node
...And then the taxon gets added to the root node, bizarrely. Nothing in the documentation about more than one relative needing to be specified, or of what the algorithm's behavior is when only a single relative is specified. Again, this means you cannot make add a taxon as sister to another taxon, and it seems undesirable for users.
I tried generating an example dataset to see what happened.
library(paleotree)
set.seed(444)
# generate Genus Names
genNames<-paste0("R",
apply(combn(c("a","b","c","d","e"),2),
2,paste0,collapse="")
)
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x)
paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- ape::rtree(10, tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)
In the above, genera_strippr
seems to work, although I note that it prints the entire absentList
to the console, which is overly noisy.
thanks @dwbapst
@rachelwarnock okay, if it will be longer than that please let me know so I can get someone else
Okay @sckott @wrightaprilm I think my review is as complete as its going to be for now.
Thanks for that, @dwbapst. Really helpful stuff here (and thankfully all addressable)! @sckott, presumably I wait for @rwarnock, in case any of her review conflicts with Dave's. But after her's comes in, it looks like in the review guidelines, I can just start work on revision without waiting on an "Editor decision" in the way of a more traditional journal?
[If I'm wrong about the above, I'm certainly happy to contribute different wording to policies.html]
@wrightaprilm correct, let's wait for @rachelwarnock , then do revisions. correct, you don't need to wait for an editor decision
The package includes all the following forms of documentation:
URL
, BugReports
and Maintainer
(which may be autogenerated via Authors@R
).
devtools::check
Estimated hours spent reviewing:
Love the logo!
The main functionality of the package could be stated earlier on in the README. Its not until the end of third paragraph that you learn the main purpose of the package. You could include a sentence at the beginning, saying something like, "This package produces starting trees incorporating fossils for phylogenetic analysis", before jumping into your nice intro. Or the title of the README could be "treeStartR: generating starting trees for phylogenetic analysis".
The package depends on the package phytools, but since you use the phytools::
and ape::
function calling format (which is great programming practice) you don't have to load the phytools library on start up for the treestartr
functions to work (I tested to make sure).
Its useful to have the function names within text using the markdown coding format, i.e. dataf_parser
instead of dataf_parser. That way, the functions are highlighted if you want to quickly scan the README to locate info.
I don't think you need to use data(bears)
since this dataset is already loaded into the environment when you load the package. data(ants)
returns "data set 'ants' not found", although this dataset has a help page.
With the bears dataset, the absent_list
, mrca_df
and tree
components of the data are available, but not tax_list
. Instead, the info that should be represented by this variable appears to represented by the bears
variable. i.e. the vector output by genera_strippr(tree, bears)
matches absent_list
. This doesn't seem intentional.
With new_tree <- present_tippr(tree, absent_list)
, I don't really follow the screen output from this command. Why is "Tree tip names formatted correctly" printed to screen so many times (= 33)? length(absent_list) + length(tree$tip.label) = 22 only. With a very large dataset you could end up generating a lot of unneccessary screen output. Why not just print "all tip names formatted correctly" once after checking? or just return incorrectly formatted names. This also applies to other functions that generate this output (absent_tippr
, rand_absent_tippr
, get_lost
, get_found
).
Unclear what the source of this warning message means: Warning message: In bind.tree(tree, tip, where = where, position = pp) : one tree has no branch lengths, they have been ignored
I suppose this has something to do with the fact that the functions are returning trees with no branch lengths. This suggests that the fossil ages are not being taken into account. Will this not be important for generating starting trees for fbd/tip-dating analyses?
The manual option is tricky for trees even as large as the example dataset. For example, on my small laptop screen, some of the branching events are too close for me to be able to distinguish the node labels (i.e. many are plotted on top of each other). Not sure if there's anything you can do about this though.
Where it says TSV file, do you mean CSV? or CSV/TSV
Perhaps it would be useful to have an example, where multiple options are used to add taxa to the tree.
bears
- tax\_list
not available
ants
- has a help page but I can't seem to load the data
absent_tippr
- the documentation say tips are added "via user input" but its not clear that this is function is interactive.
dataf_parsr
- The function description mentions morphological and molecular data but I think this function is designed to read the file containing the stratigraphic info only.
Maybe include a url to the relevant RB tutorial? Where it describes the function argument dataf
, I think this should be "file name" not "data frame" (which is the output). In addition, because the functions don't return time scaled trees, I'm sure why the fossil ages are required.
genera_strippr
- no issues here.
get_found
, get_lost
- The distinction between these functions is a bit confusing. Could the dataframe output by this function have more useful column headers? Or perhaps these could be private internal functions.
make_absentdf
- Perhaps this could also be a private internal function.
present_tippr
- I would add the more comprehensive description of this function from the README here.
rand_absent_tippr
- no issues here.
text_placr
- maybe some additional clarification would be useful here. Am I right in thinking the clade
column really refers to a branch/tip? And that the function then builds a clade from the tips (e.g. Ursus_arctos, Ursus_spelaeus, Ursus_americanus) from which it works out where to place the taxon (e.g. Ursus_abstrusus). At first when I looked at mrca_df
I wasn't clear why there were multiple entries per taxon in the table.
I think it would be useful to have more complete examples in the documentation. e.g. for rand_absent_tippr
.
data(bears)
absent_list <- genera_strippr(tree, tax_list)
new_tree <- rand_absent_tippr(tree, absent_list)
write.nexus(new_tree)
That way, if the user wanted to use this function, they would know which other function help pages to refer to get all the info necessary to generate input.
On a related note, I initially found the relationship between the package functions hard to get my head around, while just scrolling through the help pages. I produced the following for reference.
dataf_parsr (load a total list of taxa) ->
genera_strippr (find out which taxa are already on the tree) ->
absent_tippr (add tips manually) /
present_tippr (add tips with congeners)/
rand_absent_tippr (add tips randomly) /
text_placr (add tips using csv) ->
write.nexus
Perhaps you could consider including a similar flow chart or table in the README. Definitely not necessary, esp. if you included more start-to-finish examples, so its just a thought!
My comments on the code are extremely minor. The functions all appear to be concise and the more complicated functions are nicely commented. I can't see any obvious room for improvement in terms of efficiency or redundancy, apart from one small thing noted below.
I'm not sure each function needs to be in a separate file (i.e. related functions could potentially live in the same file) but this is down to programmer's preference and since the package isn't massive it doesn't really matter.
genera_strippr
- why does this function print and return absent
?
Note that running devtools::check()
produces 1 error, 4 warnings and 3 notes. Might be good to double check these issues. devtools::spell_check()
identified one misspelling of phylogeny.
I simulated a dataset to explore the functions a bit more. Everything seems to work well but I noted the following:
dataf
(i.e. the age column can be absent and the functions still work).bears
dataset. i.e. the functions output trees with branch lengths and don't return a warning about this.If the functions could return time scaled starting trees that respect the fossil ages I think this would be extremely useful.
If I had to pick a favourite function it would be text_placr
. This is super useful :)
Note, I didn't look at Dave's review before writing this, but I can see some overlap. Hope this helps!
@dwbapst do you have a rough estimat of hrs spent doing your review? we like to keep track for use in a long term goal of using our reviewers time most effectively
@wrightaprilm All reviews in now. thanks much to @rachelwarnock and @dwbapst !
@sckott Honestly? It probably took more than 8 hours, maybe 10 hours, spread over a month? This isn't an atypical amount of time for to spend on reviewing, particularly software - what I did here is precisely I have done for reviewing papers describing new R packages at MEE. Some reviews have absolutely taken more than 10 hours, and many of my non-software reviews take longer as well.
@rachelwarnock I think that the functions can return non-binary trees is intentional, and sort of the point.
@dwbapst yes, honesty please. thanks very much.
Thanks so much to both of you, @dwbapst and @rachelwarnock, and to you, @sckott for facilitating. I'll probably get started on response to reviews (i.e., just fixing things) Friday.
@rachelwarnock I think that the functions can return non-binary trees is intentional, and sort of the point.
@dwbapst can you really specify unresolved starting trees in revbayes and beast2?
@rachelwarnock Yup, that was a dumb comment of mine. I don't know specifically what the starting tree requirements are, but you are probably right that they probably can't be non-binary.
I think I was implicitly imagining that you'd add OTUs to the initial tree, making a big informal taxon-tree sort of thing, but then for starting purposes, you'd want to randomly resolve those, and probably get some rough starting dates for the tree if for tip-dating analyses, as you talk about in your review above. I recall Nick Matzke using paleotree for that step to get starting trees for the BEAST2 analyses in the Archeopteryx paper that April and I are on. But those are steps that treestartr could help automate too.
@dwbapst @wrightaprilm I asked one of the BEAST2 developers (Tim Vaughan) if you can specify unresolved starting trees and he said: "Yes you can! This will just work. Read it in as a newick string with a polytomy. This will be automatically converted to a binary tree with a zero edge length by beast’s newick parser, and the operators will then quickly replace the zero-length edge with one of finite-length." Maybe worth clarifying in the package documentation! I presume this is also true in RB.
@rachelwarnock Yeah, I can't find anything about starting tree requirements for RevBayes, but certainly MrBayes only requires binary trees:
http://mrbayes.sourceforge.net/Help/usertree.html
@wrightaprilm So, while I realize MrBayes is on its way out, its still fairly widely used... thus, a comment in the documentation for treetstartr
to that effect, that MrBayes will only accept binary trees would probably reduce some confused emails from lost users.
Hey @dwbapst @rachelwarnock -
RB does read in polytomy trees, and it's a bit of analytical priggishness that treestartr
returns trees with polytomies. I'll make note of it in the docs, to resolve with multi2di
or similar if you're planning to use MB. And actually, this line of thinking is giving me an idea for something to add to the package -
exportation of RB-formatted clade constraints that include your fossils (see here).
Again, thanks so much for all your help kicking the tires on this thing.
Just making a TODO-list. Ignore, unless I've misrepesented your points ;)
[x] Making a manual page named treeStartR or with that alias so that users who cannot remember a function name but can remember the package name
[x] Report the number of OTUs added and the number of OTUs not added, perhaps also giving user a taxon list of the still-absent OTUs, to use with the other taxon-placing methods.
[x] Defend choice: In the above code example, apparently you cannot make two species sister to each other, You can only place them in a polytomy together with a third taxon. This seems like it might be an unintentional design oversight.
[x] Spellcheck
[x ] Bug: text_placr: And then the taxon gets added to the root node, bizarrely.
[x] Env: I don't think you need to use data(bears) since this dataset is already loaded into the environment when you load the package. data(ants) returns "data set 'ants' not found", although this dataset has a help page.
[x] rand_absent_tippr: clearer example
[x] Env: With the bears dataset, the absent_list, mrca_df and tree components of the data are available, but not tax_list.
[x] get_found, get_lost, make_absentdf: Make private
[x] Re: Ages. the immediate above: Print FBD-formatted clade constraints for RevBayes
[x] absent_tippr - the documentation say tips are added "via user input" but its not clear that this is function is interactive.
[x] dataf_parsr - The function description mentions morphological and molecular data but I think this function is designed to read the file containing the stratigraphic info only.
[x] genera_strippr: This function expects a data.frame. you should be explicit about this in the help file.
[x] genera_strippr: possibly even allow users to vary it by adding an argument like taxonNamesVariable = "taxon"' so the default behavior stays the same. _AMW_: Edited this behavior in the
dataf_parsr` function.
[x] genera_strippr: The manual page for this function refers to 'morphological & stratigraphic genera'. Its not clear to what the distinction is between these.
[x] genera_strippr: why does this function print and return absent?
[x] present_tippr: very noisy.
[x] present_tippr: clarify in docs
[x] present_tippr, absent_tippr and text_placr.: Suppress warning.
[x] text_placr: clarify inputs
[x] Bug: if only a single congener is on the tree, the the species gets added, but to the root node:
[x] absent_tippr: You may want to clarify the prompt to users so they know they need to enter the number.
[x] genera_strippr: although I note that it prints the entire absentList to the console, which is overly noisy.
[x] Man: Where it says TSV file, do you mean CSV? or CSV/TSV. AMW: Can be either - have noted this in documents
[x] ReadMe: The main functionality of the package could be stated earlier on in the README.
[x] README: The package depends on the package phytools, but since you use the phytools:: and ape:: function calling format (which is great programming practice) you don't have to load the phytools library on start up for the treestartr functions to work (I tested to make sure).
[x] README: ts useful to have the function names within text using the markdown coding format, i.e. dataf_parser instead of dataf_parser. That way, the functions are highlighted if you want to quickly scan the README to locate info.
[x] Link to docs for other package functions (i.e. phytools, phangorn)
I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.
Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed.
Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful.
Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do.
Formatted clade constraints: I added two functions echo_subtree
and echo_rb
. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb
created RevBayes formatted clade constraints. I added two tests for these functions.
Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.
I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?
Thanks so much to all three of you for your help with this; I can already see the package is much improved.
This all sounds great to me!
On 8 Oct 2018, at 22:24, April Wright notifications@github.com wrote:
I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.
Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed. Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful. Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do. Formatted clade constraints: I added two functions echo_subtree and echo_rb. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb created RevBayes formatted clade constraints. I added two tests for these functions. Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.
I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?
Thanks so much to all three of you for your help with this; I can already see the package is much improved.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.
April, one of my secret joys in life is figuring out cryptic R CHECK --as-cran warnings. I'm happy to answer questions about them. Being able to quietly pass --as-cran is vital for most R packages.
I'll try to find tomorrow to sit down and plod through my examples again.
On Mon, Oct 8, 2018 at 4:24 PM April Wright notifications@github.com wrote:
I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.
Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed. Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful. Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do. Formatted clade constraints: I added two functions echo_subtree and echo_rb. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb created RevBayes formatted clade constraints. I added two tests for these functions. Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.
I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?
Thanks so much to all three of you for your help with this; I can already see the package is much improved.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-427983973, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVL3wNCEHjU6WjLUXbv8pGnNUFoD8vks5ui8KPgaJpZM4Vbw08 .
-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree
Hi April, I'm afraid I encountered a number of issue, particularly some functions which broke (and one function that broke if package ape
was loaded).
Here's my R script I was working from with commented out output for you to look at:
devtools::install_github("wrightaprilm/treeStartR")
library(treestartr)
packageVersion("treeStartR")
# > packageVersion("treeStartR")
# [1] ‘0.1.0’
# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")
data(bears)
#########################################################
absent_list <- genera_strippr(tree, bears)
absent_list
# works
# if tax_list isn't a data.frame, it silently fails (returns no error)
# but returns NA as the output
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
#
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4
# example output
# > absent_list4 <- genera_strippr(tree, tax_list)
# > absent_list4
# [1] NA
# if the variable isn't named 'taxon' it can't find the taxon names, returns error
bears3 <- bears
names(bears3)[1] <- "taxa"
absent_list3 <- genera_strippr(tree, bears3)
absent_list3
# > absent_list3 <- genera_strippr(tree, bears3)
# Error in `[.data.frame`(tax_list, "taxon") : undefined columns selected
# > absent_list3
# Error: object 'absent_list3' not found
# this requirement needs to be specified in the documentation file
#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works
# weird error if only a single congener is on the tree:
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
# output:
# > new_tree <- present_tippr(tree2, absent_list)
# Tree tip names formatted correctly
# [1] "Indarctos_arctoides"
# Error in getParent(tree, loc) : object 'loc' not found
# COULD NOT TRY FOLLOWING AGAIN:
# cannot make two species sister to each other,
# can only place them in a polytomy together
#####################################
new_tree <- absent_tippr(tree, absent_list)
# this function no longer works:
# > new_tree <- absent_tippr(tree, absent_list)
# Error: 'plot' is not an exported object from 'namespace:ape'
# it looks like its looking for ape::plot.phylo but there's
# possibly a reference to ape::plot (which doesn't exist
# plot.phylo is what we call an S3 method - we can call a
# generic function, but R checks if there
# is a more specific function for that class first
# COULD NOT TRY FOLLOWING AGAIN:
# not immediately clear that its asking for me to type in the
# number of the node, rather than interactively click on the plot
# (as in some ape and phytools functions)
# may want to clarify the prompt to users
# COULD NOT TRY FOLLOWING AGAIN:
# Where would you like to place the tip?Zaragocyon_daamsi
#Error in if (where <= length(tree$tip.label) && position == 0) { :
# missing value where TRUE/FALSE needed
#In addition: Warning messages:
#1: In bind.tree(tree, tip, where = where, position = pp) :
# one tree has no branch lengths, they have been ignored
#2: In absent_tippr(tree, absent_list) : NAs introduced by coercion
###############################
new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works, but...
# fatal error if run after loading package ape
#
# > library(ape)
# > new_tree <- rand_absent_tippr(tree, absent_list)
# Adding tips at random node: Ailurarctos_lufengensis
# Adding tips at random node: Kretzoiarctos_beatrix
# Error in if (tree$edge.length[which(tree$edge[, 2] == where)] <= 1e-12) { :
# missing value where TRUE/FALSE needed
################################################
new_tree <- text_placr(tree, mrca_df)
# get this sort of output:
# > new_tree <- text_placr(tree, mrca_df)
# Placing tip Kretzoiarctos_beatrix
# via relatives Indarctos_arctoidesIndarctos_vireti
# at node 25
# Placing tip Ursus_abstrusus
# via relatives Ursus_arctosUrsus_spelaeusUrsus_americanus
# at node 30
# printing the node numbers for usr possibly isn't important (or maybe it is)
# may want to paste in spaces or commas between taxon names for readability
# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
new_tree <- text_placr(tree, mrca_df)
# works fine!
I hope this helps.
if tax_list isn't a data.frame, it silently fails: Fixed - I hadn't considered the situation where someone might not use the dataframe loader function, but might be working separately with the taxa and then call treestartr. So now the genera_strippr
function checks for dataframe, tries to coerce one it it isn't, and if that isn't possible, returns an error.
weird error if only a single congener is on the tree:: This is a bug - I changed a variable name in one spot, but not another.
Error: 'plot' is not an exported object from 'namespace:ape: I tried to solve a CRAN check error and created a namespace collision. It's fixed now.
fatal error if run after loading package ape: I was hilariously mystified by this, but adding in phytools::
before bind.tip
solves it (and I should have done anyway).
I forgot to note this in my last entry: I didn't add having the tips that weren't placed echo to the screen after placing tips. That can be recalculated by calling the genera_strippr
, and I felt like some of the functions were already getting a bit noisy, especially if you do something like print RevBayes clade constraints.
So what I'm really trying to say is I'll probably take you up on CRAN check assistance once the toothpick comes out clean on the rest of it.
Thanks for having a second look. It's so easy to get caught up in how I would use this package and not consider how another person might integrate it into a workflow, particularly in conjunction with more complex data processing or other packages.
Yeah, need to be explicit that you're importing ape::plot.phylo
and phytools::bind.tip
. Sounds like you've fixed the issue. I think maybe ape has its own bind.tip
with slightly different behavior.
Its absolutely not a requirement from your perspective, but I am curious if there is a specific reasoning why ape
isn't a depends - although there is general advice to import packages rather than loading the whole package, most phylogenetics packages in R depend on ape as it is usual for users to be doing phylogenetics-things that require the S3 methods for class phylo
in ape
. Are you envisioning a use case where users are unlikely to have ape
loaded?
I'll try to rerun the code later, after teaching.
Hi April, A lot more works now, but I think there is some oversight about how species are added whether to tips or lower nodes, and that sort of behavior needs to be specified in the documentation, cause I can't figure out how intentional or not this behavior is. Particularly see the worked example at the end of the following, and I'll post the plotted result as well.
devtools::install_github("wrightaprilm/treeStartR")
library(treestartr)
packageVersion("treeStartR")
# > packageVersion("treeStartR")
# [1] ‘0.1.0’
# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")
data(bears)
#########################################################
absent_list <- genera_strippr(tree, bears)
absent_list
# works
# if tax_list isn't a data.frame, it works
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
# another example
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4
# if the variable isn't named 'taxon' it can't find
# the taxon names, returns error
bears3 <- bears
names(bears3)[1] <- "taxa"
absent_list3 <- genera_strippr(tree, bears3)
absent_list3
# > absent_list3 <- genera_strippr(tree, bears3)
# Error in `[.data.frame`(tax_list, "taxon") : undefined columns selected
# Some options (choose one, or combine several):
# Use the one column provided if it a one-column data table
# or use the first column as a default?
# but issue an informative warning() saying you are doing so
# Issue an informative error message with stop()
# that it can't find the taxon names as expect
# saying the function expects a certain variable name
# Specify this requirement in the documentation file
#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works
# drop Indarctos, see if it can place two species as sister
# to each other, or if it makes a polytomy:
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# it places them in a polytomy instead, making the genus paraphyletic
# is this intentional behavior?
#####################################
new_tree <- absent_tippr(tree, absent_list)
# nice clear interaction :D
# Although not labeled, we can also give tip ID numbers
# and thus add taxa as sister taxa on the tree
# Is this intentional behavior? It isn't described in documentation.
# If this is intentional behavior it should be described in documentation.
###############################
new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works even with ape loaded
################################################
new_tree <- text_placr(tree, mrca_df)
# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
new_tree <- text_placr(tree, mrca_df2)
plot(new_tree)
# Hmm, adds it to a polytomy with both Indarctos species
# Is this intentional behavior?
####################
# constructed example
set.seed(444)
# generate Genus Names
genNames<-paste0("R",
apply(combn(c("a","b","c","d","e"),2),
2,paste0,collapse="")
)
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x)
paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- rtree(10,tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)
absentList
# just first three
newTree <- present_tippr(singleSpeciesTree , absentList[1:3])
# works
# all
newTree <- present_tippr(singleSpeciesTree , absentList)
# works
plot.phylo(newTree, no.margin=TRUE)
# its collapsed all sister genera into each other
# this really doesn't look intentional!
Thanks, again, for all these thorough comments. I really do appreciate it.
# Error in [.data.frame
(tax_list, "taxon") : undefined columns selected: Bug caused by inappropriate curly brace closure.
# Although not labeled, we can also give tip ID numbers and thus add taxa as sister taxa on the tree: Added this to the visual, and suppressed warnings for once sister taxa are added (which gives a polytomy warning).
it places them in a polytomy instead, making the genus paraphyletic # is this intentional behavior? It was not ... and I see how it doesn't make sense. I've changed this so that if there is one congener, the tip is added as sister.
text_placr polytomy: I genuinely would like to hear opinions on this - I like this behavior, but you seem confused by it. Is there a different behavior that would be preferable?
Placements breaking up monophyletic groups: I can see why this one is undesireable. This is fixed by allowing tips to be added as sister, an regenerating the mrca_list afterwards, such that any subsequent tips will be placed into the now-monophyletic grouping. The above now returns the below, which I think is the behavior we want:
Hi April,
I think labeling the tips makes it clear that it is acceptable for users to add taxa as sisters to the tips. I just was pushing on the documentation issue because there weren't tip labels, so it wasn't clear if the behavior was intended or not. (Unintentional behaviors may be relied upon by others, but be unstable because you might not know someone else depends on that functionality, so its best to be clear). I think its clear now, though.
I still have a tiny issue with how genera_strippr
picks what column to use from a data.frame, which I think you could solve just by being more explicit/detailed in the documentation for that function.
Overall, I'd say I don't have really any issue with the package now.
devtools::install_github("wrightaprilm/treeStartR")
library(treestartr)
packageVersion("treeStartR")
# > packageVersion("treeStartR")
# [1] ‘0.1.0’
# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")
data(bears)
#########################################################
absent_list <- genera_strippr(tree, bears)
absent_list
# works
# if tax_list isn't a data.frame, it works
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
# another example
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4
# the first row is always used if a data.frame is handed over
bears3 <- bears[,2:1]
#names(bears3)[1] <- "taxa"
absent_list3 <- genera_strippr(tree, bears3)
absent_list3
# > absent_list3 <- genera_strippr(tree, bears3)
#> absent_list3
# [1] "0" "0" "0" "0" "0" "0" "0" "0" "5" "7.7"
#[11] "0.5" "14" "9.6" "5.1" "8.7" "11.7" "36.6" "16.2" "14.4" "4.3"
#[21] "0.06" "21.9"
# I really do think you need to be explicit in the documentation that
# its the first column, or a column with a particular label, or whatever
# the rule is, or say so in a warning() or a message() when
# From before:
#
# Some options (choose one, or combine several):
# Use the one column provided if it a one-column data table
# or use the first column as a default?
# but issue an informative warning() saying you are doing so
# Issue an informative error message with stop()
# that it can't find the taxon names as expect
# saying the function expects a certain variable name
# Specify this requirement in the documentation file
#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works
# drop 1 Indarctos sp, places other sp
# as sister to the one remaining congeneric
# works!
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
layout(1:2)
plot(tree, no.margin=TRUE);plot(new_tree,no.margin=TRUE)
#####################################
new_tree <- absent_tippr(tree, absent_list)
# nice clear interaction :D
# we can also give tip ID numbers
# and thus add taxa as sister taxa on the tree
# Documentation doesn't rule out this behavior,
# labeling makes it clear this is probably okay
# Cool
###############################
new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works even with ape loaded
################################################
new_tree <- text_placr(tree, mrca_df)
# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
mrca_df2
new_tree <- text_placr(tree, mrca_df2)
plot(new_tree)
# K_beastrix added as a sister properly
####################
# constructed example
set.seed(444)
# generate Genus Names
genNames<-paste0("R",
apply(combn(c("a","b","c","d","e"),2),
2,paste0,collapse="")
)
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x)
paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- rtree(10,tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)
absentList
# just first three
newTree <- present_tippr(singleSpeciesTree , absentList[1:3])
# works
# all
newTree <- present_tippr(singleSpeciesTree , absentList)
# works
plot.phylo(newTree, no.margin=TRUE)
# its collapsed all sister genera into each other
# this really doesn't look intentional!
OK, added some more information to this point to the function documentation, and a more informative stop() message.
If we're all happy with the major package functions, is it OK if I ask some CRAN check questions?
I feel very silly about this, because this part seems like it should be easy.
No, don't feel silly - R CHECK --as-cran
is often one of the hardest parts because of the opaqueness of the messages.
You need to specify what package you are importing plot from. Is that base plot in graphics
or ape
? You likely need to import plot
from graphics
- its a relatively new requirement.
You have exported objects in the code, or arguments for exported functions that have no documentation assigned to them. You need to search for absent_list
and mrca_df
.
Hmm. I would try deleting your inst
directory. To be honest, I'm not sure why you have one to begin with, and the vignette in it is out of date, I think.
agree, don't feel silly @wrightaprilm ! The cran check stuff is tough and no fun for anyone.
I think you should add inst/docs
to .Rbuildignore
. inst/doc
is a formal directory that is used when building the package, so we don't want to ignore that of course.
I think a problem might be in the line in your DESCRIPTION file where you have VignetteBuilder: utils
- I've not seen utils
in there before. Maybe it is valid, I don't know, but usually the entry there is knitr
In the vignette, you have the line
tax_list <- dataf_parsr("inst/extdata/bears_taxa.tsv")
Instead of the path for that directly, i'd use
tax_list <- dataf_parsr(system.file("extdata/bears_taxa.tsv", package = "treestartr"))
Also, that chunk has eval=FALSE
which means tax_list
is never created, which means the next chunk then can't find tax_list
and the vignette build fails
I'm still a little fuzzy on what it means for an exported object to have attached documentation. For example, mrca_df
is defined as a parameter to text_placr
, and I see it in the manual.
@wrightaprilm as I understand it, every object in your example data file needs to have a listed man page with the same name, or a page where it is listed as an alias. E.g. it is an object in your bears example data, apparently:
You need to list all the components, if those are separate object in the save .Rdata file, as alias via #' @aliases
. For example, see:
https://github.com/bomeara/treevo/blob/master/R/simRunExample.R#L11
As an addendum, it’s usually good practice to name objects in an example data file something different than the exact name of function arguments so we know which one R is complaining about.
Cheers, -Dave
On Fri, Oct 19, 2018 at 2:38 PM April Wright notifications@github.com wrote:
I'm still a little fuzzy on what it means for an exported object to have attached documentation. For example, mrca_df is defined as a parameter to text_placr, and I see it in the manual.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-431475882, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVLyVOVoTH2sICPLkR0qT0EQG2aJhvks5umiotgaJpZM4Vbw08 .
-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree
I think I may have done it - all NOTEs, ERRORs and WARNINGs silenced, and build is passing.
Now what?
Nice work @wrightaprilm I see a few more things:
.Rbuildignore
file: README.Rmd, README.html, bibliography.bib, deprecated, docs, images, paper, refs.bibchecking for missing documentation entries (1s)
Undocumented code objects:
‘absent_tippr’ ‘genera_strippr’ ‘present_tippr
There's some remaining issues from my comment above https://github.com/ropensci/onboarding/issues/239#issuecomment-430371029 Did you see it yet?
Summary
What does this package do? (explain in 50 words or less): This package helps users to build a reasonable starting phylogeny for phylogenetic analysis. In particular, this package allows researchers to add tips to a tree through either a graphical user interface, an automated system based on taxonomy, or by inputting a dataframe of taxa to be placed.
Paste the full DESCRIPTION file inside a code block below:
URL for the package (the development repository, not a stylized html page): https://github.com/wrightaprilm/treeStartR
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):
[e.g., "data extraction, because the package parses a scientific data file format"]
data munging, combining data sources for use with phylogenetic analysis programs (in particular RevBayes & Mr. Bayes)
As the complexity of phylogenetic models increases, it can be difficult to find starting phylogenetic trees for Bayesian MCMC that have computable likelihoods. Most programs start from a random addition tree, which, for large datasets, may not produce reasonable likelihoods. Most software allows users to input a starting tree. The starting tree is often gleaned from the literature. However, for people performing "total evidence" or "joint" phylogenetic analysis, there may not be a published tree with all the relevant taxa. This package allows users to easily create one. Fossil record taxa are commonly left off phylogenies, so I expect biologists and paleontologists working with fossils to be the chief users of this package.
Not to my knowledge.
Emailed @sckott to discuss how to know if the package is ready, and if reviewers can help with a WARNING, but did not discuss specific scope questions.
Requirements
Confirm each of the following by checking the box. This package:
Publication options
paper.md
matching JOSS's requirements with a high-level description in the package root or ininst/
.Detail
R CMD check
(ordevtools::check()
) succeed? Paste and describe any errors or warnings: I have two, and I'm unsure what they mean:Undocumented code objects: ‘absent_list’ ‘ant_mrca_df’ ‘ant_taxa’ ‘ant_tree’ ‘mm’ ‘mol’ ‘mrca_df’ ‘new_tree’ ‘tax_file’ ‘tax_list’ ‘tree’ Files in the 'vignettes' directory but no files in 'inst/doc': ‘vignette.Rmd’, ‘vignette.pdf’
[ ] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Klaus Schliep - KlausVigo David Bapst - dwbapst Rachel Warnock - rachelwarnock