submission - treestartr

ropensci / software-review

rOpenSci Software Peer Review.

291 stars 104 forks source link

submission - treestartr #239

Closed wrightaprilm closed 5 years ago

wrightaprilm commented 6 years ago

Summary

What does this package do? (explain in 50 words or less): This package helps users to build a reasonable starting phylogeny for phylogenetic analysis. In particular, this package allows researchers to add tips to a tree through either a graphical user interface, an automated system based on taxonomy, or by inputting a dataframe of taxa to be placed.
Paste the full DESCRIPTION file inside a code block below:

Package: treestartr
Type: Package
Title: Generate Starting Trees For Combined Molecular, Morphological and Stratigraphic Data
Version: 0.1.0
Authors@R: person("April", "Wright", role = c("aut", "cre"),
    email = "april.wright@selu.edu")
Description: Combine a list of taxa with a phylogeny to generate a starting tree for use in
    total evidence dating analyses.
URL: https://wrightaprilm.github.io/treeStartR/
BugReports: https://github.com/wrightaprilm/treeStartR/issues
License: MIT + file LICENSE
Depends:
    R (>= 3.0.0)
Imports:
    phytools (>= 0.6-4),
    ape
Suggests:
    knitr,
    testthat,
    covr,
    utils
VignetteBuilder: utils
Encoding: UTF-8
LazyData: true
RoxygenNote: 6.0.1

URL for the package (the development repository, not a stylized html page): https://github.com/wrightaprilm/treeStartR
Please indicate which category or categories from our package fit policies this package falls under *and why(? (e.g., data retrieval, reproducibility. If you are unsure, we suggest you make a pre-submission inquiry.):

[e.g., "data extraction, because the package parses a scientific data file format"]

data munging, combining data sources for use with phylogenetic analysis programs (in particular RevBayes & Mr. Bayes)

Who is the target audience and what are scientific applications of this package?

As the complexity of phylogenetic models increases, it can be difficult to find starting phylogenetic trees for Bayesian MCMC that have computable likelihoods. Most programs start from a random addition tree, which, for large datasets, may not produce reasonable likelihoods. Most software allows users to input a starting tree. The starting tree is often gleaned from the literature. However, for people performing "total evidence" or "joint" phylogenetic analysis, there may not be a published tree with all the relevant taxa. This package allows users to easily create one. Fossil record taxa are commonly left off phylogenies, so I expect biologists and paleontologists working with fossils to be the chief users of this package.

Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?

Not to my knowledge.

If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.

Emailed @sckott to discuss how to know if the package is ready, and if reviewers can help with a WARNING, but did not discuss specific scope questions.

Requirements

Confirm each of the following by checking the box. This package:

[x ] does not violate the Terms of Service of any service it interacts with.
[x] has a CRAN and OSI accepted license.
[x] contains a README with instructions for installing the development version.
[x] includes documentation with examples for all functions.
[x] contains a vignette with examples of its essential functions and uses.
[x] has a test suite.
[x] has continuous integration, including reporting of test coverage, using services such as Travis CI, Coveralls and/or CodeCov.
[x] I agree to abide by ROpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.

Publication options

[x] Do you intend for this package to go on CRAN?
[ ] Do you wish to automatically submit to the Journal of Open Source Software? If so:
- [ ] The package has an obvious research application according to JOSS's definition.
- [ ] The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- [ ] The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
[x] Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
- [x] The package is novel and will be of interest to the broad readership of the journal.
- [x] The manuscript describing the package is no longer than 3000 words.
- [x] You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no gaurantee that your manuscript willl be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)

Detail

[ ] Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings: I have two, and I'm unsure what they mean:

Undocumented code objects: ‘absent_list’ ‘ant_mrca_df’ ‘ant_taxa’ ‘ant_tree’ ‘mm’ ‘mol’ ‘mrca_df’ ‘new_tree’ ‘tax_file’ ‘tax_list’ ‘tree’ Files in the 'vignettes' directory but no files in 'inst/doc': ‘vignette.Rmd’, ‘vignette.pdf’

[ ] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:

Klaus Schliep - KlausVigo David Bapst - dwbapst Rachel Warnock - rachelwarnock

sckott commented 6 years ago

thanks for your submission @wrightaprilm !

Doing editor checks right now. On trying to install I get

➜  treeStartR git:(master) Rscript -e 'library(devtools); document(); install()'
Updating treestartr documentation
Loading treestartr
Error: 'ants' is not an exported object from 'namespace:treestartr'

can you fix? then i'll continue with editor checks

wrightaprilm commented 6 years ago

I misunderstood some stuff about data packaging. But I think it's working now!

wrightaprilm commented 6 years ago

Quick question - a user noted a (both run time and unit) test I should add for their use case, and I have added it. Should I push that now, or wait?

sckott commented 6 years ago

Yes, go ahead! Let me know when that's pushed

wrightaprilm commented 6 years ago

Alright, pushed. Thanks!

sckott commented 6 years ago

thanks!, will have another look

sckott commented 6 years ago

Editor checks:

[x] Fit: The package meets criteria for fit and overlap
[x] Automated tests: Package has a testing suite and is tested via Travis-CI or another CI service.
[x] License: The package has a CRAN or OSI accepted license
[x] Repository: The repository link resolves correctly
[ ] Archive (JOSS only, may be post-review): The repository DOI resolves correctly
[ ] Version (JOSS only, may be post-review): Does the release version given match the GitHub release (v1.0.0)?

Editor comments

Thanks for your submission @wrightaprilm !

Here's the output from goodpractice. If you haven't used goodpractice it's an R package that checks a number of things with another package - most of which we agree with and want authors to follow. You don't need to address these now, it's info for reviewers to use to get started.

── GP treestartr ─────────
It is good practice to

  ✖ write unit tests for all functions, and all package code in general. 76% of code lines are covered by test cases.

    R/absent_tippR.R:15:NA
    R/absent_tippR.R:16:NA
    R/absent_tippR.R:19:NA
    R/absent_tippR.R:21:NA
    R/absent_tippR.R:22:NA
    ... and 28 more lines

  ✖ avoid long code lines, it is bad for readability. Also, many people prefer editor windows that are about 80 characters wide. Try make your lines
    shorter than 80 characters

    R/taxon_testr.R:11:1
    R/tree_df.R:20:1

  ✖ avoid sapply(), it is not type safe. It might return a vector, or a list, depending on the input data. Consider using vapply() instead.

    R/make_absent_df.R:14:23
    R/tree_df.R:24:21

  ✖ not import packages as a whole, as this can cause name clashes between the imported packages. Instead, import only the specific functions you need.
  ✖ fix this R CMD check NOTE: absent_tippr: no visible global function definition for ‘plot’ dataf_parsr: no visible global function definition for
    ‘read.csv’ test_dataf: no visible global function definition for ‘assert_that’ Undefined global functions or variables: assert_that plot read.csv Consider
    adding importFrom("graphics", "plot") importFrom("utils", "read.csv") to your NAMESPACE file.
  ✖ fix this R CMD check WARNING: Undocumented code objects: ‘absent_list’ ‘absent_tippr’ ‘dataf_parsr’ ‘genera_strippr’ ‘mrca_df’ ‘present_tippr’
    ‘rand_absent_tippr’ ‘text_placr’ ‘tree’ Undocumented data sets: ‘absent_list’ ‘mrca_df’ ‘tree’ All user-level objects in a package should have documentation
    entries. See chapter ‘Writing R documentation files’ in the ‘Writing R Extensions’ manual.
  ✖ fix this R CMD check WARNING: Data with usage in documentation object 'ants' but not in code: ants

Seeking reviewers now 🕐

Reviewers:

@dwbapst due date 2018-08-27
@rachelwarnock due ddate 2018-08-31

sckott commented 6 years ago

Reviewers:

@dwbapst due date 2018-08-27
@rachelwarnock due ddate 2018-08-31

sckott commented 6 years ago

@dwbapst @rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!

dwbapst commented 6 years ago

Sorry, Scott, I sent you some emails a few weeks ago, which I later got weird bounces back from - they were about what a review looks like. I've got a heap of QA issues, but not sure what format I should put these into, or what level of code analysis you want me to do? -Dave

On Wed, Sep 12, 2018 at 2:30 PM Scott Chamberlain notifications@github.com wrote:

@dwbapst https://github.com/dwbapst @rachelwarnock https://github.com/rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-420769770, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVL1X1ENDBW9RtK_ZdZDPSsQJxTqFSks5uaWDzgaJpZM4Vbw08 .

-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree

sckott commented 6 years ago

Can you send that email again to myrmecocystus@gmail.com

dwbapst commented 6 years ago

No problem - I'm sorry that I didn't investigate after the bounces, but I think they came just as classes were beginning... -Dave

On Wed, Sep 12, 2018 at 2:34 PM Scott Chamberlain notifications@github.com wrote:

Can you send that email again to myrmecocystus@gmail.com

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree

rachelwarnock commented 5 years ago

Hi Scott,

I’m so sorry about the delay. I’ll try to wrap my comments & review up by the end of next week.

Best wishes,

Rachel

On Wed, 12 Sep 2018 at 21:30, Scott Chamberlain notifications@github.com wrote:

@dwbapst https://github.com/dwbapst @rachelwarnock https://github.com/rachelwarnock Will you be able to get your review in soon? We're about 2 weeks past due date. Thanks!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-420769770, or mute the thread https://github.com/notifications/unsubscribe-auth/ACYzn0kgE644kHXESp5K7_d0rdAOjIVAks5uaWDzgaJpZM4Vbw08 .

-- Rachel.Warnock@gmail.com

dwbapst commented 5 years ago

Hi, I feel the same as Rachel. I'm new to rOpenSci reviewing and it just came at a bad time when I was teaching two graduate classes. In order to force myself to work on this, I'm going to work the draft live here in the issue thread.

Package Review

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
- Before getting the request, Ih ad already read the manuscript for this software, and promised I'd eventually look at the software, so this is what I was going to do anyway.

Documentation

The package includes all the following forms of documentation:

[x] A statement of need clearly stating problems the software is designed to solve and its target audience in README
- yeah this exists
[x] Installation instructions: for the development version of package and any non-standard dependencies in README
[x] Vignette(s) demonstrating major functionality that runs successfully locally
- the vignette is basically identical to the top-level readme, which does seem to be demonstrating major functionality, yes
[x] Function Documentation: for all exported functions in R help
- yes... dunno why you're asking this as R CHECK would complain about this if it weren't true
[?] Examples for all exported functions in R Help that run successfully locally
- Some are not run, e.g. absent_tippr.
[?] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R). -Most of this is here, but I don't see anything on contributing to the project - I don't usually include a section on that in packages myself, so I am not sure what ropensci wants here.

Functionality

[x] Installation: Installation succeeds as documented. -Yeah, it installed. I would point out though that R, silly enough, is a case-sensitive language, and so although the package is referred to at times as 'treeStartR', it can only be installed or loaded into the library as 'treestartr'. Please correct all case uses. (I realize that git, annoyingly, isn't case sensitive, so nothing can be done about the repo name.)
[x] Functionality: Any functional claims of the software been confirmed. -I would say the software ran on the provided datasets, but more noisly (with more random messages, warnings, etc) than are needed and to the detriment of the software.
[x] Performance: Any performance claims of the software been confirmed.
- Not sure this package is making any performance claim, other than its easier than constructing this sort of a data structure by hand. (Which I've done. Yuck.)
[ ] Automated tests: Unit tests cover essential functions of the package and a reasonable range of inputs and conditions. All tests pass on the local machine. --There are tests, and they are reasonable tests, but they fail on my local machine.

tools::test("d://dave//workspace//treeStartR")
Loading treestartr
Testing treestartr
√ | OK F W S | Context
Error in x[[method]](...) : attempt to apply non-function

== Results ===========================================================================
OK:       0
Failed:   4
Warnings: 1
Skipped:  0

Actually... I am beginning to wonder if there is something broken with my local installation of testthat. Hmm. Maybe disregard this comment for now if you can't reproduce it.

Anyway, this led me to running a local R CHECK --as-CRAN, where, oddly, the tests pass. It did produce some warnings (only one of which Scott notes above, that I'd like to comment on how to fix, as you indicate you would like to publish this on CRAN.

* checking top-level files ... NOTE
Non-standard files/directories found at top level:
  'README.Rmd' 'README.html' 'bibliography.bib' 'deprecated' 'docs'
  'images' 'paper' 'refs.bib'

These should be added to your .Rbuildignore file.

* checking R code for possible problems ... NOTE
absent_tippr: no visible global function definition for 'plot'
dataf_parsr: no visible global function definition for 'read.csv'
test_dataf: no visible global function definition for 'assert_that'
Undefined global functions or variables:
  assert_that plot read.csv
Consider adding
  importFrom("graphics", "plot")
  importFrom("utils", "read.csv")
to your NAMESPACE file.

CRAN policy increasingly wants us, when we use functions readily available in packages included in a base installation (but not in the base package) to directly list these as imports, and indicate that dependency in our DESCRIPTION file. Without looking though, I think the plot it is referring to here is probably ape's plot.phylo for plotting a phylogeny. I think best practice is that you should indicate this is ape::plot.phylo when you use that function, even in examples (I haven't done this, but am trying to).

* checking for missing documentation entries ... WARNING
Undocumented code objects:
  'absent_list' 'mrca_df' 'tree'
Undocumented data sets:
  'absent_list' 'mrca_df' 'tree'
All user-level objects in a package should have documentation entries.
See chapter 'Writing R documentation files' in the 'Writing R
Extensions' manual.

Obviously these need to be covered. I think they are objects in the bears dataset - all objects within data files need to be listed as aliases, e.g. in roxygen2 you can do something like #' @aliases RaiaCopesRule ceratopsianTreeRaia cervidTreeRaia to indicate all the objects and their associated help file.

* checking for code/documentation mismatches ... WARNING
Data with usage in documentation object 'ants' but not in code:
  ants

You need an example that makes use of the ant data - I don't usually get this warning, but my interpretation is that CRAN doesn't like packages with example datasets that aren't used - there was an issue a few years ago of people using CRAN as basically a repository for datasets for classes/textbooks, which isn't its intended use (and meant very large packages, size-wise, with almost no code base).

[ ] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
- @sckott it would be great if there was a succinct list somewhere for this. Or just add it to this template, which is pretty great already! This is really silly, if rOpenSci has a great big list of 'packaging guidelines' then it should be part of this checklist, not its own, lone item.
- ANYWAY, it seems to be a great big list, the first few being if the package and function names are okay. I guess so. I'm not certain what I'd think a priori what a package named treeStartR was, but I think the functionality in this package here is a reasonable association.
- You don't seem to have a NEWS or CHANGES or CHANGELOG file.
- :shrug:

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Review Comments

Review Suggestions from rOpenSci:
    Does the code comply with general principles in the Mozilla reviewing guide?
    Does the package comply with the rOpenSci packaging guide?
    Are there improvements that could be made to the code style?
    Is there code duplication in the package that should be reduced?
    Are there user interface improvements that could be made?
    Are there performance improvements that could be made?
    Is the documentation (installation instructions/vignettes/examples/demos) clear and sufficient?
    If you have your own relevant data/problem, work through it with the package. You may find rough edges and use-cases the author didn’t think about.
    Checking the package’s logs on its continuous integration services (Travis-CI, Codecov, etc.)
    Running devtools::check() and devtools::test() on the package to find any errors that may be missed on the author’s system.
    Using the covr package to examine the extent of test coverage.
    Using the goodpractice package (goodpractice::gp()) to identify likely sources of errors and style issues. Most exceptions will need to be justified by the author in the particular context of their package.
    Using spelling::spell_check_package() (and spelling::spell_check_files("README.Rmd")) to find spelling errors.

So here's some random comments from playing with the package.

First, I strongly making a manual page named treeStartR or with that alias so that users who cannot remember a function name but can remember the package name can quickly access the documentation. This page also serves as a good starting point for users interested in the package, and you could put a lot of information from your readme there, such as directing them to the vignette.

absent_list <- genera_strippr(tree, bears)

This function expects a data.frame and not any other sort of object - you should be explicit about this in the help file. It also expets that the variable containing the taxon names will be labeled taxon and nothing else. This should also be laid out, possibly even allow users to vary it by adding an argument like `taxonNamesVariable = "taxon"' so the default behavior stays the same.

The manual page for this function refers to 'morphological & stratigraphic genera'. Its not clear to what the distinction is between these.

new_tree <- present_tippr(tree, absent_list)

The function present_tippr is very noisy. Why is the message Tree tip names formatted correctly repeated so many times? The message is not very informative to the user. The same issue occurs with function absent_tippr.

More messages:

Adding tips with congeners on tree:Indarctos_punjabiensis
Adding tip via MRCA at 25
Adding tips with congeners on tree:Ursus_abstrusus
Adding tip via MRCA at 30

What are these number codes? Are they node numbers 25 and 30? I'd clarify the message to the console, or clarify the documentation, or both.

Another message, this time a warning:

Warning message:
 In bind.tree(tree, tip, where = where, position = pp) :
  one tree has no branch lengths, they have been ignored

I would suggest suppressing this expected, non-informative warning from bind.tree by masking it with function suppressWarnings. This also happens with functions absent_tippr and text_placr.

More useful information to report to the user, perhaps, would be the number of OTUs added and the number of OTUs not added, perhaps also giving user a taxon list of the still-absent OTUs, to use with the other taxon-placing methods.

Also, if only a single congener is on the tree, the the species gets added, but to the root node:

tree2<-drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)

We can compare the two trees with:

layout(1:2)
plot(tree);plot(new_tree)

I don't this this behavior is desired, and even if it is (sometimes) desired in certain use-cases, it should be described in the manual and perhaps toggle-able with an argument.

new_tree <- absent_tippr(tree, absent_list)

It is not immediately clear when you use this function that the prompt is asking for me to type in the number of the node, rather than interactively click on the plot (as in some ape and phytools functions). You may want to clarify the prompt to users so they know they need to enter the number.

new_tree <- rand_absent_tippr(tree, absent_list)

Where would you like to place the tip?Zaragocyon_daamsi
Error in if (where <= length(tree$tip.label) && position == 0) { : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In bind.tree(tree, tip, where = where, position = pp) :
  one tree has no branch lengths, they have been ignored
2: In absent_tippr(tree, absent_list) : NAs introduced by coercion

In the above code example, apparently you cannot make two species sister to each other, You can only place them in a polytomy together with a third taxon. This seems like it might be an unintentional design oversight.

If only a single relative is listed with text_placr, you get something like

new_tree <- text_placr(tree, mrca_df)
 Placing tip Kretzoiarctos_beatrix
 via relatives Indarctos_arctoides
 at node

...And then the taxon gets added to the root node, bizarrely. Nothing in the documentation about more than one relative needing to be specified, or of what the algorithm's behavior is when only a single relative is specified. Again, this means you cannot make add a taxon as sister to another taxon, and it seems undesirable for users.

I tried generating an example dataset to see what happened.

library(paleotree)

set.seed(444)
# generate Genus Names
genNames<-paste0("R",
    apply(combn(c("a","b","c","d","e"),2),
        2,paste0,collapse="")
    )
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
        2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x) 
    paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- ape::rtree(10, tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)

In the above, genera_strippr seems to work, although I note that it prints the entire absentList to the console, which is overly noisy.

sckott commented 5 years ago

thanks @dwbapst

sckott commented 5 years ago

@rachelwarnock okay, if it will be longer than that please let me know so I can get someone else

dwbapst commented 5 years ago

Okay @sckott @wrightaprilm I think my review is as complete as its going to be for now.

wrightaprilm commented 5 years ago

Thanks for that, @dwbapst. Really helpful stuff here (and thankfully all addressable)! @sckott, presumably I wait for @rwarnock, in case any of her review conflicts with Dave's. But after her's comes in, it looks like in the review guidelines, I can just start work on revision without waiting on an "Editor decision" in the way of a more traditional journal?

[If I'm wrong about the above, I'm certainly happy to contribute different wording to policies.html]

sckott commented 5 years ago

@wrightaprilm correct, let's wait for @rachelwarnock , then do revisions. correct, you don't need to wait for an editor decision

rachelwarnock commented 5 years ago

Package Review

[X] As the reviewer I confirm that there are no conflicts of interest for me to review this work.

Documentation

The package includes all the following forms of documentation:

[X] A statement of need clearly stating problems the software is designed to solve and its target audience in README
[X] Installation instructions: for the development version of package and any non-standard dependencies in README
[X] Vignette(s) demonstrating major functionality that runs successfully locally
[X] Function Documentation: for all exported functions in R help
[X] Examples for all exported functions in R Help that run successfully locally
[X] Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
- it doesn't say how to contribute but I'm not sure this is necessary

Functionality

[X] Installation: Installation succeeds as documented.
[X] Functionality: Any functional claims of the software been confirmed.
[X] Performance: Any performance claims of the software been confirmed.
[ ] Automated tests: Unit tests cover essential functions of the package. and a reasonable range of inputs and conditions. All tests pass on the local machine.
- I'm not sure the tests do run; see my comment towards the end about running devtools::check
[X] Packaging guidelines: The package conforms to the rOpenSci packaging guidelines

Final approval (post-review)

[ ] The author has responded to my review and made changes to my satisfaction. I recommend approving this package.

Estimated hours spent reviewing:

Review Comments

Comments based on the README

Love the logo!

The main functionality of the package could be stated earlier on in the README. Its not until the end of third paragraph that you learn the main purpose of the package. You could include a sentence at the beginning, saying something like, "This package produces starting trees incorporating fossils for phylogenetic analysis", before jumping into your nice intro. Or the title of the README could be "treeStartR: generating starting trees for phylogenetic analysis".

The package depends on the package phytools, but since you use the phytools:: and ape:: function calling format (which is great programming practice) you don't have to load the phytools library on start up for the treestartr functions to work (I tested to make sure).

Its useful to have the function names within text using the markdown coding format, i.e. dataf_parser instead of dataf_parser. That way, the functions are highlighted if you want to quickly scan the README to locate info.

I don't think you need to use data(bears) since this dataset is already loaded into the environment when you load the package. data(ants) returns "data set 'ants' not found", although this dataset has a help page.

With the bears dataset, the absent_list, mrca_df and tree components of the data are available, but not tax_list. Instead, the info that should be represented by this variable appears to represented by the bears variable. i.e. the vector output by genera_strippr(tree, bears) matches absent_list. This doesn't seem intentional.

With new_tree <- present_tippr(tree, absent_list), I don't really follow the screen output from this command. Why is "Tree tip names formatted correctly" printed to screen so many times (= 33)? length(absent_list) + length(tree$tip.label) = 22 only. With a very large dataset you could end up generating a lot of unneccessary screen output. Why not just print "all tip names formatted correctly" once after checking? or just return incorrectly formatted names. This also applies to other functions that generate this output (absent_tippr, rand_absent_tippr, get_lost, get_found).

Unclear what the source of this warning message means: Warning message: In bind.tree(tree, tip, where = where, position = pp) : one tree has no branch lengths, they have been ignored

I suppose this has something to do with the fact that the functions are returning trees with no branch lengths. This suggests that the fossil ages are not being taken into account. Will this not be important for generating starting trees for fbd/tip-dating analyses?

The manual option is tricky for trees even as large as the example dataset. For example, on my small laptop screen, some of the branching events are too close for me to be able to distinguish the node labels (i.e. many are plotted on top of each other). Not sure if there's anything you can do about this though.

Where it says TSV file, do you mean CSV? or CSV/TSV

Perhaps it would be useful to have an example, where multiple options are used to add taxa to the tree.

Comments based on the help pages

bears - tax\_list not available ants - has a help page but I can't seem to load the data

absent_tippr - the documentation say tips are added "via user input" but its not clear that this is function is interactive.

dataf_parsr - The function description mentions morphological and molecular data but I think this function is designed to read the file containing the stratigraphic info only. Maybe include a url to the relevant RB tutorial? Where it describes the function argument dataf, I think this should be "file name" not "data frame" (which is the output). In addition, because the functions don't return time scaled trees, I'm sure why the fossil ages are required.

genera_strippr - no issues here.

get_found, get_lost - The distinction between these functions is a bit confusing. Could the dataframe output by this function have more useful column headers? Or perhaps these could be private internal functions.

make_absentdf - Perhaps this could also be a private internal function.

present_tippr - I would add the more comprehensive description of this function from the README here.

rand_absent_tippr - no issues here.

text_placr - maybe some additional clarification would be useful here. Am I right in thinking the clade column really refers to a branch/tip? And that the function then builds a clade from the tips (e.g. Ursus_arctos, Ursus_spelaeus, Ursus_americanus) from which it works out where to place the taxon (e.g. Ursus_abstrusus). At first when I looked at mrca_df I wasn't clear why there were multiple entries per taxon in the table.

I think it would be useful to have more complete examples in the documentation. e.g. for rand_absent_tippr.

data(bears)
absent_list <- genera_strippr(tree, tax_list)
new_tree <- rand_absent_tippr(tree, absent_list)
write.nexus(new_tree)

That way, if the user wanted to use this function, they would know which other function help pages to refer to get all the info necessary to generate input.

On a related note, I initially found the relationship between the package functions hard to get my head around, while just scrolling through the help pages. I produced the following for reference.

dataf_parsr (load a total list of taxa) ->
genera_strippr (find out which taxa are already on the tree) -> 
absent_tippr (add tips manually) / 
present_tippr (add tips with congeners)/ 
rand_absent_tippr (add tips randomly) / 
text_placr (add tips using csv) -> 
write.nexus

Perhaps you could consider including a similar flow chart or table in the README. Definitely not necessary, esp. if you included more start-to-finish examples, so its just a thought!

Comments based on the code

My comments on the code are extremely minor. The functions all appear to be concise and the more complicated functions are nicely commented. I can't see any obvious room for improvement in terms of efficiency or redundancy, apart from one small thing noted below.

I'm not sure each function needs to be in a separate file (i.e. related functions could potentially live in the same file) but this is down to programmer's preference and since the package isn't massive it doesn't really matter.

genera_strippr - why does this function print and return absent?

Other comments

Note that running devtools::check() produces 1 error, 4 warnings and 3 notes. Might be good to double check these issues. devtools::spell_check() identified one misspelling of phylogeny.

I simulated a dataset to explore the functions a bit more. Everything seems to work well but I noted the following:

it seems to be the case that the fossil ages are not needed in dataf (i.e. the age column can be absent and the functions still work).
when the fossil ages are included, I can't see that the functions use this info.
I couldn't reproduce the errors observed with the bears dataset. i.e. the functions output trees with branch lengths and don't return a warning about this.
I just noticed the functions return non-binary trees. Unsure if this is a problem.

If the functions could return time scaled starting trees that respect the fossil ages I think this would be extremely useful.

If I had to pick a favourite function it would be text_placr. This is super useful :)

Note, I didn't look at Dave's review before writing this, but I can see some overlap. Hope this helps!

sckott commented 5 years ago

@dwbapst do you have a rough estimat of hrs spent doing your review? we like to keep track for use in a long term goal of using our reviewers time most effectively

sckott commented 5 years ago

@wrightaprilm All reviews in now. thanks much to @rachelwarnock and @dwbapst !

dwbapst commented 5 years ago

@sckott Honestly? It probably took more than 8 hours, maybe 10 hours, spread over a month? This isn't an atypical amount of time for to spend on reviewing, particularly software - what I did here is precisely I have done for reviewing papers describing new R packages at MEE. Some reviews have absolutely taken more than 10 hours, and many of my non-software reviews take longer as well.

dwbapst commented 5 years ago

@rachelwarnock I think that the functions can return non-binary trees is intentional, and sort of the point.

sckott commented 5 years ago

@dwbapst yes, honesty please. thanks very much.

wrightaprilm commented 5 years ago

Thanks so much to both of you, @dwbapst and @rachelwarnock, and to you, @sckott for facilitating. I'll probably get started on response to reviews (i.e., just fixing things) Friday.

rachelwarnock commented 5 years ago

@rachelwarnock I think that the functions can return non-binary trees is intentional, and sort of the point.

@dwbapst can you really specify unresolved starting trees in revbayes and beast2?

dwbapst commented 5 years ago

@rachelwarnock Yup, that was a dumb comment of mine. I don't know specifically what the starting tree requirements are, but you are probably right that they probably can't be non-binary.

I think I was implicitly imagining that you'd add OTUs to the initial tree, making a big informal taxon-tree sort of thing, but then for starting purposes, you'd want to randomly resolve those, and probably get some rough starting dates for the tree if for tip-dating analyses, as you talk about in your review above. I recall Nick Matzke using paleotree for that step to get starting trees for the BEAST2 analyses in the Archeopteryx paper that April and I are on. But those are steps that treestartr could help automate too.

rachelwarnock commented 5 years ago

@dwbapst @wrightaprilm I asked one of the BEAST2 developers (Tim Vaughan) if you can specify unresolved starting trees and he said: "Yes you can! This will just work. Read it in as a newick string with a polytomy. This will be automatically converted to a binary tree with a zero edge length by beast’s newick parser, and the operators will then quickly replace the zero-length edge with one of finite-length." Maybe worth clarifying in the package documentation! I presume this is also true in RB.

dwbapst commented 5 years ago

@rachelwarnock Yeah, I can't find anything about starting tree requirements for RevBayes, but certainly MrBayes only requires binary trees:

http://mrbayes.sourceforge.net/Help/usertree.html

@wrightaprilm So, while I realize MrBayes is on its way out, its still fairly widely used... thus, a comment in the documentation for treetstartr to that effect, that MrBayes will only accept binary trees would probably reduce some confused emails from lost users.

wrightaprilm commented 5 years ago

Hey @dwbapst @rachelwarnock -

RB does read in polytomy trees, and it's a bit of analytical priggishness that treestartr returns trees with polytomies. I'll make note of it in the docs, to resolve with multi2di or similar if you're planning to use MB. And actually, this line of thinking is giving me an idea for something to add to the package - exportation of RB-formatted clade constraints that include your fossils (see here).

Again, thanks so much for all your help kicking the tires on this thing.

wrightaprilm commented 5 years ago

Just making a TODO-list. Ignore, unless I've misrepesented your points ;)

[x] Making a manual page named treeStartR or with that alias so that users who cannot remember a function name but can remember the package name [x] Report the number of OTUs added and the number of OTUs not added, perhaps also giving user a taxon list of the still-absent OTUs, to use with the other taxon-placing methods. [x] Defend choice: In the above code example, apparently you cannot make two species sister to each other, You can only place them in a polytomy together with a third taxon. This seems like it might be an unintentional design oversight. [x] Spellcheck [x ] Bug: text_placr: And then the taxon gets added to the root node, bizarrely. [x] Env: I don't think you need to use data(bears) since this dataset is already loaded into the environment when you load the package. data(ants) returns "data set 'ants' not found", although this dataset has a help page. [x] rand_absent_tippr: clearer example [x] Env: With the bears dataset, the absent_list, mrca_df and tree components of the data are available, but not tax_list. [x] get_found, get_lost, make_absentdf: Make private [x] Re: Ages. the immediate above: Print FBD-formatted clade constraints for RevBayes [x] absent_tippr - the documentation say tips are added "via user input" but its not clear that this is function is interactive. [x] dataf_parsr - The function description mentions morphological and molecular data but I think this function is designed to read the file containing the stratigraphic info only. [x] genera_strippr: This function expects a data.frame. you should be explicit about this in the help file. [x] genera_strippr: possibly even allow users to vary it by adding an argument like taxonNamesVariable = "taxon"' so the default behavior stays the same. _AMW_: Edited this behavior in thedataf_parsr` function. [x] genera_strippr: The manual page for this function refers to 'morphological & stratigraphic genera'. Its not clear to what the distinction is between these. [x] genera_strippr: why does this function print and return absent? [x] present_tippr: very noisy. [x] present_tippr: clarify in docs [x] present_tippr, absent_tippr and text_placr.: Suppress warning. [x] text_placr: clarify inputs [x] Bug: if only a single congener is on the tree, the the species gets added, but to the root node: [x] absent_tippr: You may want to clarify the prompt to users so they know they need to enter the number. [x] genera_strippr: although I note that it prints the entire absentList to the console, which is overly noisy. [x] Man: Where it says TSV file, do you mean CSV? or CSV/TSV. AMW: Can be either - have noted this in documents [x] ReadMe: The main functionality of the package could be stated earlier on in the README. [x] README: The package depends on the package phytools, but since you use the phytools:: and ape:: function calling format (which is great programming practice) you don't have to load the phytools library on start up for the treestartr functions to work (I tested to make sure). [x] README: ts useful to have the function names within text using the markdown coding format, i.e. dataf_parser instead of dataf_parser. That way, the functions are highlighted if you want to quickly scan the README to locate info. [x] Link to docs for other package functions (i.e. phytools, phangorn)

wrightaprilm commented 5 years ago

I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.

Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed. Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful. Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do. Formatted clade constraints: I added two functions echo_subtree and echo_rb. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb created RevBayes formatted clade constraints. I added two tests for these functions. Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.

I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?

Thanks so much to all three of you for your help with this; I can already see the package is much improved.

rachelwarnock commented 5 years ago

This all sounds great to me!

On 8 Oct 2018, at 22:24, April Wright notifications@github.com wrote:

I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.

Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed. Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful. Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do. Formatted clade constraints: I added two functions echo_subtree and echo_rb. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb created RevBayes formatted clade constraints. I added two tests for these functions. Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.

I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?

Thanks so much to all three of you for your help with this; I can already see the package is much improved.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

dwbapst commented 5 years ago

April, one of my secret joys in life is figuring out cryptic R CHECK --as-cran warnings. I'm happy to answer questions about them. Being able to quietly pass --as-cran is vital for most R packages.

I'll try to find tomorrow to sit down and plod through my examples again.

On Mon, Oct 8, 2018 at 4:24 PM April Wright notifications@github.com wrote:

I've made the vast majority of the straight-forward changes discussed above in as asked. There are several that bear more discussion, though.

Taxa being added to the root node: This was a bug resulting from getting the parent node of a vector, as opposed to a single taxon. It is now fixed. Necessity of age information in the taxa file: Rachel noted correctly that the ages are never used. This is somewhat of a bias on my part - I wanted it to be possible to just take your RevBayes taxon file, and automatically generate a starting tree from it. I've now amended the behavior of the parser to allow you to do so, but so that it only requires a column of taxa (i.e., ages are optional). RevBayes (and BEAST I believe is the same) doesn't actually parse the dates off a starting tree; it gets them from the ages file and the fossil interval file, so decorating the tree with them wouldn't be overly helpful. Polytomies: I added a section to the README to make it obvious that you'll need to resolve polytomies. Since we're not estimating placements from data, I pretty strongly feel that polytomies are the way to export the tree, and I think users should be forced to confront the decision to resolve them if that's what they'd like to do. Formatted clade constraints: I added two functions echo_subtree and echo_rb. Echo subtree simply calls ape's print subtree function to print the subtree to which the tip was added, including the tip. echo_rb created RevBayes formatted clade constraints. I added two tests for these functions. Ants data: I just took this out. It's for the eventual MEE paper, and I think I would like it to be in a separate repo.

I have a couple CRAN check warnings that are both stubborn and cryptic. Can I ask about them?

Thanks so much to all three of you for your help with this; I can already see the package is much improved.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-427983973, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVL3wNCEHjU6WjLUXbv8pGnNUFoD8vks5ui8KPgaJpZM4Vbw08 .

-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree

dwbapst commented 5 years ago

Hi April, I'm afraid I encountered a number of issue, particularly some functions which broke (and one function that broke if package ape was loaded).

Here's my R script I was working from with commented out output for you to look at:

devtools::install_github("wrightaprilm/treeStartR")

library(treestartr)
packageVersion("treeStartR")

# > packageVersion("treeStartR")
# [1] ‘0.1.0’

# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")

data(bears)

#########################################################

absent_list <- genera_strippr(tree, bears)
absent_list
# works

# if  tax_list isn't a data.frame, it silently fails (returns no error)
    # but returns NA as the output
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
#
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4

# example output
# > absent_list4 <- genera_strippr(tree, tax_list)
# > absent_list4
# [1] NA

# if the variable isn't named 'taxon' it can't find the taxon names, returns error
bears3 <- bears
names(bears3)[1] <- "taxa" 
absent_list3 <- genera_strippr(tree, bears3)
absent_list3

# > absent_list3 <- genera_strippr(tree, bears3)
# Error in `[.data.frame`(tax_list, "taxon") : undefined columns selected
# > absent_list3
# Error: object 'absent_list3' not found

# this requirement needs to be specified in the documentation file

#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works

# weird error if only a single congener is on the tree:
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)

# output:
# >  new_tree <- present_tippr(tree2, absent_list)
# Tree tip names formatted correctly
# [1] "Indarctos_arctoides"
# Error in getParent(tree, loc) : object 'loc' not found

# COULD NOT TRY FOLLOWING AGAIN:
# cannot make two species sister to each other,
    # can only place them in a polytomy together

#####################################

new_tree <- absent_tippr(tree, absent_list)

# this function no longer works:
# > new_tree <- absent_tippr(tree, absent_list)
# Error: 'plot' is not an exported object from 'namespace:ape'

# it looks like its looking for ape::plot.phylo but there's
    # possibly a reference to ape::plot (which doesn't exist
# plot.phylo is what we call an S3 method - we can call a
    # generic function, but R checks if there
# is a more specific function for that class first

# COULD NOT TRY FOLLOWING AGAIN:
# not immediately clear that its asking for me to type in the
    # number of the node, rather than interactively click on the plot
    # (as in some ape and phytools functions)
# may want to clarify the prompt to users

# COULD NOT TRY FOLLOWING AGAIN:
# Where would you like to place the tip?Zaragocyon_daamsi
#Error in if (where <= length(tree$tip.label) && position == 0) { : 
#  missing value where TRUE/FALSE needed
#In addition: Warning messages:
#1: In bind.tree(tree, tip, where = where, position = pp) :
#  one tree has no branch lengths, they have been ignored
#2: In absent_tippr(tree, absent_list) : NAs introduced by coercion

###############################

new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works, but...

# fatal error if run after loading package ape
#
# > library(ape)
# > new_tree <- rand_absent_tippr(tree, absent_list)
# Adding tips at random node: Ailurarctos_lufengensis
# Adding tips at random node: Kretzoiarctos_beatrix
# Error in if (tree$edge.length[which(tree$edge[, 2] == where)] <= 1e-12) { : 
#  missing value where TRUE/FALSE needed

################################################
new_tree <- text_placr(tree, mrca_df)

# get this sort of output:
# > new_tree <- text_placr(tree, mrca_df)
# Placing tip Kretzoiarctos_beatrix
# via relatives Indarctos_arctoidesIndarctos_vireti
#  at node 25
# Placing tip Ursus_abstrusus
# via relatives Ursus_arctosUrsus_spelaeusUrsus_americanus
#  at node 30

# printing the node numbers for usr possibly isn't important (or maybe it is)
# may want to paste in spaces or commas between taxon names for readability

# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
new_tree <- text_placr(tree, mrca_df)
# works fine!

I hope this helps.

wrightaprilm commented 5 years ago

if tax_list isn't a data.frame, it silently fails: Fixed - I hadn't considered the situation where someone might not use the dataframe loader function, but might be working separately with the taxa and then call treestartr. So now the genera_strippr function checks for dataframe, tries to coerce one it it isn't, and if that isn't possible, returns an error. weird error if only a single congener is on the tree:: This is a bug - I changed a variable name in one spot, but not another. Error: 'plot' is not an exported object from 'namespace:ape: I tried to solve a CRAN check error and created a namespace collision. It's fixed now. fatal error if run after loading package ape: I was hilariously mystified by this, but adding in phytools:: before bind.tip solves it (and I should have done anyway).

I forgot to note this in my last entry: I didn't add having the tips that weren't placed echo to the screen after placing tips. That can be recalculated by calling the genera_strippr, and I felt like some of the functions were already getting a bit noisy, especially if you do something like print RevBayes clade constraints.

So what I'm really trying to say is I'll probably take you up on CRAN check assistance once the toothpick comes out clean on the rest of it.

Thanks for having a second look. It's so easy to get caught up in how I would use this package and not consider how another person might integrate it into a workflow, particularly in conjunction with more complex data processing or other packages.

dwbapst commented 5 years ago

Yeah, need to be explicit that you're importing ape::plot.phylo and phytools::bind.tip. Sounds like you've fixed the issue. I think maybe ape has its own bind.tip with slightly different behavior.

Its absolutely not a requirement from your perspective, but I am curious if there is a specific reasoning why ape isn't a depends - although there is general advice to import packages rather than loading the whole package, most phylogenetics packages in R depend on ape as it is usual for users to be doing phylogenetics-things that require the S3 methods for class phylo in ape. Are you envisioning a use case where users are unlikely to have ape loaded?

I'll try to rerun the code later, after teaching.

dwbapst commented 5 years ago

Hi April, A lot more works now, but I think there is some oversight about how species are added whether to tips or lower nodes, and that sort of behavior needs to be specified in the documentation, cause I can't figure out how intentional or not this behavior is. Particularly see the worked example at the end of the following, and I'll post the plotted result as well.

devtools::install_github("wrightaprilm/treeStartR")

library(treestartr)
packageVersion("treeStartR")

# > packageVersion("treeStartR")
# [1] ‘0.1.0’

# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")

data(bears)

#########################################################

absent_list <- genera_strippr(tree, bears)
absent_list
# works

# if  tax_list isn't a data.frame, it works
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
# another example
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4

# if the variable isn't named 'taxon' it can't find
    # the taxon names, returns error
bears3 <- bears
names(bears3)[1] <- "taxa" 
absent_list3 <- genera_strippr(tree, bears3)
absent_list3

# > absent_list3 <- genera_strippr(tree, bears3)
# Error in `[.data.frame`(tax_list, "taxon") : undefined columns selected

# Some options (choose one, or combine several):
# Use the one column provided if it a one-column data table
    # or use the first column as a default?
    # but issue an informative warning() saying you are doing so
# Issue an informative error message with stop()
    # that it can't find the taxon names as expect
    # saying the function expects a certain variable name
# Specify this requirement in the documentation file

#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works

# drop Indarctos, see if it can place two species as sister
    # to each other, or if it makes a polytomy:
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
layout(1:2)
plot(tree);plot(new_tree)

# it places them in a polytomy instead, making the genus paraphyletic
# is this intentional behavior?

#####################################

new_tree <- absent_tippr(tree, absent_list)
# nice clear interaction :D

# Although not labeled, we can also give tip ID numbers
    # and thus add taxa as sister taxa on the tree
# Is this intentional behavior? It isn't described in documentation.
# If this is intentional behavior it should be described in documentation.

###############################

new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works even with ape loaded

################################################
new_tree <- text_placr(tree, mrca_df)

# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
new_tree <- text_placr(tree, mrca_df2)
plot(new_tree)

# Hmm, adds it to a polytomy with both Indarctos species
    # Is this intentional behavior?

####################

# constructed example

set.seed(444)
# generate Genus Names
genNames<-paste0("R",
    apply(combn(c("a","b","c","d","e"),2),
        2,paste0,collapse="")
    )
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
        2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x) 
    paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- rtree(10,tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)
absentList 

# just first three
newTree <- present_tippr(singleSpeciesTree , absentList[1:3])
# works

# all
newTree <- present_tippr(singleSpeciesTree , absentList)
# works

plot.phylo(newTree, no.margin=TRUE)

# its collapsed all sister genera into each other
    # this really doesn't look intentional!

wrightaprilm commented 5 years ago

Thanks, again, for all these thorough comments. I really do appreciate it.

# Error in [.data.frame(tax_list, "taxon") : undefined columns selected: Bug caused by inappropriate curly brace closure.

# Although not labeled, we can also give tip ID numbers and thus add taxa as sister taxa on the tree: Added this to the visual, and suppressed warnings for once sister taxa are added (which gives a polytomy warning).

it places them in a polytomy instead, making the genus paraphyletic # is this intentional behavior? It was not ... and I see how it doesn't make sense. I've changed this so that if there is one congener, the tip is added as sister.

text_placr polytomy: I genuinely would like to hear opinions on this - I like this behavior, but you seem confused by it. Is there a different behavior that would be preferable?

Placements breaking up monophyletic groups: I can see why this one is undesireable. This is fixed by allowing tips to be added as sister, an regenerating the mrca_list afterwards, such that any subsequent tips will be placed into the now-monophyletic grouping. The above now returns the below, which I think is the behavior we want:

rplot

dwbapst commented 5 years ago

Hi April,

I think labeling the tips makes it clear that it is acceptable for users to add taxa as sisters to the tips. I just was pushing on the documentation issue because there weren't tip labels, so it wasn't clear if the behavior was intended or not. (Unintentional behaviors may be relied upon by others, but be unstable because you might not know someone else depends on that functionality, so its best to be clear). I think its clear now, though.

I still have a tiny issue with how genera_strippr picks what column to use from a data.frame, which I think you could solve just by being more explicit/detailed in the documentation for that function.

Overall, I'd say I don't have really any issue with the package now.

devtools::install_github("wrightaprilm/treeStartR")

library(treestartr)
packageVersion("treeStartR")

# > packageVersion("treeStartR")
# [1] ‘0.1.0’

# to build manual:
# devtools:::build_manual("d://dave//workspace//treeStartR")

data(bears)

#########################################################

absent_list <- genera_strippr(tree, bears)
absent_list
# works

# if  tax_list isn't a data.frame, it works
# Here's two examples:
bears2 <- as.vector(bears[,1]) # class character
absent_list2 <- genera_strippr(tree, bears2)
absent_list2
# another example
tax_list<-bears[,1] # factor
absent_list4 <- genera_strippr(tree, tax_list)
absent_list4

# the first row is always used if a data.frame is handed over
bears3 <- bears[,2:1]
#names(bears3)[1] <- "taxa"
absent_list3 <- genera_strippr(tree, bears3)
absent_list3

# > absent_list3 <- genera_strippr(tree, bears3)
#> absent_list3
# [1] "0"    "0"    "0"    "0"    "0"    "0"    "0"    "0"    "5"    "7.7" 
#[11] "0.5"  "14"   "9.6"  "5.1"  "8.7"  "11.7" "36.6" "16.2" "14.4" "4.3" 
#[21] "0.06" "21.9"

# I really do think you need to be explicit in the documentation that
# its the first column, or a column with a particular label, or whatever
# the rule is, or say so in a warning() or a message() when 

# From before:
#
# Some options (choose one, or combine several):
# Use the one column provided if it a one-column data table
    # or use the first column as a default?
    # but issue an informative warning() saying you are doing so
# Issue an informative error message with stop()
    # that it can't find the taxon names as expect
    # saying the function expects a certain variable name
# Specify this requirement in the documentation file

#####################################################
new_tree <- present_tippr(tree, absent_list)
layout(1:2)
plot(tree);plot(new_tree)
# works

# drop 1 Indarctos sp, places other sp  
    # as sister to the one remaining congeneric
# works!
tree2<-ape::drop.tip(tree,"Indarctos_vireti")
new_tree <- present_tippr(tree2, absent_list)
layout(1:2)
plot(tree, no.margin=TRUE);plot(new_tree,no.margin=TRUE)

#####################################

new_tree <- absent_tippr(tree, absent_list)
# nice clear interaction :D

# we can also give tip ID numbers
    # and thus add taxa as sister taxa on the tree
# Documentation doesn't rule out this behavior,
    # labeling makes it clear this is probably okay
# Cool

###############################

new_tree <- rand_absent_tippr(tree, absent_list)
plot(new_tree)
# works even with ape loaded

################################################
new_tree <- text_placr(tree, mrca_df)

# test if only a single relative is listed:
mrca_df2 <- mrca_df[-1,]
mrca_df2
new_tree <- text_placr(tree, mrca_df2)
plot(new_tree)

# K_beastrix added as a sister properly

####################

# constructed example

set.seed(444)
# generate Genus Names
genNames<-paste0("R",
    apply(combn(c("a","b","c","d","e"),2),
        2,paste0,collapse="")
    )
# generate species names
specNames<-apply(combn(c("a","b","c","d"),3),
        2,paste0,collapse="")
#
#combine
extraSpec<-as.vector(sapply(genNames,function(x) 
    paste0(x,"_",specNames)))
tipSpec<-paste0(genNames,"_regularis")
#
#lets make our hypothetical simulated tree of higher taxa
singleSpeciesTree <- rtree(10,tip.label=tipSpec)
#
# format extraSpec correctly
taxonDF<-data.frame(taxon=extraSpec)
#
library(treestartr)
absentList <- genera_strippr(singleSpeciesTree , taxonDF)
absentList 

# just first three
newTree <- present_tippr(singleSpeciesTree , absentList[1:3])
# works

# all
newTree <- present_tippr(singleSpeciesTree , absentList)
# works

plot.phylo(newTree, no.margin=TRUE)

# its collapsed all sister genera into each other
    # this really doesn't look intentional!

wrightaprilm commented 5 years ago

OK, added some more information to this point to the function documentation, and a more informative stop() message.

wrightaprilm commented 5 years ago

If we're all happy with the major package functions, is it OK if I ask some CRAN check questions?

I'm not sure what this means: absent_tippr: no visible global function definition for ‘plot’
Or this: Undocumented code objects: ‘absent_list’ ‘mrca_df’
Also this: Files in the 'vignettes' directory but no files in 'inst/doc': ‘vignette.Rmd’, ‘vignette.pdf’

I feel very silly about this, because this part seems like it should be easy.

dwbapst commented 5 years ago

No, don't feel silly - R CHECK --as-cran is often one of the hardest parts because of the opaqueness of the messages.

You need to specify what package you are importing plot from. Is that base plot in graphics or ape? You likely need to import plot from graphics - its a relatively new requirement.
You have exported objects in the code, or arguments for exported functions that have no documentation assigned to them. You need to search for absent_list and mrca_df.
Hmm. I would try deleting your inst directory. To be honest, I'm not sure why you have one to begin with, and the vignette in it is out of date, I think.

sckott commented 5 years ago

agree, don't feel silly @wrightaprilm ! The cran check stuff is tough and no fun for anyone.

I think you should add inst/docs to .Rbuildignore. inst/doc is a formal directory that is used when building the package, so we don't want to ignore that of course.

I think a problem might be in the line in your DESCRIPTION file where you have VignetteBuilder: utils - I've not seen utils in there before. Maybe it is valid, I don't know, but usually the entry there is knitr

In the vignette, you have the line

tax_list <- dataf_parsr("inst/extdata/bears_taxa.tsv")

Instead of the path for that directly, i'd use

tax_list <- dataf_parsr(system.file("extdata/bears_taxa.tsv", package = "treestartr"))

Also, that chunk has eval=FALSE which means tax_list is never created, which means the next chunk then can't find tax_list and the vignette build fails

wrightaprilm commented 5 years ago

I'm still a little fuzzy on what it means for an exported object to have attached documentation. For example, mrca_df is defined as a parameter to text_placr, and I see it in the manual.

dwbapst commented 5 years ago

@wrightaprilm as I understand it, every object in your example data file needs to have a listed man page with the same name, or a page where it is listed as an alias. E.g. it is an object in your bears example data, apparently:

https://github.com/wrightaprilm/treeStartR/blob/894ee746f58d128b17f5874eb5186748504320c6/R/data.R#L13

You need to list all the components, if those are separate object in the save .Rdata file, as alias via #' @aliases. For example, see:

https://github.com/bomeara/treevo/blob/master/R/simRunExample.R#L11

dwbapst commented 5 years ago

As an addendum, it’s usually good practice to name objects in an example data file something different than the exact name of function arguments so we know which one R is complaining about.

Cheers, -Dave

On Fri, Oct 19, 2018 at 2:38 PM April Wright notifications@github.com wrote:

I'm still a little fuzzy on what it means for an exported object to have attached documentation. For example, mrca_df is defined as a parameter to text_placr, and I see it in the manual.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ropensci/onboarding/issues/239#issuecomment-431475882, or mute the thread https://github.com/notifications/unsubscribe-auth/ABfVLyVOVoTH2sICPLkR0qT0EQG2aJhvks5umiotgaJpZM4Vbw08 .

-- David W. Bapst, PhD Asst Research Professor, Geology & Geophysics, Texas A & M University Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville https://github.com/dwbapst/paleotree

wrightaprilm commented 5 years ago

I think I may have done it - all NOTEs, ERRORs and WARNINGs silenced, and build is passing.

Now what?

sckott commented 5 years ago

Nice work @wrightaprilm I see a few more things:

add these to your .Rbuildignore file: README.Rmd, README.html, bibliography.bib, deprecated, docs, images, paper, refs.bib
There are some undocumented objects still:

checking for missing documentation entries (1s)
   Undocumented code objects:
     ‘absent_tippr’ ‘genera_strippr’ ‘present_tippr

There's some remaining issues from my comment above https://github.com/ropensci/onboarding/issues/239#issuecomment-430371029 Did you see it yet?