openjournals / jose-reviews

Reviews for the Journal of Open Source Education (JOSE)
http://jose.theoj.org
Creative Commons Zero v1.0 Universal
34 stars 4 forks source link

[REVIEW]: treesiftr: An R package and server for viewing phylogenetic trees and data #35

Closed whedon closed 5 years ago

whedon commented 5 years ago

Submitting author: @wrightaprilm (April Wright) Repository: https://github.com/wrightaprilm/treesiftr Version: v1.0.0 Editor: @juanklopper Reviewer: @ethanwhite, @rachelss Archive: 10.5281/zenodo.2541824

Status

status

Status badge code:

HTML: <a href="http://jose.theoj.org/papers/82b5b8c6223ec0735602bbf927045913"><img src="http://jose.theoj.org/papers/82b5b8c6223ec0735602bbf927045913/status.svg"></a>
Markdown: [![status](http://jose.theoj.org/papers/82b5b8c6223ec0735602bbf927045913/status.svg)](http://jose.theoj.org/papers/82b5b8c6223ec0735602bbf927045913)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ethanwhite & @rachelss, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/jose-reviews/invitations

The reviewer guidelines are available here: https://jose.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @juanklopper know.

Review checklist for @ethanwhite

Conflict of interest

Code of Conduct

General checks

Documentation

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

JOSE paper

Review checklist for @rachelss

Conflict of interest

Code of Conduct

General checks

Documentation

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

JOSE paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ethanwhite, it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/jose-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/jose-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands
whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

labarba commented 5 years ago

@ethanwhite, @rachelss β€” Thank you for agreeing to review this submission for JOSE! We are happy to welcome you to our adventure in scholarly publishing of open-source educational software and learning resources.

JOSE has published seven papers so far πŸ˜ƒ β€” have a look: http://jose.theoj.org Check our Reviewer Guidelines and let us know if you have any questions.

@juanklopper will be your handling editor: you're in good hands.

juanklopper commented 5 years ago

@ethanwhite, @rachelss a big thank you from me too. I look forward to your input with this submission. Let me know if there are any questions. If I can't help then @labarba will most definitely have the answers.

ethanwhite commented 5 years ago

@juanklopper - @wrightaprilm and I are co-authors on a Data Carpentry lesson (http://doi.org/10.5281/zenodo.570050). There are a large number of authors and I don't believe that this interferes with my objectivity, but it was in the last 4 years. I am reporting this to you as requested in the CoI link for your input.

labarba commented 5 years ago

Thanks for reporting this, @ethanwhite. I agree that this doesn't present conflict.

ethanwhite commented 5 years ago

OK, great. Thanks @labarba. I will proceed.

ethanwhite commented 5 years ago

A couple of questions/comments for @wrightaprilm from the "General checks" section:

License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)

Following the R packaging standard the LICENSE file does not include the actual license, only the year and copyright holder. The license is given in the DESCRIPTION file. One common approach to dealing with the conflicts between R's packaging standard and the more general open source standard of having the full license in the LICENSE file is to have the full license in LICENSE.md (which is included in .Rbuildignore) and keep LICENSE in the standard R format. This is rOpenSci's general strategy. I'm not sure what JOSS or JOSE's general approach is to this issue.

Version: Does the release version given match the repository release (v1.0.0)?

The repository is lacking a current release. If you're waiting to tag 1.0.0 until other comments have been satisfied that makes sense and I'll wait until it looks like things are wrapping up and then ping this issue again about tagging a release.

wrightaprilm commented 5 years ago

One common approach to dealing with the conflicts between R's packaging standard and the more general open source standard of having the full license in the LICENSE file is to have the full license in LICENSE.md (which is included in .Rbuildignore) and keep LICENSE in the standard R format. This is rOpenSci's general strategy. I'm not sure what JOSS or JOSE's general approach is to this issue.

This uncertainty is why the package is the way it is right now. I put the MIT license in the description, but wasn't sure where else that info needed to be, if anywhere.

The repository is lacking a current release. If you're waiting to tag 1.0.0 until other comments have been satisfied

Bingo!

rachelss commented 5 years ago

There are a couple of typos in paper.md - specifically [application]((https://wrightaprilm.shinyapps.io/treesiftr_app/) has an extra parenthesis in two places; the phrase "30-day paleobiological data " needs to be fixed, "non-paramtric" should be "non-parametric", "and use in " should be "which I use in". You might want to render with pandoc to check everything looks write prior to resubmission.

@juanklopper - I'd suggest that the submission requirements suggest submitters render paper.md with pandoc prior to submission in general as it's easier to proofread that way.

To install treesiftR I need ggtree. To install ggtree I need BiocManager. To install BiocManager I need R >= v3.5 . Alternatively I was able to install ggtree on my older version of R with older version of bioconductor with BiocInstaller::biocLite("ggtree"). I should probably keep everything up to date, and I realize listing dependencies is sufficient, but I bet I'm not the only one who doesn't update so including these extra instructions would help out.

labarba commented 5 years ago

@rachelss β€” Our dear bot whedon generates the PDF for us right here in this issue. Scroll up and you will see:

:point_right: Check article proof :page_facing_up: :point_left:

rachelss commented 5 years ago

@wrightaprilm I could use a bit more explanation in the Instructor's Guide and the Shiny app. Or maybe I missed a piece of documentation? I went straight to Guide and app.

After navigating to the app I am instructed to subset a phylogenetic matrix. Where did the matrix come from? Maybe you could explain the data a bit first. That would help understand when I'm setting start = 1 what that means. You might put similar text on the app itself for users.

The character matrix on the righthand side of the app is not intuitive. Some words in the app to explain what these blocks are would help. Also, just having the blocks on the right doesn't make it obvious that the root / internal nodes are one color or the other - can you put colors of ancestral states on the tree?

Is the ability to view the likelihood score useful? I can see the tree is less likely with more characters but really you want to compare likelihoods for a single dataset not across datasets, and students can't do that here. But maybe parsimony isn't sufficient to really explain phylogeny estimation - it would depend on the class.

How much can Shinyapps.io handle? How many students could be using the app simultaneously? How many students across the country could be using it in a class (ie if a bunch of classes start using this will the app no longer be available)? Just wondering if there will end up being limitations that will make this difficult to use widely. You might need to provide guidance for users to prevent the app from crashing.

@juanklopper - is this how we're supposed to write these reviews?

wrightaprilm commented 5 years ago

Thanks, @rachelss, for these comments. I'll start working on them.

With re:

I'd suggest that the submission requirements suggest submitters render paper.md with pandoc prior to submission in general as it's easier to proofread that way.

I agree that this would be a useful addition to the author guidelines. I rendered in RStudio, which has its own idiosyncrasies. With a research paper submission, we normally think about grabbing the Latex template and rendering as you go, so that is a suggestion that would feel natural to people. Including a link to the template or preferred pandoc settings is something folks should have no real trouble with.

juanklopper commented 5 years ago

@wrightaprilm I'm back from all my duties as external examiner and wanted to know how you are getting along. Big thanks to @rachelss Excellent insight and suggestions.

wrightaprilm commented 5 years ago

I think we're good, thanks! Just waiting on @ethanwhite, in case there are conflicts between the two reviews.

Which is fine, I think we've all had some end-of-semester stuff going on that would make revising hard ;)

ethanwhite commented 5 years ago

Here's my review. Thanks for your patience and understanding @wrightaprilm. The end of the semester is indeed an... interesting time.

Overall, great work and really useful. I think this will be a great addition to a number of workshops and classrooms. For every checkbox that I haven't checked I've provided a description of what I think can be improved to make this ready to go.

General Checks

Documentation

Pedagogy / Instructional design

JOSE paper

wrightaprilm commented 5 years ago

Great, thanks both. Doesn't look like there's too much conflict between these two reviews. So revisions should go like dominoes, eh?

@juanklopper - what do I have for timing here? Am I OK to come back with these notes addressed in January?

labarba commented 5 years ago

πŸ‘‹ @wrightaprilm Happy New Year! You can proceed with the revision at your own pace, but it would be great to keep it going. Keep us posted!

wrightaprilm commented 5 years ago

Happy New Year, y'all.

I've responded to most of the points raised by reviewers. I'm still working on @rachelss's excellent suggestions about ancestral states + random trees, and should finish it tomorrow. I made the rest of the suggested edits, but made notes where I found suggestions confusing and/or wasn't sure how to resolve them within journal style.

Edit: Thanks much to all four of you; this has been really helpful!

Rachel's review:

I could use a bit more explanation in the Instructor's Guide and the Shiny app. Or maybe I missed a piece of documentation? I went straight to Guide and app.

After navigating to the app I am instructed to subset a phylogenetic matrix. Where did the matrix come from? Maybe you could explain the data a bit first. That would help understand when I'm setting start = 1 what that means. You might put similar text on the app itself for users.

I added some text to the instructor's guide, and added popify objects to the web interface to provide more information about the options, and a citation for the dataset.

The character matrix on the righthand side of the app is not intuitive. Some words in the app to explain what these blocks are would help. Also, just having the blocks on the right doesn't make it obvious that the root / internal nodes are one color or the other - can you put colors of ancestral states on the tree?

This is a fantastic idea. I'm still working on this, but the short answer should be yes.

Is the ability to view the likelihood score useful? I can see the tree is less likely with more characters but really you want to compare likelihoods for a single dataset not across datasets, and students can't do that here. But maybe parsimony isn't sufficient to really explain phylogeny estimation - it would depend on the class.

Ideally, you're correct. But I'm likely to add to this package later. We have a paper in the works on some alternative models of morphological evolution, and something that I'm likely to expand into in the future is adding more buttons to see how the assumptions made about the data change the likelihood score we compute for the data. The differences are large enough that they can be seen on a single character.

How much can Shinyapps.io handle? How many students could be using the app simultaneously? How many students across the country could be using it in a class (ie if a bunch of classes start using this will the app no longer be available)? Just wondering if there will end up being limitations that will make this difficult to use widely. You might need to provide guidance for users to prevent the app from crashing.

Thank you, added this info!

Ethan's:

License: The license file follows R's guidelines, but as a result the repository does not include an actual license.

Could I have some editor guidance on this one? I followed R conventions (LICENSE file + link in the DESCRIPTION to a license), but if there's something else required, I'm happy to provide it.

I'd also suggest adding a pdf of the Instructors Guide and linking to it directly from the README so that it is easy to find and read for someone first encountering the package online.

Good plan! Done.

Community guildelines: There is a code of conduct (which is awesome!) but I didn't find information on how to contribute or seek support.

Good catch - I put these in the .github folder, which I then forgot to commit and push. I moved the CONDUCT file in here, too, so it pops up when someone opens an issue or PR.

Question on this - rOpenSci allows you to put an email for the organization, if people need to report that a maintainer (i.e. me) is the problem on a code of conduct issue. Does JOSE have a similar email address? I didn't see one, but it's possible I did not look in the right place.

Learning Objectives: are a little difficult to determine from the current documentation. I understand that they are about helping students understand tree construction and phylogeny more broadly, but it would be helpful to lay this out a little more explicitly. I think a second short paragraph in the Introduction of the Instructors guide should be sufficient.

Good idea. I added a list of learning objectives to each subsection of the instructor guide, and which questions addressed each.

Usage: I think it would be useful in both the README and the Instructions guide to include some documentation designed to explain specifically how someone would adopt the module in a teaching setting. This is covered a bit in the JOSE paper and you get the idea that the learning material is lab exercise focused from the vignettes, but I'd suggest also putting some of those ideas front and center in the README and Instructors Guide so that folks more immediately understand the goals of the package and learing materials.

&

Use in classroom or other settings: I think just a couple of additional sentences describing how to adopt this material would make a big differences for helping make clear to readers exactly how they could use the material.

I'm not quite sure how to get at this differently than I have. I added one sentence on this at the end of the second paragraph of the statement of need, and one to the end of the introduction to the instructor's guide.

References: Lewis 2001 is missing the DOI: 10.1080/106351501753462876

Good catch.

wrightaprilm commented 5 years ago

All right, I implemented Rachel's idea to generate random trees to show how traits score differently when phylogenetic relationships are not informative with respect to trait evolution.

With respect to the question of ancestral states: ggtree is developed quite separately from the rest of the phylogenetics R space. I think the appropriate way to handle this feature is to make a pull request against Phangorn to add an additional output type for their ancestral state estimation machinery. Would it be all right if I put this suggestion on a longer-term feature request timetable, as opposed to immediately for review? I'm happy to do it, but with involving additional developers on other projects, there is a lot out of my control on this.

Again, thanks all four of you.

ethanwhite commented 5 years ago

LGTM! Pending editorial input on the license (my personal take on the best compromise solution is above in the discussion related to adding a LICENSE.md file) and the version being updated to 1.0.0 I think this is ready to go. Nice work @wrightaprilm!

labarba commented 5 years ago

πŸ‘‹ @rachelss β€” It sounds like the author has addressed a bunch of review comments. Can you go over things and let us know here if your comments have been addressed and you are ready to recommend publication? Thanks!

labarba commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

labarba commented 5 years ago

@wrightaprilm You have an extra paren that breaks the link in the beginning of the second paragraph.

rachelss commented 5 years ago

I looked. I'm sorry, but I do have more comments, just not right this second. I was trying to figure out exactly how to explain the challenges I would still have in teaching from the given instructions and material.

juanklopper commented 5 years ago

@wrightaprilm and everyone involved, please accept my best wishes for a fantastic 2019. @rachelss, we eagerly await your comments.

rachelss commented 5 years ago

"This course is a 30-day paleobiological data" - there's a typo here

shinyapps.io limits you to 25 hours / month - if you have more than 50 undergrads using this app for class I forsee a problem that instructors should be aware of before attempting to use the app to teach

Remind users how to install ggtree in the readme (it's a bioconductor package not cran)

need to fix link to instructor's guide in readme

typo in Parsimony Questions (01-treesiftr.pdf) q3

Reorder questions to group LOs

Need to tell students to uncheck "show parsimony score" after q1

q2 is difficult and confusing - I was looking for a monophyletic clade of a genus, which isn't the case and leaves me asking questions about what the data are telling me. I also have a hard time switching back and forth between the two trees and remember what was where.

Now you can add a question about parsimony scores on random trees and students can see reversals / repeat substitutions for a suboptimal tree

Are q1 and q3 essentially the same question? Does this serve a purpose? If the goal is to see a sub that happens in two different parts of the tree ask students about this.

q4 - prompt students to view the tree estimated from 8-10 first

q6 - I'm not sure what to do here. Do you need to prompt students to look at several different datasets that have the same parsimony score? Do you mean characters or alignments?

q7 - Since you don't have branch lengths on the tree I'm not sure how students are supposed to know that changes are occurring on longer or shorter branches.

q9 - Technically the answer to this q is 0 so you might want to clarify that you're only using PI sites.

q10 - point students to a specific set of characters

I expect most users will want to teach with the app so maybe point toward that first and tell people not to worry about the code.

Hide the Rmd output for the libraries in the readme.

In the readme you assign two different things to sample_df

Maybe just show generating one tree in the readme - you jump straight to instructions for generating a batch of trees from a batch of datasets.

To get at @ethanwhite request for more clarity on the use of the module I suggest looking at the literature on tree-thinking, particularly concept inventories. This work would provide motivation for teachers to use a tool to address specific thought processes that students find challenging. You need to point out to potential users that students really struggle to understand certain concepts and this app provides a hands-on way for them to work on these concepts.

Also, do you picture this as a lab / recitation? homework? small class? large class? one question at a time during a lecture? I'm picturing this as a lab/recitation with a TA circulating to provide help, but curious what you're thinking.

Sorry this is a lot for a second round - the first time I was struggling with the interface so didn't look at the questions. I'm ok with the Shiny app now there is more guidance on what I'm looking at.

wrightaprilm commented 5 years ago

No need to apologize - these are all great notes. I'll try to get them turned around tomorrow!

wrightaprilm commented 5 years ago

Sorry, I meant to get these in yesterday. This was really helpful feedback, @rachelss, and it's a cool feature of this review process to get notes from people who are both experts and potentially end users.

"This course is a 30-day paleobiological data" - there's a typo here

Fixed.

shinyapps.io limits you to 25 hours / month - if you have more than 50 undergrads using this app for class I forsee a problem that instructors should be aware of before attempting to use the app to teach

I'm actually on the "Basic" plan, which is 500 active hours a month. I'm not quite sure where to fit that information in.

Remind users how to install ggtree in the readme (it's a bioconductor package not cran)

Added, and noted which packages can be installed via CRAN and which cannot.

need to fix link to instructor's guide in readme

And the logo, apparently!

typo in Parsimony Questions (01-treesiftr.pdf) q3

I had fixed this, and forgotten to re-export. Fixed now.

Reorder questions to group LOs

Done.

Need to tell students to uncheck "show parsimony score" after q1

Done.

q2 is difficult and confusing - I was looking for a monophyletic clade of a genus, which isn't the case and leaves me asking questions about what the data are telling me. I also have a hard time switching back and forth between the two trees and remember what was where.

You are right - I switched this around to be more explicit about which characters to look at.

Now you can add a question about parsimony scores on random trees and students can see reversals / repeat substitutions for a suboptimal tree

Are q1 and q3 essentially the same question? Does this serve a purpose? If the goal is to see a sub that happens in two different parts of the tree ask students about this.

I Q3 over to be a random tree question, and also switched the position or 2 and 3 to match the LOs.

q4 - prompt students to view the tree estimated from 8-10 first

Done.

q6 - I'm not sure what to do here. Do you need to prompt students to look at several different datasets that have the same parsimony score? Do you mean characters or alignments?

Edited to make an explicit comparison between two character sets.

q7 - Since you don't have branch lengths on the tree I'm not sure how students are supposed to know that changes are occurring on longer or shorter branches.

Oh, this is a tricky one. I kind of relied on the visual cue of some branch lengths being longer. Perhaps I should add a scale bar?

q9 - Technically the answer to this q is 0 so you might want to clarify that you're only using PI sites.

Spot the paleontologist!

q10 - point students to a specific set of characters

Good idea, and I made some additional notes in the Instructor's Guide about why we get fully-resolved trees even when little/no data supports such.

I expect most users will want to teach with the app so maybe point toward that first and tell people not to worry about the code.

Great idea - I switched the order in both the README and the paper.

Hide the Rmd output for the libraries in the readme.

Done.

_In the readme you assign two different things to sampledf

Good catch.

Maybe just show generating one tree in the readme - you jump straight to instructions for generating a batch of trees from a batch of datasets.

Good idea!

To get at @ethanwhite request for more clarity on the use of the module I suggest looking at the literature on tree-thinking, particularly concept inventories. This work would provide motivation for teachers to use a tool to address specific thought processes that students find challenging. You need to point out to potential users that students really struggle to understand certain concepts and this app provides a hands-on way for them to work on these concepts.

Good point - I added a paragraph to the statement of need in which I review some of the literature on this point and highlight the misconceptions treesiftr is meant to address.

Also, do you picture this as a lab / recitation? homework? small class? large class? one question at a time during a lecture? I'm picturing this as a lab/recitation with a TA circulating to provide help, but curious what you're thinking.

I think it works in a variety of contexts. For the analytical paleobiology workshop, it was only 12 students, so I just circulated. I did this as a homework in my genetics class, after having done a lab on Shiny for population genetics. Therefore, the students weren't naive with respect to how to operate a Shiny app. So my feeling is that if this is their first introduction to R, or to operating a Shiny app, it should probably be in a lab or similar.

rachelss commented 5 years ago

Thank you for your patience with my extensive review. I think it's good to go. I hope to get a chance to use the Shiny app with undergrads - I think it's a really useful tool and having a set of questions ready to go is very helpful. Well done!

fyi - Please rebuild the Instructor's Guide pdf.

also fyi - I'm picturing this done with learnr to combine questions and app with instant feedback, but that's for the future

labarba commented 5 years ago

@rachelss β€” You have some unchecked items in the review checklist. Please check these off, when ready, and give me your recommendation to accept and publish, at that point.

@ethanwhite β€” You're missing the license checkmark. As for the version, after @wrightaprilm ups the version on the submission repository, for archival, we can use @whedon set vx.x.x as version here, to match.

ethanwhite commented 5 years ago

@labarba - My license checkmark is missing due to what I think is the still unresolved issue related to the fact that CRAN's approach to licensing does not include having the license text itself in the repository. I've suggested a solution to this and @wrightaprilm was requested editorial guidance. If JOSE is OK with not including the actual license text anywhere in the repository then I'm happy to check the box.

labarba commented 5 years ago

Thank you, @ethanwhite.

@wrightaprilm : Ethan said:

One common approach to dealing with the conflicts between R's packaging standard and the more general open source standard of having the full license in the LICENSE file is to have the full license in LICENSE.md (which is included in .Rbuildignore) and keep LICENSE in the standard R format.

Could you do that?

rachelss commented 5 years ago

I checked off everything but license and version. Once those are resolved I think this should be published.

wrightaprilm commented 5 years ago

Hadn't seen learnr - so cool!

Thanks to all four of you for all your help and comments. I really appreciate it, and I think the content delivery has improved a lot through this process.

labarba commented 5 years ago

@ethanwhite, @rachelss β€” The author has addressed remaining items. Could you check off the list and confirm your recommendation to publish?

@juanklopper β€” When both reviewers recommend publication, next step will be for @wrightaprilm to create an archive in Zenodo and report the DOI here. You can then use @whedon set <doi> as archive, and add the "accepted" label. I'll then publish!

ethanwhite commented 5 years ago

I confirm my recommendation to publish. Very nice work @wrightaprilm!

rachelss commented 5 years ago

I also confirm. Go ahead and publish.

labarba commented 5 years ago

@wrightaprilm β€” Please now make an archive on Zenodo, and report the DOI here. Make sure to edit the metadata as needed to match the paper (title and authors), as Zenodo just grabs that info from the repo.

juanklopper commented 5 years ago

Thanks @labarba. Will wait for the DOI.

wrightaprilm commented 5 years ago

Thanks so much for all your help, everyone! I've never made a Zenodo before - is this right? I had to hand-edit some things, as a user who made a PR showed up as an author.

juanklopper commented 5 years ago

Thanks @wrightaprilm. I see a DOI of 10.5281/zenodo.2541824 @labarba would you have a look if this is okay before I do the whedon command?

labarba commented 5 years ago

It looks good to me!

juanklopper commented 5 years ago

@whedon set 10.5281/zenodo.2541824 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2541824 is the archive.

juanklopper commented 5 years ago

@whedon accept

whedon commented 5 years ago
Attempting dry run of processing paper acceptance...