m-orton / Evolutionary-Rates-Analysis-Pipeline

The purpose of this repository is to develop software pipelines in R that can perform large-scale phylogenetic comparisons of various taxa available through the Barcode of Life Database (BOLD) API.
GNU General Public License v3.0

Matt - task list #45

Closed: sadamowi closed this issue 7 years ago

sadamowi commented 7 years ago

Hi Matt,

To help us stay organized, I've also prepared a task list for you. Please feel free to edit!

  1. Work on methods section:
     - consider my comments from the last round
     - add the divergent sequence ejector step
     - briefly explain the "rules" for dropping sequences on the basis of generating indels in the alignment
     - explain the subsetting of large insect orders by geographic region
     - return the draft to me for the next round of revisions

  2. Cross check the separate geographic output files for the large insect orders. Are there ingroup BINs that appear in multiple regions? If this occurs frequently, then I think we should discuss potential solutions. We could, for example, consider retaining just the pair with the smallest ingroup distance. I hope that this does not occur frequently. We would mention in the methods section that we checked for this.

  3. Have a look at the initial workspaces and consider if we can run the pipeline on any additional markers (for select taxa).

  4. Can legends be made larger in the map figures?

  5. Prepare the final map figure (after discussion).

  6. Collaborate in writing up the discussion.
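The cross-check in task 2 could be sketched roughly as follows. This is only an illustration in Python, not the pipeline's R code: the `Pair` record layout, the region names, the BIN labels, and the distances are all hypothetical, and the greedy rule is just one possible reading of "retaining just the pair with the smallest ingroup distance".

```python
from collections import defaultdict, namedtuple

# Hypothetical record layout; the real output files may use different columns.
Pair = namedtuple("Pair", "pair_id region bin_a bin_b ingroup_distance")

def bins_in_multiple_regions(pairs):
    """Return {BIN: sorted regions} for ingroup BINs seen in more than one region."""
    regions_by_bin = defaultdict(set)
    for p in pairs:
        regions_by_bin[p.bin_a].add(p.region)
        regions_by_bin[p.bin_b].add(p.region)
    return {b: sorted(r) for b, r in regions_by_bin.items() if len(r) > 1}

def retain_smallest_distance(pairs):
    """Greedily keep pairs in order of increasing ingroup distance,
    skipping any pair that reuses a BIN from an already retained pair."""
    used, kept = set(), []
    for p in sorted(pairs, key=lambda p: p.ingroup_distance):
        if p.bin_a in used or p.bin_b in used:
            continue
        kept.append(p)
        used.update((p.bin_a, p.bin_b))
    return kept

# Made-up example data: BIN_A appears in pairs from two regions.
pairs = [
    Pair("P1", "Nearctic", "BIN_A", "BIN_B", 0.012),
    Pair("P2", "Palearctic", "BIN_A", "BIN_C", 0.008),
    Pair("P3", "Nearctic", "BIN_D", "BIN_E", 0.020),
]
shared = bins_in_multiple_regions(pairs)
kept_ids = [p.pair_id for p in retain_smallest_distance(pairs)]
```

With this made-up data, `shared` flags `BIN_A` as present in both regions, and only `P2` and `P3` are retained. Note that the greedy rule also drops pairs that share a BIN within a single region; whether that is desirable would need discussion.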

m-orton commented 7 years ago

Looks good Sally, I'm working on 1 and should be able to get it done very soon to send back to you.

m-orton commented 7 years ago

Hi Sally, I just added a revised methods section and tried to incorporate all of the points you mentioned. I added comments explaining all of the revisions I made. I also decided to break things up a little by adding a few more section titles.

Best Regards, Matt

sadamowi commented 7 years ago

Hi Matt,

I understand that you are working on coding the signed branch lengths calculator. Thank you very much.

If that goes well, I am wondering if we can run the Actinopterygii as well? There are a lot of pseudoreplicates in that file, and I am concerned about making a manual error. As each class is below 20K BINs, I am also wondering if it may make sense to run Chordata again, as a whole (with one REF seq for each class), with the revised pipeline code?

I have completed updating the results for the Cnidaria, Annelida, Echinodermata, and Mollusca. I also processed the Perciformes and Cypriniformes files. I will pause working on the remaining files for now, until we discuss how the informatic solution is going. I think it was good that I worked through some of these files in detail. This helped to uncover a few things (such as some extreme branch length ratios at small values, as we have discussed already).

Thanks for letting me know your thoughts. I will turn my focus to the manuscript file for now.

Best wishes, Sally

m-orton commented 7 years ago

Hi Sally, no problem, I can rerun Actinopterygii with the new changes (I may not get to this until tomorrow, depending on how things go).

Redoing my task list to help me stay organized:

  1. Update script with relative branch length code + 2% divergence code and run through Arthropoda (Section 14 onwards) and Actinopterygii (entire class).

  2. Work on methods section:
     - consider most recent comments and make edits
     - return the draft to Sally for the next round of revisions

  3. Cross check the separate geographic output files for the large insect orders. Are there ingroup BINs that appear in multiple regions? If this occurs frequently, then I think we should discuss potential solutions. We could, for example, consider retaining just the pair with the smallest ingroup distance. I hope that this does not occur frequently. We would mention in the methods section that we checked for this.

  4. Have a look at the initial workspaces and consider if we can run the pipeline on any additional markers (for select taxa).

  5. Legend edits to map - larger, show lines and symbols more clearly

  6. Prepare the final map figure (after discussion).

  7. Collaborate in writing up the discussion.

sadamowi commented 7 years ago

OK great - thanks Matt.

sadamowi commented 7 years ago

Hi Matt,

With regard to task #2 above, I wanted to let you know that I have now closed the manuscript file. It's now over to you!

I have made comments/edits to the Methods section. I've also prepared a preliminary draft of the Results section and Discussion section (with the latter mostly in point form, with a fairly complete flow of arguments).

Please do look at all three of those sections and add any comments that you have. You can turn "track changes" off and just make a note in the margins for me if you want to draw my attention to any major points or changes - thank you. I will read the whole thing with every iteration.

Also, if you are able to work towards completing Table 1, that would be great, but please do let me know if you'd like my help to finish filling in that table (e.g. Aves?). Thank you for running the Arthropoda and Actinopterygii. I have set up the table with the taxa that I think we should show full results for, but of course go ahead and add lines for any additional taxa that you think should be shown. I omitted several groups that we previously had listed in the table due to small sample size.

We could add an extra column for the median branch length ratio according to the other method. What do you think? Or, we could report this for select taxa in the results prose.

In Table 1, I marked in green the taxa that I have updated with our revised methods. Note that, according to the current table structure, the lower-level taxa are subsets of the higher-level taxa. So, results from some datasets would need to be combined to be able to report the full results for the higher-level taxa.

Over the next 1-2 days, I'd like to complete my reading and a full draft of the introduction. Let's stay in touch about our progress. After I complete the intro in full draft, I could turn my attention to writing out the full discussion, but let me know if there are any more methods/results issues that crop up.

I think we are getting there!

Best wishes, Sally

m-orton commented 7 years ago

Hi Sally, sounds good. I'll get started with running through Arthropods and fishes with the new code sections, and then I'll update the manuscript with the results I generate. I'll also update the results table with the alternative ratio method.

Best Regards, Matt

sadamowi commented 7 years ago

That's great. Thanks Matt. The new method may result in the same median, but we could consider the merits of reporting the mean and/or the Wilcoxon test results in the table or prose.
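To illustrate how the median, the mean, and a Wilcoxon-style summary can tell different stories, here is a small Python sketch. It is not the pipeline's R code. The signed-ratio definition (larger branch over smaller, negated when the second branch is longer), the function names, and the branch lengths are all assumptions made for the example, and only the signed-rank sums are computed, not a p-value.

```python
from statistics import mean, median

def signed_ratio(a, b):
    """Larger branch length over smaller; negative when b is the longer branch.
    Assumed definition, for illustration only."""
    if a <= 0 or b <= 0:
        return None  # ratio undefined for zero-length branches
    return a / b if a >= b else -(b / a)

def wilcoxon_rank_sums(diffs):
    """Return (W_plus, W_minus): rank sums of positive and negative
    nonzero paired differences, using average ranks for ties."""
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(nonzero):
        j = i
        while j + 1 < len(nonzero) and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank across the tied run
        for k in range(i, j + 1):
            ranks[k] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return w_plus, w_minus

# Made-up branch length pairs (lineage 1, lineage 2) for four sister pairs.
branch_pairs = [(0.5, 0.25), (0.125, 0.5), (0.25, 0.25), (0.75, 0.25)]
ratios = [signed_ratio(a, b) for a, b in branch_pairs]
ratio_median = median(ratios)
ratio_mean = mean(ratios)
rank_sums = wilcoxon_rank_sums([a - b for a, b in branch_pairs])
```

In this toy data the single strongly negative ratio pulls the mean (0.5) well below the median (1.5), which is the kind of difference that reporting both statistics would expose.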

Best wishes, Sally

m-orton commented 7 years ago

This issue was moved to jmay29/lat-project#1