m-orton / Evolutionary-Rates-Analysis-Pipeline

The purpose of this repository is to develop software pipelines in R that can perform large scale phylogenetics comparisons of various taxa found on the Barcode of Life Database (BOLD) API.
GNU General Public License v3.0

Issue #1 - Notes on sequence length decision - 620 bp #2

Closed sadamowi closed 7 years ago

sadamowi commented 7 years ago

TEST - I suggest we type up notes about important analysis decisions and the reasons we made those decisions. I can type up our recent 620 bp decision. First, can you see this?

m-orton commented 7 years ago

Yes, I can. OK, sounds good. In the future I will make more detailed notes with each commit I make. I will also make separate branches for the class-level analysis and the order-level analysis. You can make a separate branch for the pipeline too, if there are any edits you want to make.

sadamowi commented 7 years ago

Notes about our decision to go with 620 bp as the main sequence length:
- Most primers used in DNA barcoding studies amplify a region that is 658 bp long (between the primers).
- At the ends, there is typically a short section that is covered by only one of the two chromatograms. Also, Taryn Athey's MSc thesis showed that low-frequency variants are more common near the ends, suggesting that sequencing and sequence-editing errors are concentrated there. Therefore, we elected to apply a light trimming at each end. Symmetrical trimming (19 bp per end for full-length barcodes, so 658 - (2 x 19) = 620 bp) was selected because most phyla are sequenced bidirectionally, even though many insects are now sequenced unidirectionally.
- Also, going with a shorter sequence will permit us to include more BINs.
- The length was standardized so that exactly the same genetic region is compared across all tests.
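To make the trimming arithmetic concrete, here is a minimal sketch of symmetrical end-trimming. It is written in Python rather than the pipeline's R and is not the pipeline's actual code. It assumes each sequence has already been aligned to the 658 bp primer-to-primer reference frame, so that every record has exactly 658 positions, with '-' marking gaps. The names `trim_barcode` and `core_is_complete` are hypothetical. The second helper shows one way a record that is missing a few bases at an end can still yield a complete 620 bp core.

```python
# Hypothetical sketch (not the pipeline's R code): symmetrical trimming of an
# aligned 658 bp barcode region down to a 620 bp core.
FULL_LENGTH = 658    # primer-to-primer barcode region
TARGET_LENGTH = 620  # standardized comparison length
TRIM_PER_END = (FULL_LENGTH - TARGET_LENGTH) // 2  # 19 bp removed from each end


def trim_barcode(aligned_seq: str, trim: int = TRIM_PER_END) -> str:
    """Drop `trim` positions from each end of an aligned 658-position sequence.

    Assumes the input is already aligned to the 658 bp reference frame;
    gap ('-') and ambiguity ('N') characters are kept as-is.
    """
    if len(aligned_seq) != FULL_LENGTH:
        raise ValueError(
            f"expected {FULL_LENGTH} aligned positions, got {len(aligned_seq)}"
        )
    return aligned_seq[trim:len(aligned_seq) - trim]


def core_is_complete(aligned_seq: str, trim: int = TRIM_PER_END) -> bool:
    """Return True if the trimmed core contains no gaps or Ns (assumed filter)."""
    core = trim_barcode(aligned_seq, trim)
    return "-" not in core and "N" not in core.upper()
```

For example, a record with 10 unreadable leading positions (`"-" * 10 + "A" * 9 + "C" * 620 + "G" * 19`) still passes `core_is_complete`, because the missing bases fall inside the trimmed ends. This is one illustration of how a shorter standard length can admit more records, and therefore potentially more BINs.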