m-orton / Evolutionary-Rates-Analysis-Pipeline

The purpose of this repository is to develop software pipelines in R that can perform large-scale phylogenetic comparisons of various taxa found on the Barcode of Life Database (BOLD) API.
GNU General Public License v3.0

Very minor edits! #39

Closed jmay29 closed 7 years ago

jmay29 commented 7 years ago

Hello!! These are some minor spelling errors I came across and some thoughts I had about certain parts of the code:

Line 205: I believe "markercode" is missing from this line (after 'nucleotides')

Line 360: Should 'dfInitial' instead be 'dfCentroid' in this comment?

Line 456: I'm just wondering if "taxa" should instead be changed to "class" in dfRefSeq? Just so the user is clear what should be entered here? Just a thought!

Line 448: "approprate" spelling error

Line 455: Should there be a note around this part of the code that tells the user that all of the ref sequences in dfRefSeq should be the same length (i.e. 658 bp) for line 465 to work on each sequence equally? What if ref sequences of different lengths are used?

Line 479: Change 'allseq dataframe' to dfAllSeq in this comment? Just for consistency purposes.

Line 486: "closed" should be "closely"

Line 490: Change 'taxalist' to taxaListComplete in this comment?

Line 523: Weird, this link brings me to https://mran.microsoft.com/.

Line 597: "consist" should be "consistent"

Line 743: "intial" spelling error

Line 753: "pairingresults" to "dfPairingResultsL1L2" in this comment?

Line 889: "Fist" spelling error

Line 940: "an" to "a"

Line 1324: "realtiveDistOverall" spelling error
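On the line 455 question, one way to see whether reference sequences of different lengths are present is to tabulate their lengths before they are used. The following is only a rough Python sketch of that check, not code from the pipeline (which is written in R). The function name `summarize_ref_lengths`, the `{taxon: sequence}` dictionary, and the placeholder sequences are all hypothetical; the 658 bp value is the one mentioned in the line 455 comment.

```python
from collections import Counter

STANDARD_LENGTH = 658  # bp; the length mentioned in the line 455 comment


def summarize_ref_lengths(ref_seqs, expected=STANDARD_LENGTH):
    """Return (length_counts, off_length) for a {taxon: sequence} mapping.

    Gap characters ("-") are ignored so aligned and unaligned input
    give the same nucleotide count.
    """
    lengths = {taxon: len(seq.replace("-", "")) for taxon, seq in ref_seqs.items()}
    counts = Counter(lengths.values())
    off_length = {taxon: n for taxon, n in lengths.items() if n != expected}
    return counts, off_length


# Made-up placeholder sequences, not real reference sequences.
example_refs = {
    "TaxonA": "ACGT" * 164 + "AC",   # 658 bp
    "TaxonB": "ACGT" * 164 + "AC",   # 658 bp
    "TaxonC": "ACGT" * 163 + "ACG",  # 655 bp, shorter than the standard
}

counts, off_length = summarize_ref_lengths(example_refs)
print(dict(counts))   # how many references have each length
print(off_length)     # taxa whose reference differs from the expected length
```

Any taxon that shows up in `off_length` would be a case the note around line 455 needs to cover.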

m-orton commented 7 years ago

Thanks very much for the edits, Jacqueline!

- Yes to dfCentroid at line 360.

- I left it as taxa intentionally at line 456 in case someone wanted to run a ref sequence for an order (Leps, for instance).

- I think Sally decided all of the ref seqs would be the standard 658 bp, so I'll add a note at line 455.

- I see what you mean about the link. They must have changed the link to that package, so I'll find the right link.

- Yes to allseq to dfAllSeq, taxalist to taxaListComplete, pairingresults to dfPairingResultsL1L2.

sadamowi commented 7 years ago

Hi Jacqueline and Matt,

Thanks for your work above to refine the code and comments.

Matt - Not all REF sequences are 658 bp. So, I suggest instead to add a note such as:

"We standardized the gene fragment analyzed to facilitate comparisons across taxa. We selected reference sequences that fell between the primers for COI by Folmer et al. (1994).

Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology 3: 294-299.

In DNA barcoding studies, many taxonomic groups are amplified by these primers or by variants of these primers that bind at the same position. For most taxa, the reference sequences were exactly 658 bp in length. For certain taxa (e.g. Bivalvia), the length differed from 658 bp due to amino acid indels, but the same start and end point, as verified using an amino acid alignment, was used."

Best wishes, Sally

m-orton commented 7 years ago

Thanks Sally, I'll make sure to add that note to the code.

m-orton commented 7 years ago

I have now made all of these edits to the code and have also added Sally's note above about reference sequences to section 5 of the code. Permission to close issue?

sadamowi commented 7 years ago

All sounds good! Closing issue.