m-orton / Evolutionary-Rates-Analysis-Pipeline

The purpose of this repository is to develop software pipelines in R that can perform large scale phylogenetics comparisons of various taxa found on the Barcode of Life Database (BOLD) API.
GNU General Public License v3.0

Median latitude #36

Closed jmay29 closed 7 years ago

jmay29 commented 7 years ago

I was wondering if median latitude in this pipeline could be determined prior to running some of the filters (i.e. removing sequences with large N/gap content, short/long sequences)? This could increase the amount of data that is used to determine the median latitude of each BIN. I think you and I discussed this just prior to break, Sally, as I have it on my "to-do" list!

m-orton commented 7 years ago

No problem, I'll modify the script so that median lat is determined before these filtering steps (N/gap content, short/long seqs).

I think it would be good if we could implement this for the remaining taxa that haven't been run yet.

m-orton commented 7 years ago

Hey Jacqueline,

I modified the Arthropoda branch so that medianLat and Lon are determined before the filtering of N/gap content and short/long seqs. I also thought it made sense to determine the lat min and max before this step as well. When you run through fish again, you can try using the Arthropoda branch as a template, though you will have to reinsert your reference sequence. Hope this helps.

Matt

jmay29 commented 7 years ago

Awesome! Thanks Matt. I'll post my results to the Dropbox this afternoon.

sadamowi commented 7 years ago

Hi Jacqueline and Matt,

I think that this reordering of the filtering steps makes sense. All sequences with a BIN assignment meet certain sequence length (>500 bp) and quality criteria (<1% Ns), and so I think that all sequences with a BIN can be used for determining the geographic info. (I still suggest first deleting any specimen records that entirely lack a sequence, even if they bear a BIN. If a specimen got a BIN but then the sequence was deleted later, that would likely be due to the detection of an error.)
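The reordering discussed above can be sketched as follows. This is a minimal Python illustration, not the pipeline's R code: the record fields (`bin`, `lat`, `seq`), the example records, and the helper names are all hypothetical, and the filter thresholds simply reuse the BIN criteria quoted in this thread (>500 bp, <1% Ns) as stand-ins for whatever the pipeline actually applies.

```python
from collections import defaultdict
from statistics import median

# Hypothetical specimen records; field names and values are illustrative only.
records = [
    {"bin": "A", "lat": 10.0, "seq": "ACGT" * 150},   # 600 bp, no Ns
    {"bin": "A", "lat": 20.0, "seq": "ACGTN" * 60},   # 300 bp, 20% Ns
    {"bin": "A", "lat": 60.0, "seq": ""},             # BIN but no sequence
    {"bin": "B", "lat": -30.0, "seq": "ACGT" * 150},  # 600 bp, no Ns
    {"bin": "B", "lat": -40.0, "seq": "ACGT" * 50},   # 200 bp
]

def has_sequence(rec):
    """True if the record carries any sequence data at all."""
    return bool(rec["seq"].replace("-", "").strip())

def passes_quality(rec, min_len=500, max_n_frac=0.01):
    """Placeholder length/N filter; thresholds are assumptions, not the pipeline's."""
    seq = rec["seq"].replace("-", "").upper()
    if len(seq) <= min_len:
        return False
    return seq.count("N") / len(seq) < max_n_frac

def lat_summary(recs):
    """Median, min, and max latitude per BIN."""
    by_bin = defaultdict(list)
    for rec in recs:
        by_bin[rec["bin"]].append(rec["lat"])
    return {b: {"median": median(v), "min": min(v), "max": max(v)}
            for b, v in by_bin.items()}

# Step 1: always drop records that entirely lack a sequence.
with_seq = [r for r in records if has_sequence(r)]

# Reordered version: geographic summary first, quality filters afterwards.
geo_before_filters = lat_summary(with_seq)
filtered = [r for r in with_seq if passes_quality(r)]

# Previous ordering, for comparison: filters first, then the summary.
geo_after_filters = lat_summary(filtered)
```

Comparing `geo_before_filters` with `geo_after_filters` for each BIN is one way to see how much the ordering changes the geographic summary for a given taxon.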

However, I would discourage making this change in some branches but not others. That would complicate the manuscript, and I don't think we have a biologically justifiable reason for a difference among branches (unlike a couple of the other edits we've made to specific branches, such as for the mollusc alignment settings). Also, I think breaking up specific taxa by geographic region was necessary and justifiable.

For this change to the ordering of filtering being discussed, I suggest:

  1. do this for everything.

or

  2. leave this aside as a future improvement. Perhaps Jacqueline could incorporate that change into her thesis version and see if it makes a difference.

Best wishes, Sally

m-orton commented 7 years ago

I think it would probably be good to leave this aside as a future improvement to the script, to avoid rerunning previously completed taxa. I'm thinking that what I could do instead is make a separate version of the script with this change for Jacqueline to use, and revert the Arthropoda branch to its previous state so that the branches are consistent. Would this be ok with everyone?

jmay29 commented 7 years ago

Ok! No worries, you don't have to make another version for me! :P I already have the reordered one saved to my desktop. But yeah, I guess it makes sense since a lot of the results are already finished!!

jmay29 commented 7 years ago

Also... this might be unnecessary, but should I think about breaking fish taxa up by geographic region to see how the results might differ between regions? Or I can try another group of organisms, whichever you guys prefer.

sadamowi commented 7 years ago

Hi Jacqueline and Matt,

I think that is a wise choice to leave that improvement aside for now, and Jacqueline could potentially explore shifting the filtering order for the more intensive look at molecular patterns in fish.

Projects always build one upon another, and it is important to draw the line somewhere and give a particular project a pausing/writing up/dissemination point. It is good, especially for students and recent grads(!) but for any researcher, to prepare a concrete outcome once a reasonable piece of work has been achieved and not go on indefinitely. I suspect this change would make a modest difference at most, although it is most certainly a reasonable change to make for a future improvement.

Jacqueline - You also asked about breaking up fish geographically. I think that's something you could try, if you wish, for your fish-focused project. Here, I think that wouldn't be that helpful for informing us about the best divisions for Lepidoptera and Hymenoptera (given that you have both freshwater and marine representatives and dispersal is very different between fish and these insects). But this could be interesting for your project. However, rather than breaking up the fish prior to running them (unless it is essential to get the alignment to run), I suggest checking, after running the pairs, whether different regions give a different pattern. Another cool thing to check would be whether the results differ among these groupings of pairs:

- tropical vs. temperate
- temperate vs. polar
- tropical vs. polar

Do the pairs with the largest differences in latitude exhibit a different trend compared to the overall?
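One way to set up these two checks after the pairs have been run is sketched below in Python (not the pipeline's R code). Everything here is an assumption made for illustration: the zone boundaries (23.5 and 66.5 degrees of absolute latitude), the `(lat_a, lat_b, score)` layout, the example values, and `score` itself, which stands in for whatever per-pair result the pipeline produces.

```python
from collections import defaultdict
from statistics import mean

# Assumed zone boundaries in degrees of absolute latitude.
TROPIC = 23.5
POLAR = 66.5
ORDER = ["tropical", "temperate", "polar"]

def zone(lat):
    """Assign a latitude to a broad zone using the assumed boundaries."""
    a = abs(lat)
    if a <= TROPIC:
        return "tropical"
    if a < POLAR:
        return "temperate"
    return "polar"

def comparison(lat_a, lat_b):
    """Label a pair by the zones of its two members, e.g. 'tropical vs. polar'."""
    zones = sorted([zone(lat_a), zone(lat_b)], key=ORDER.index)
    return " vs. ".join(zones)

def lat_difference(lat_a, lat_b):
    """Difference in absolute latitude between the two members of a pair."""
    return abs(abs(lat_a) - abs(lat_b))

# Hypothetical pairs: (latitude A, latitude B, per-pair result).
pairs = [
    (5.0, 40.0, 0.2),
    (-10.0, 70.0, 0.5),
    (45.0, 68.0, 0.1),
    (30.0, 12.0, -0.1),
    (50.0, 35.0, 0.0),
]

# Check 1: mean result within each zone comparison.
grouped = defaultdict(list)
for lat_a, lat_b, score in pairs:
    grouped[comparison(lat_a, lat_b)].append(score)
by_group = {label: mean(scores) for label, scores in grouped.items()}

# Check 2: pairs with the largest latitude differences vs. all pairs.
top_n = 2  # arbitrary cutoff for this illustration
widest = sorted(pairs, key=lambda p: lat_difference(p[0], p[1]), reverse=True)[:top_n]
mean_widest = mean(score for _, _, score in widest)
mean_overall = mean(score for _, _, score in pairs)
```

With real data, `by_group` would show whether the pattern differs among zone comparisons, and `mean_widest` against `mean_overall` would show whether the most widely separated pairs follow a different trend; a proper statistical test would still be needed to judge either difference.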

Best wishes, Sally

sadamowi commented 7 years ago

PS. I suggest that we close this issue for this present project. Jacqueline - perhaps you might start a new thread collating ideas for potential future improvements to the pipeline and ideas you might consider for your MSc project.

Best wishes, Sally

jmay29 commented 7 years ago

Sure! Thanks Sally. Closing now!