jmay29 closed this issue 7 years ago
Hi Jacqueline,
I think that, in principle, this is a good idea, but in practice I fear many of our datasets would be too large. If you would like to try it, I think it would have to happen at the step after selecting a centroid sequence per BIN. We would also want to use the same model for all runs so that we can compare results directly across taxa. One option would be to run model selection on multiple groups (subsetting very large groups first), find the best model for each group, and then apply the most common best model to all groups.
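A minimal Python sketch of the procedure described above, assuming the best-fit model for each group has already been chosen separately (for example with phangorn's modelTest in R). The 500-sequence cap, the function names, and the group names in the usage line are hypothetical; the thread only says "subsetting first for very large groups".

```python
import random
from collections import Counter

# Hypothetical cap on sequences per group before running model selection.
MAX_SEQS = 500


def subset_group(seqs, max_seqs=MAX_SEQS, seed=1):
    """Return the group unchanged if it is small enough, else a random subset.

    A fixed seed keeps the subset reproducible between runs.
    """
    seqs = list(seqs)
    if len(seqs) <= max_seqs:
        return seqs
    rng = random.Random(seed)
    return rng.sample(seqs, max_seqs)


def most_common_model(best_by_group):
    """Given {group: best_model}, return the model chosen for the most groups.

    Ties go to the model encountered first, which is how
    Counter.most_common orders elements with equal counts.
    """
    counts = Counter(best_by_group.values())
    model, _ = counts.most_common(1)[0]
    return model


# Hypothetical per-group results, e.g. copied from modelTest output.
best = {"Aves": "GTR+G+I", "Insecta": "GTR+G+I", "Mammalia": "HKY+G"}
shared_model = most_common_model(best)  # "GTR+G+I"
```

Choosing the mode rather than refitting a single pooled alignment keeps each group's model selection independent while still giving one shared model for cross-taxon comparison.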
However, if you can't resolve the crash, I'd suggest leaving this out of the current project; it could be something to consider incorporating into your MSc work.
Best wishes, Sally
Will do! I'll keep testing it in the code and let you know if I make any progress. :)
I'd suggest closing this issue. Jacqueline, perhaps you could make a separate note of this as something to consider for your own work. It is certainly something you would want to consider for your phylogenetic pipeline.
Best wishes, Sally
Sure thing! Thanks Sally
I was wondering whether the function "modelTest" from the package "phangorn" would be worth exploring for model selection in some parts of the pipeline. I have used it before in a script to compare different models of DNA evolution for a phyDat object (in our case, a multiple sequence alignment) before building an ML tree. Here is a link to the documentation:
https://www.rdocumentation.org/packages/phangorn/versions/2.0.4/topics/modelTest
That said, I just ran modelTest in RStudio on my data and it crashed (👎), so I will experiment with it a bit more!
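For reference, modelTest ranks candidate models using information criteria such as AIC and BIC. Below is a minimal Python sketch of that comparison step only, assuming the log-likelihood and number of free parameters for each model have already been obtained from ML fits. The function names and the input values in the tests are hypothetical; this is not phangorn's implementation and it does not fit any models itself.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_sites):
    """Bayesian information criterion: k ln(n) - 2 ln L (lower is better).

    Here n is taken to be the number of alignment sites (an assumption).
    """
    return k * math.log(n_sites) - 2 * log_lik


def best_model(fits, n_sites, criterion="BIC"):
    """Return the name of the model with the lowest AIC or BIC.

    fits: list of (model_name, k, log_lik) tuples, e.g. transcribed
    from a model-selection table.
    """
    def score(fit):
        _name, k, log_lik = fit
        if criterion == "BIC":
            return bic(log_lik, k, n_sites)
        return aic(log_lik, k)

    return min(fits, key=score)[0]
```

Because BIC's penalty grows with alignment length while AIC's does not, the two criteria can disagree on longer alignments, so it is worth fixing one criterion before comparing best models across groups.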