BenKuhnhaeuser opened 3 years ago
Ben, thanks for the suggestion. We need to think a bit more about how best to do this. In the meantime, I suggest you explore the -b option to see whether it fixes the over-filtering. If you increase -b to, say, 10 or 20 (instead of the default 5), fewer taxa will be removed.
Hi Uyen,
Many thanks again for implementing the "whitelisting" option! I would like to ask whether it would also be possible to implement an option to partition TreeShrink analyses by clade. In my case, I am working with a large palm subfamily (ca. 550 spp.) with three tribes containing approximately 5, 50, and 500 species, respectively. From my experience with almost a thousand gene trees for this clade, TreeShrink generally works very well. In some cases, however, all species from the smaller tribes are flagged by TreeShrink, and quite often (in ca. 5% of cases) all outgroup taxa are flagged too. I have checked all alignments manually and can confirm that these sequences are perfectly fine. Would it be possible to implement an option that partitions gene trees and alignments by user-defined clades (perhaps similar to how it is done in DiscoVista), runs the analyses separately on these more homogeneous subsets, and then applies the results to the original (unpartitioned) gene trees and alignments? I believe that would be a good way to avoid the unwanted removal of divergent clades. Maybe for a future version of TreeShrink?
Best, Ben
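The proposed workflow (partition by clade, analyze each part, merge the results back) mostly comes down to bookkeeping on leaf labels. Below is a minimal Python sketch of that bookkeeping, assuming a user-supplied mapping from leaf labels to clades. The names `partition_taxa`, `map_back`, `min_taxa`, and the catch-all `"unassigned"` label are hypothetical and are not part of TreeShrink. The sketch does not run TreeShrink itself: each analyzable clade would be extracted and analyzed separately, and the per-clade removal lists would then be merged onto the original gene tree and alignment.

```python
from collections import defaultdict


def partition_taxa(taxon_to_clade, gene_taxa, min_taxa=4):
    """Split the taxa of one gene tree into user-defined clades.

    taxon_to_clade is a hypothetical user-supplied mapping (leaf label -> clade).
    Clades with fewer than min_taxa taxa in this gene are returned separately,
    on the assumption that they are too small to analyze on their own.
    """
    groups = defaultdict(set)
    for taxon in gene_taxa:
        # Taxa missing from the mapping go into a catch-all group.
        groups[taxon_to_clade.get(taxon, "unassigned")].add(taxon)
    analyzable = {c: t for c, t in groups.items() if len(t) >= min_taxa}
    too_small = {c: t for c, t in groups.items() if len(t) < min_taxa}
    return analyzable, too_small


def map_back(gene_taxa, removed_by_clade):
    """Merge per-clade removal sets into one set for the unpartitioned gene.

    Returns (removed, kept) as sets of leaf labels.
    """
    gene_taxa = set(gene_taxa)
    removed = set()
    for clade_removed in removed_by_clade.values():
        removed |= clade_removed
    removed &= gene_taxa  # ignore labels that are not present in this gene
    return removed, gene_taxa - removed


if __name__ == "__main__":
    mapping = {
        "O1": "outgroup", "O2": "outgroup",
        "A1": "tribeA", "A2": "tribeA", "A3": "tribeA", "A4": "tribeA",
        "B1": "tribeB", "B2": "tribeB", "B3": "tribeB", "B4": "tribeB", "B5": "tribeB",
    }
    gene = list(mapping)
    analyzable, too_small = partition_taxa(mapping, gene)
    print(sorted(analyzable))  # ['tribeA', 'tribeB']
    print(sorted(too_small))   # ['outgroup']
    # Made-up per-clade removal lists, standing in for separate TreeShrink runs.
    removed, kept = map_back(gene, {"tribeA": {"A3"}, "tribeB": set()})
    print(sorted(removed))     # ['A3']
    print(len(kept))           # 10
```

In this sketch, clades below the threshold are simply left unfiltered. Pooling them with another partition (for example, analyzing a small tribe together with the outgroup) would be an alternative design.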