hahnlab / CAFE5

Version 5 of the CAFE phylogenetics software

outlier gene families #127

Closed marcelauliano closed 1 year ago

marcelauliano commented 1 year ago

Hi all, thanks a lot for CAFE5. I'd like to ask for your guidance:

CAFE does not finish a run because of some outlier families. I have orthology for 39 mammals from OrthoFinder2, but some of the genomes are Illumina assemblies, so the gene predictions might be very fragmented. CAFE gives me a warning saying I should remove some gene families with large size differentials. When I do that, the run finishes and I get a lambda. I then try to run CAFE again with the large gene families, using that lambda, but the run gives me the same error. I also tried a very low lambda, but that doesn't finish either. It seems I cannot finish a CAFE run with -k values without removing certain gene families. It only finishes in the default mode, which assumes no among-family rate variation. I don't think this is ideal.

Any idea where I should go from here? Thank you so much!

hahnlab-user commented 1 year ago

Hi,

First, can you clarify: how many -k values are you trying to fit? Do these fit on the dataset when large families are removed? Does CAFE run fine when k=1, even with the large families?

Second, unfortunately, if one has to remove extreme families, there is no obvious way to include them again. They (likely) simply have very little phylogenetic information contained within them.

Matt

marcelauliano commented 1 year ago

Hi Matt, thanks a lot for your reply. Yes, the run finishes if I give -k 1. It does not finish with any larger k (2, 3, 4...). I have also run a Base error model:

maxcnt: 199
cntdiff: -1 0 1
0 0 0.995281 0.00471885
1 0.00471885 0.990562 0.00471885

And my base model result (or -k 1):

Model Base Result: 453192
Lambda: 0.0033270360959984
Epsilon: 0.00471885
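As a sanity check on the error model quoted above, here is a minimal Python sketch that parses that text and confirms each row of probabilities sums to roughly 1. It assumes each data row is a family size followed by one probability per `cntdiff` offset. The function name `parse_error_model` is hypothetical and not part of CAFE5.

```python
def parse_error_model(text):
    """Parse an error model given as text (assumed layout: maxcnt, cntdiff, then rows)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    maxcnt = int(lines[0].split(":")[1])
    cntdiff = [int(x) for x in lines[1].split(":")[1].split()]
    rows = {}
    for ln in lines[2:]:
        parts = ln.split()
        # First field: family size; remaining fields: one probability per cntdiff offset.
        rows[int(parts[0])] = [float(p) for p in parts[1:]]
    return maxcnt, cntdiff, rows


# Values copied from the comment above.
MODEL = """maxcnt: 199
cntdiff: -1 0 1
0 0 0.995281 0.00471885
1 0.00471885 0.990562 0.00471885
"""

maxcnt, cntdiff, rows = parse_error_model(MODEL)
row_sums = {size: sum(probs) for size, probs in rows.items()}
for size, total in row_sums.items():
    print(f"family size {size}: probabilities sum to {total:.6f}")
```

Both rows here sum to 1 within rounding, so the error model looks internally consistent. That points to the outlier families rather than the error estimate as the likely cause of the failed runs.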

The instruction I mentioned earlier is section 3.1.2 of the tutorial: https://github.com/hahnlab/CAFE5/blob/master/docs/tutorial/tutorial.md I can either run CAFE on the base model or remove the large gene families. What I find strange, though, is that it keeps asking me to remove yet more gene families on the next run, so in the end I don't know how reliable that analysis will be. I guess I should investigate what those families are. Thank you, Matt!
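One way to investigate those families is to rank them by the gap between their largest and smallest per-species counts. The sketch below is illustrative only. It assumes a tab-delimited CAFE-style count table with `Desc` and `Family ID` columns followed by one column per species. The threshold of 100, the helper names, and the tiny example table are all made up for illustration.

```python
import csv
import io


def read_counts(handle):
    """Read a tab-delimited count table; species are all columns except Desc / Family ID."""
    reader = csv.DictReader(handle, delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Desc", "Family ID")]
    return list(reader), species


def split_by_differential(rows, species, max_diff=100):
    """Return (kept, flagged) lists of (family_id, max_count - min_count).

    max_diff is an arbitrary, hypothetical cutoff, not a value taken from CAFE5.
    """
    kept, flagged = [], []
    for row in rows:
        counts = [int(row[s]) for s in species]
        diff = max(counts) - min(counts)
        target = flagged if diff >= max_diff else kept
        target.append((row["Family ID"], diff))
    # Largest differentials first, so the most extreme families are easy to inspect.
    flagged.sort(key=lambda item: item[1], reverse=True)
    return kept, flagged


# Made-up three-species table, purely for demonstration.
EXAMPLE = (
    "Desc\tFamily ID\thuman\tmouse\tbat\n"
    "(null)\tOG0000000\t250\t3\t0\n"
    "(null)\tOG0000001\t4\t5\t3\n"
)

rows, species = read_counts(io.StringIO(EXAMPLE))
kept, flagged = split_by_differential(rows, species, max_diff=100)
print("flagged:", flagged)
print("kept:", kept)
```

With a real table, replace `io.StringIO(EXAMPLE)` with an open file handle. Listing the flagged families shows whether the extreme counts sit mostly in the fragmented Illumina-based annotations.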

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 90 days with no activity.