nclark-lab / RERconverge

Analysis of convergence between organismal traits and DNA/protein sequences
GNU General Public License v3.0
44 stars 26 forks source link

mamRERw result is NA #91

Open cjl200095 opened 5 months ago

cjl200095 commented 5 months ago

Dear RERcoverge designer, I'm running into an error when I try to run RERconverge with my data. When I run the getAllResiduals with the built tree, the results are all NA. image I don‘t know how to fix this. Looking forward to your positive response.

nclark-lab commented 5 months ago

Could you provide your input trees file so we could diagnose the problem?

cjl200095 commented 5 months ago

Thanks for replying!

-----原始邮件----- 发件人:"Nathan Clark" @.> 发送时间:2024-04-06 07:22:17 (星期六) 收件人: nclark-lab/RERconverge @.> 抄送: cjl200095 @.>, Author @.> 主题: Re: [nclark-lab/RERconverge] mamRERw result is NA (Issue #91)

Could you provide your input trees file so we could diagnose the problem?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>

LisaSkalon commented 4 months ago

Hi @nclark-lab,

It appears that I'm encountering a similar issue to cjl200095. My dataset (phangorn2.txt) consists of 1069 genes and 30 species. While attempting to compute RER for all 30 species using getAllResiduals, only 169 genes have RER estimated (i=169, cutoff=3e-08). However, when I attempt to reduce the number of species for binary trait analysis (for example, using useSpecies=c("tuvinicus", "semicanus", "lemminus", "macrotis", "centralis", "glareolus", "rutilus"), the output table contains only NAs, and the counter i=.. is not printed. Could this be due to the branch lengths in my dataset being too small? Is there a way to resolve this and obtain a RER matrix for the reduced dataset?

Thank you in advance!

Lisa

phangorn2.txt

sorrywm commented 4 months ago

Hi Lisa!

I did not have any difficulties estimating RER from your full trees with the default settings. I suspect your output contains RERs for all your genes as well. Here is what I ran: ttrees <- readTrees(phangorn2.txt) trer <- getAllResiduals(ttrees) I then checked how many non-NA RER values there were in each row (for each gene) of the RER matrix: hist(rowSums(!is.na(trer))) egHistGithubIssue Clearly there are many genes with RERs calculated for many of the branches in the tree.*

At what stage (in which function) are you reducing your species set using useSpecies? I would not recommend reducing your species set this drastically, unless you only have phenotype data for a very small number of species. If you do need to restrict your species set, you will need to change the min.sp option in getAllCor from its default of 10. If you don't change this option, you will get mostly NAs because very few of your gene trees will have at least 10 branches.

Thank you for using RERconverge!

--Wynn

*To explain, in the output of getAllResiduals, the final i does not represent the number of genes for which RER are estimated. The program only prints an i for each separate set of species represented in the gene trees; all the gene trees with that species set are used to calculate RER for all of those genes in just one step, so the program does not print an i for each gene. The functions to correlate RER with phenotypes (e.g., correlateWithBinaryPhenotype) do not print any i, regardless of the number of genes.

LisaSkalon commented 4 months ago

Hi Wynn @sorrywm, thank you very much for your response!

I did not have any difficulties estimating RER from your full trees with the default settings. I suspect your output contains RERs for all your genes as well. To explain, in the output of getAllResiduals, the final i does not represent the number of genes for which RER are estimated.

Thank you for explaining that. Initially, I was confused by the behavior of the counter i in my data, since in the example dataset it ranged from 1 to 200 (max). Now I see that all my genes have their RER computed in the non-reduced dataset.

At what stage (in which function) are you reducing your species set using useSpecies? I would not recommend reducing your species set this drastically, unless you only have phenotype data for a very small number of species. If you do need to restrict your species set, you will need to change the min.sp option in getAllCor from its default of 10.

I was trying to reduce my species set in getAllResiduals since I wanted to compare only these 7 species in further binary trait analysis. They are phylogenetically close but have drastically different lifestyles, and I am interested in genes underlying these two phenotypes. Thanks to your answer, I was able to perform the analysis on this reduced set (I changed min.op in both getAllResiduals and correlateWithBinaryPhenotype). However, you mentioned that you don't recommend doing so. Is that because statistical analysis has reduced power on datasets with less than 10 species? Do the results of this analysis still make sense, or is it better to consider a more broadly distributed trait?

I really appreciate your help. Many thanks to the team for the tool and beautiful easy-to-follow tutorials!

Lisa

Wu-tz commented 2 months ago

Dear RERcoverge designer,

I'm encountering a similar issue when I used a dataset with nine species to run getAllResiduals. Is it more likely to cause NA when fewer species are used?

Here is my dataset all.tree.txt