mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0
104 stars, 25 forks

problem with analysis of COGs #135

Closed: pavlo888 closed this issue 3 years ago

pavlo888 commented 3 years ago

Hi,

I am trying to run the following command:

pyseer --phenotypes phenotype-list.pheno --pres pangenome-w-annot-ref/gene_presence_absence.Rtab --distances mash.tsv --save-m mash_mds --max-dimensions 4 > genomospecies9_COGs.txt

But I get the following error:

Read 42 phenotypes
Detected binary phenotype
Structure matrix has dimension (25, 25)
Analysing 22 samples found in both phenotype and structure matrix
Perfectly separable data error for null model
Could not fit null model, exiting

Do you have any idea what the problem could be?

johnlees commented 3 years ago

A few thoughts:
1) This can happen when the phenotype is exactly correlated with the population structure; check whether that is the case.
2) 22 samples is very small for a GWAS, and it may not be possible to fit the model. It also looks like not all of the 42 samples are in the population structure matrix.
3) Try the LMM instead, as described in the best practices.
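One rough way to check point 1 is to ask whether every population cluster (for example, each genomospecies) contains only cases or only controls. If so, covariates describing the structure can separate the phenotype perfectly and the logistic null model cannot be fitted. The sketch below is a heuristic, not the exact separability check pyseer runs into when fitting the MDS components; it assumes you already have one cluster label per sample, and the function names are hypothetical (not part of pyseer).

```python
from collections import defaultdict


def clusters_with_mixed_phenotype(phenotype, cluster):
    """Return the clusters that contain both cases (1) and controls (0).

    phenotype: dict mapping sample name -> 0 or 1 (binary phenotype)
    cluster:   dict mapping sample name -> population cluster label
    Only samples present in both dicts are used, similar to how pyseer
    only analyses samples found in both phenotype and structure inputs.
    """
    shared = phenotype.keys() & cluster.keys()
    if not shared:
        raise ValueError("no sample names in common between the two inputs")
    seen = defaultdict(set)
    for sample in shared:
        seen[cluster[sample]].add(phenotype[sample])
    # A cluster is "mixed" if it contains more than one phenotype value.
    return sorted(label for label, values in seen.items() if len(values) > 1)


def phenotype_follows_structure(phenotype, cluster):
    """Heuristic: True if no cluster mixes cases and controls."""
    return not clusters_with_mixed_phenotype(phenotype, cluster)
```

If `phenotype_follows_structure` returns True, the phenotype is effectively a relabelling of the clusters, which is the situation described later in this thread.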

pavlo888 commented 3 years ago

Hi @johnlees

I think the issue might be that the phenotype is correlated with the population structure, since the phenotype for each genome is also its genomospecies identity. I was trying to replicate the analysis by Gori et al. 2020 https://mbio.asm.org/content/11/3/e00728-20/article-info, who identified specific genes for each lineage of interest.

So I assume I cannot use pyseer for this purpose?

Cheers, Pablo
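If the goal is only to list genes that are exclusive to one genomospecies, rather than to fit a regression, a plain set comparison on the presence/absence table may be enough. Note that such a gene perfectly predicts a phenotype that encodes lineage identity, which is exactly the kind of separation a logistic model struggles with. The sketch below is a hypothetical helper, not pyseer functionality and not necessarily the method used by Gori et al.; it assumes the Rtab layout described in the docstring.

```python
import csv


def read_rtab(lines):
    """Parse a Roary-style gene_presence_absence.Rtab.

    Assumes a tab-separated table whose header is 'Gene' followed by sample
    names, with one row per gene holding 0/1 presence calls.
    Returns (samples, {gene: set of samples carrying that gene}).
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    samples = [name.strip() for name in header[1:]]
    presence = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, calls = row[0], row[1:]
        presence[gene] = {s for s, c in zip(samples, calls) if c.strip() == "1"}
    return samples, presence


def lineage_specific_genes(presence, samples, lineage):
    """Genes present in every sample of `lineage` and absent from all others."""
    lineage = set(lineage)
    others = set(samples) - lineage
    return sorted(
        gene
        for gene, carriers in presence.items()
        if lineage <= carriers and not (carriers & others)
    )
```

With a real file, pass an open handle, e.g. `read_rtab(open(path, newline=""))`, then call `lineage_specific_genes` once per genomospecies.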

johnlees commented 3 years ago

Ah I see, in that case you might want to try the --no-distances option, which removes the population structure correction:

pyseer --phenotypes phenotype-list.pheno --pres pangenome-w-annot-ref/gene_presence_absence.Rtab --no-distances > genomospecies9_COGs.txt

See also https://pyseer.readthedocs.io/en/master/usage.html?#no-population-structure-correction

pavlo888 commented 3 years ago

It seems to work, but I get a blank output file. Is there anything else I could try? I have also tried to follow the SNP and k-mer tutorials, but I get errors with those too.

johnlees commented 3 years ago

Could you paste the full command and its terminal output here? Can you also double-check that the sample names match between the phenotype file and the COG file?
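A quick way to do that check is to compare the sample names in the two files directly; a mismatch (for instance, names carrying a file extension in one file but not the other) would leave few or no samples to analyse. The sketch below is hypothetical, assuming a tab-separated phenotype file with a header row and sample names in the first column, and an Rtab whose header is 'Gene' followed by sample names.

```python
import csv


def rtab_samples(lines):
    """Sample names from the header row of a gene_presence_absence.Rtab
    (assumed layout: 'Gene' followed by one column per sample)."""
    header = next(csv.reader(lines, delimiter="\t"))
    return [name.strip() for name in header[1:]]


def phenotype_samples(lines):
    """Sample names from a tab-separated phenotype file, assuming a header
    row and sample names in the first column."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    return [row[0].strip() for row in reader if row]


def compare_samples(pheno_names, rtab_names):
    """Report shared sample names and names found in only one file."""
    pheno, rtab = set(pheno_names), set(rtab_names)
    return {
        "shared": sorted(pheno & rtab),
        "only_in_phenotype": sorted(pheno - rtab),
        "only_in_rtab": sorted(rtab - pheno),
    }
```

Usage with the files from this thread might look like:

```python
with open("phenotype-list.pheno", newline="") as p, \
        open("pangenome-w-annot-ref/gene_presence_absence.Rtab", newline="") as r:
    print(compare_samples(phenotype_samples(p), rtab_samples(r)))
```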

mgalardini commented 3 years ago

Closing for lack of follow-up messages.