mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

SNP and COG association with fixed effects model #271

Open OtakuZerg opened 1 month ago

OtakuZerg commented 1 month ago

Hi, everyone

I ran into an issue when running a SNP and COG association with the fixed effects model on my laptop (MacBook Pro, M3 Max, using iTerm with Rosetta x86 emulation):

pyseer --phenotypes a.pheno --vcf change.vcf.gz --load-m a_mds.pkl --lineage --print-samples --cpu 8 > SNPs_amox_all.txt

It printed a lot of log messages like these:

/Users/Ycchen/miniforge3/envs/Pyseer/lib/python3.10/site-packages/statsmodels/discrete/discrete_model.py:2385: RuntimeWarning: overflow encountered in exp
  return 1/(1+np.exp(-X))
/Users/Ycchen/miniforge3/envs/Pyseer/lib/python3.10/site-packages/statsmodels/discrete/discrete_model.py:2443: RuntimeWarning: divide by zero encountered in log
  return np.sum(np.log(self.cdf(q * linpred)))

Has anyone encountered this issue before? Many thanks.
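For readers following along, here is a minimal sketch of the arithmetic behind these two warnings. It copies the two expressions shown in the log into standalone functions. The names `logistic_cdf`, `loglike` and `collect_warnings` are made up for this illustration, and treating `q` as `2*y - 1` for a 0/1 phenotype `y` is an assumption about how statsmodels builds it. The sketch only shows how a very large linear predictor produces the messages; it says nothing about why that happens in this particular dataset.

```python
import warnings

import numpy as np


def logistic_cdf(x):
    # Same expression as the line quoted at discrete_model.py:2385.
    return 1 / (1 + np.exp(-x))


def loglike(q, linpred):
    # Same expression as the line quoted at discrete_model.py:2443.
    return np.sum(np.log(logistic_cdf(q * linpred)))


def collect_warnings(q, linpred):
    """Evaluate the log-likelihood and return it with any warning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        value = loglike(q, linpred)
    return value, [str(w.message) for w in caught]


# Hypothetical values: two samples whose linear predictor is extreme and
# has the wrong sign for the observed phenotype (q is assumed to be 2*y - 1).
linpred = np.array([1000.0, -1000.0])
q = np.array([-1.0, 1.0])

value, messages = collect_warnings(q, linpred)
print(value)
for message in messages:
    print(message)
```

In this toy case `np.exp(1000.0)` overflows to infinity, the CDF becomes exactly 0, and the logarithm of 0 triggers the second warning, so the log-likelihood is minus infinity.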

johnlees commented 1 month ago

Try without --lineage perhaps, and I would also recommend using the LMM (https://pyseer.readthedocs.io/en/latest/best_practices.html)

OtakuZerg commented 1 month ago

@johnlees Thank you for your assistance. I am curious about the technical cause. Could this error be related to my data format, my Conda environment, or a Python module? I’ve noticed that my data format looks slightly different from the one in the pyseer tutorial. In the tutorial it looks like this: [image] However, in my vcf.gz file, it looks like this: [image]

johnlees commented 1 month ago

That looks ok to me, the likelihoods in your file should be ignored. To debug further we would need you to make a minimal reproducible example, including input files and ideally with a single variant which triggers this error.
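One possible way to cut an input file down to a single variant for such a minimal example is sketched below, using only the Python standard library. The function names, the output filename and the chromosome/position values are placeholders invented for this sketch, not part of pyseer. It writes plain gzip output, and whether pyseer reads that directly or needs the file recompressed with bgzip has not been checked here.

```python
import gzip


def single_variant_vcf(lines, chrom, pos):
    """Keep all header lines plus every record located at chrom:pos."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # meta-information and column header lines
            continue
        fields = line.split("\t", 2)
        if len(fields) >= 2 and fields[0] == chrom and fields[1] == str(pos):
            kept.append(line)
    return kept


def write_single_variant(in_path, out_path, chrom, pos):
    """Read a gzipped VCF and write the reduced file (plain gzip output)."""
    with gzip.open(in_path, "rt") as src:
        kept = single_variant_vcf(src, chrom, pos)
    with gzip.open(out_path, "wt") as dst:
        dst.writelines(kept)


# Tiny in-memory example with made-up records.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0\t1\n",
    "chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1\t1\n",
]
subset = single_variant_vcf(example, "chr1", 200)
```

On the real data this would be called as, for example, `write_single_variant("change.vcf.gz", "one_variant.vcf.gz", "chr1", 200)`, with the chromosome and position replaced by those of a variant that triggers the warning.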

OtakuZerg commented 1 month ago

@johnlees Thank you for the quick reply. I will try adjusting my input and see how it goes.