GaloGS closed this issue 9 months ago.
Thanks for reporting this crash. Would you mind sending over your inputs so that I can try to see what is going on? You can also send the data privately to my email address.
Thank you very much for your prompt response and your help! I have sent you an e-mail with all the files.
Thanks for the files: with pyseer 1.3.11 I get the following output (variants omitted just in case):
$ pyseer --vcf file.vcf.gz --phenotypes pheno.txt --wg enet --save-vars ma_snps --save-model model.lasso --min-af 0.9 --alpha 1
Read 1540 phenotypes
Detected binary phenotype
[E::idx_find_and_load] Could not retrieve index file for 'file.vcf.gz'
Reading all variants
4309variants [00:10, 426.50variants/s]
Saved enet variants as ma_snps.pkl
Applying correlation filtering
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 1534.97variants/s]
Fitting elastic net to top 18 variants
Best penalty (lambda) from cross-validation: 7.60E-03
Best model deviance from cross-validation: 0.995 ± 2.50E-02
Best R^2 from cross-validation: -0.246
Finding and printing selected variants
[E::idx_find_and_load] Could not retrieve index file for 'file.vcf.gz'
variant af filter-pvalue lrt-pvalue beta notes
[...]
Saved enet model as model.lasso.pkl
4309 loaded variants
4291 pre-filtered variants
18 tested variants
6 printed variants
I had to change your phenotype file because it used whitespace as the delimiter; a tab character is needed.
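If several phenotype files need the same fix, the conversion can be scripted. Below is a minimal Python sketch (not part of pyseer) that rewrites whitespace-delimited rows as tab-delimited ones. The function names `to_tab_delimited` and `convert_file` are hypothetical, and the sketch assumes that no field, such as a sample name, contains an internal space.

```python
from pathlib import Path


def to_tab_delimited(text: str) -> str:
    """Convert whitespace-delimited rows to tab-delimited rows.

    str.split() with no argument splits on any run of spaces and/or tabs,
    so mixed delimiters are normalised as well. Blank lines are dropped.
    Assumption: no field (e.g. a sample name) contains an internal space.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # Every row should have the same number of columns as the header;
    # a mismatch usually means a field contained a space.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"inconsistent column counts: {sorted(widths)}")
    return "".join("\t".join(row) + "\n" for row in rows)


def convert_file(in_path: str, out_path: str) -> None:
    """Hypothetical helper: write a tab-delimited copy of in_path to out_path."""
    text = Path(in_path).read_text()
    Path(out_path).write_text(to_tab_delimited(text))
```

For example, `convert_file("pheno_spaces.txt", "pheno.txt")` (placeholder file names) writes a tab-delimited copy that can then be passed to `--phenotypes`. The column-count check is there so that a sample name containing a space fails loudly instead of silently shifting columns.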
Can you try again with the latest version of pyseer and with an updated phenotype file?
Thanks
Dear Marco,
Sorry for the stupid mistake. From the error message, I did not realize that the problem could be in the phenotype file. I have replaced the spaces with tabs and the program now runs without problems.
Thank you very much for your help!
Galo
Dear pyseer developers,
Thanks a lot for this awesome tool. I am thrilled to use it with our dataset, but I am getting an error that I have tried to fix without success.
I have a dataset of around 1500 samples. I created the merged, compressed and indexed VCF using BCFtools as explained in the tutorials. I also filtered it so that only variants observed in at least 10 samples at >90% allele frequency are included. The VCF seems correct, and its sample names match those in the phenotype file.
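Before running pyseer on a new dataset, it can help to check the two inputs against each other. The Python sketch below (hypothetical helper names, not a pyseer feature) reads sample names from the `#CHROM` header line of the bgzipped VCF and from the first column of the phenotype file, assuming the phenotype file's first line is a header, and reports samples present in only one of them. It also raises an error on any phenotype line without a tab, which is how a space-delimited phenotype file would show up.

```python
import gzip
from collections.abc import Iterable


def samples_from_vcf_header(lines: Iterable[str]) -> list[str]:
    """Return the sample names from a VCF's #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            # The first 9 columns are CHROM POS ID REF ALT QUAL FILTER INFO FORMAT;
            # everything after them is a sample name.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing the header line
    raise ValueError("no #CHROM header line found")


def samples_from_phenotypes(lines: Iterable[str]) -> list[str]:
    """Return first-column sample names, assuming line 1 is a header row.

    Raises ValueError on any non-blank line that contains no tab.
    """
    names = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        if "\t" not in line:
            raise ValueError(f"line {lineno} has no tab delimiter: {line!r}")
        if lineno > 1:  # skip the assumed header row
            names.append(line.split("\t", 1)[0])
    return names


def compare_samples(vcf_names: list[str], pheno_names: list[str]) -> dict:
    """Report samples present in only one input and the size of the overlap."""
    vcf_set, pheno_set = set(vcf_names), set(pheno_names)
    return {
        "only_in_vcf": sorted(vcf_set - pheno_set),
        "only_in_pheno": sorted(pheno_set - vcf_set),
        "shared": len(vcf_set & pheno_set),
    }


def check_inputs(vcf_path: str, pheno_path: str) -> dict:
    """Hypothetical helper tying the checks together for two files on disk."""
    # bgzip output is a multi-member gzip stream, so gzip.open can read it.
    with gzip.open(vcf_path, "rt") as vcf, open(pheno_path) as pheno:
        return compare_samples(
            samples_from_vcf_header(vcf), samples_from_phenotypes(pheno)
        )
```

For example, `check_inputs("dataset.merged.filtered.vcf.gz", "pheno.txt")` returns empty `only_in_vcf` and `only_in_pheno` lists when every sample appears in both files. The parsing is split into functions that take lines rather than paths so each check can be tried on small in-memory examples.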
However, when I run the following command:
~/.local/bin/pyseer --vcf dataset.merged.filtered.vcf.gz --phenotypes pheno.txt --wg enet --save-vars ma_snps --save-model model.lasso --min-af 0.9 --alpha 1 > selected.txt
I get the following error:
I tried other combinations, including the argument --max-missing 0.999, because I thought that maybe I had too many singletons, but I still have this problem. Do you have any idea why this may be happening?
Thank you very much, Galo