mgalardini / pyseer

SEER, reimplemented in python 🐍🔮
http://pyseer.readthedocs.io
Apache License 2.0

Error when running predictions #114

Closed tkarasov closed 4 years ago

tkarasov commented 4 years ago

Hi, thanks for the great tutorials and software. I'm attempting a trial run of predictions with my own data. I have 92 samples, so I split them into 46 for training and 46 for testing. The initial elastic net regression on the full data runs fine, but fitting the model to the training set fails. I built the enet model on the full data with the following command:

pyseer --phenotypes ./input/strain_phenot.txt --kmers ./input/input_genomes/unitigs_output/unitigs.txt.gz --wg enet \
 --save-vars ./output/ma_snps --save-model ./output/unitig.lasso --cpu 4 --alpha 1 > ./output/selected.txt

When I run the following on half of the phenotypes/genotypes, I get an error:

pyseer --kmers ./input/input_genomes/unitigs_output/unitigs.txt.gz \
 --phenotypes ./input/train.pheno --wg enet \
 --load-vars ./output/ma_snps --alpha 1 --save-model ./output/test_lasso --cpu 4

The full output is:

Read 46 phenotypes
Detected continuous phenotype
Reading all variants
Analysing 46 samples found in both phenotype and loaded npy
Applying correlation filtering
  0%|                                                                                                     | 0/2695274 [00:00<?, ?variants/s]
Traceback (most recent call last):
  File "/ebio/abt6_projects9/metagenomic_controlled/Programs/anaconda3/envs/mapping/bin/pyseer", line 10, in <module>
    sys.exit(main())
  File "/ebio/abt6_projects9/metagenomic_controlled/Programs/anaconda3/envs/mapping/lib/python3.7/site-packages/pyseer/__main__.py", line 605, in main
    cor_filter = correlation_filter(p, all_vars, options.cor_filter)
  File "/ebio/abt6_projects9/metagenomic_controlled/Programs/anaconda3/envs/mapping/lib/python3.7/site-packages/pyseer/enet.py", line 348, in correlation_filter
    sum_a_squared = k.dot(k.transpose()).data[0] - 2*k_mean*csr_matrix.sum(k) + pow(k_mean, 2) * all_vars.shape[1]
IndexError: index 0 is out of bounds for axis 0 with size 0

mgalardini commented 4 years ago

Hi, thanks for reporting this issue. Let me have a look into it; I believe it may be related to some other problems we are having with installation and unit testing. I hope to get back to you this week.
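
For anyone hitting the same traceback, here is a minimal sketch of one way an `IndexError` like this can arise from the `k.dot(k.transpose()).data[0]` pattern. This is an assumption, not a confirmed diagnosis of this issue: it supposes `k` is a single-row SciPy sparse matrix and that, after subsetting to 46 samples, some variant is absent from every remaining sample. The function names and toy matrices are hypothetical and are not part of pyseer.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_squares_first_entry(k):
    """Mimic the indexing pattern from the traceback (hypothetical helper).

    k is a 1 x n sparse row; k * k^T is a 1 x 1 sparse matrix whose
    stored-values array is empty when k has no non-zero entries.
    """
    return k.dot(k.transpose()).data[0]


def sum_squares_tolerant(k):
    """Alternative that also works for an all-zero row (hypothetical helper).

    Squares the entries element-wise and sums them, so an empty row
    yields 0.0 instead of raising.
    """
    return float(k.multiply(k).sum())


# Toy variant rows over four samples (assumed data, for illustration only).
present = csr_matrix(np.array([[1, 0, 1, 1]]))  # variant seen in three samples
absent = csr_matrix(np.array([[0, 0, 0, 0]]))   # variant seen in no sample

for name, row in (("present", present), ("absent", absent)):
    try:
        value = sum_squares_first_entry(row)
        print(name, "first-entry pattern ->", value)
    except IndexError as err:
        print(name, "first-entry pattern failed:", err)
    print(name, "tolerant pattern ->", sum_squares_tolerant(row))
```

If this is what is happening, checking whether any variant has zero presence among the training samples would be a reasonable first diagnostic. Whether pyseer's own code should handle that case differently is for the maintainers to decide.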