kr-colab / diploSHIC

feature-based deep learning for the identification of selective sweeps
MIT License

error when using fvecVcf #26

Closed by ghost 4 years ago

ghost commented 4 years ago

I'm interested in training diploSHIC on a set of simulated VCFs that I have generated. As a first step, I wanted to convert the VCFs to feature vectors using fvecVcf. Below is the command line I'm using; the VCF I'm applying it to is attached as 1_recap_human_demo_neutral_p1_slim.vcf.zip:

python diploSHIC.py fvecVcf diploid 1_recap_human_demo_neutral_p1_slim.vcf 8 100000 1_recap_human_demo_neutral_p1.fvec --numSubWins 11

I get a long error that points to a TypeError in allel.util. I've tried a few variations on this command and ended up with similar errors.

andrewkern commented 4 years ago

Okay, it looks like the issues are with the command line you gave above.

  1. The chromosome name you are giving, 8, is not in the .vcf file.
  2. The chromosome length you are specifying, 100000, does not match the .vcf file, which says the chromosome length is 1000001.
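Both points can be checked before running fvecVcf by reading the `##contig` lines in the VCF header. The snippet below is only a minimal sketch of that check, not part of diploSHIC: `read_contigs` and `check_args` are hypothetical helper names, and the `header` list is an illustrative stand-in built from the values mentioned in this thread, not the contents of the attached file. It also assumes the VCF declares its contigs in the header, which not every VCF does.

```python
import re

def read_contigs(vcf_lines):
    """Return {contig_id: length or None} from the ##contig header lines."""
    contigs = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines end at the #CHROM column header
        if line.startswith("##contig="):
            id_match = re.search(r"[<,]ID=([^,>]+)", line)
            len_match = re.search(r"[<,]length=(\d+)", line)
            if id_match:
                length = int(len_match.group(1)) if len_match else None
                contigs[id_match.group(1)] = length
    return contigs

def check_args(contigs, chr_arm, chr_len):
    """Return a list of mismatches between the arguments and the header."""
    problems = []
    if chr_arm not in contigs:
        problems.append(f"chromosome {chr_arm!r} not in header: {sorted(contigs)}")
    elif contigs[chr_arm] is not None and contigs[chr_arm] != chr_len:
        problems.append(f"length {chr_len} != header length {contigs[chr_arm]}")
    return problems

# Illustrative header only (assumed, based on the values quoted in this thread).
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=1,length=1000001>",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
]
contigs = read_contigs(header)
print(check_args(contigs, "8", 100000))   # the arguments from the original command
print(check_args(contigs, "1", 1000001))  # the arguments from the corrected command
```

For a real file, an open file handle can be passed in place of the list, e.g. `with open(path) as fh: contigs = read_contigs(fh)`.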

Using the following command line works on my system:

% python diploSHIC.py fvecVcf diploid 1_recap_human_demo_neutral_p1_slim.vcf 1 1000001 1_recap_human_demo_neutral_p1.fvec --numSubWins 11
/Users/adk/miniconda3/envs/diploshic/bin/python makeFeatureVecsForChrArmFromVcfDiploid.py 1_recap_human_demo_neutral_p1_slim.vcf 1 1000001 None 1100000 11 None 0.25 0.75 None None 1_recap_human_demo_neutral_p1.fvec
Warning: a mask.fa file for the chr arm with all masked sites N'ed out is strongly recommended (pass in the reference to remove Ns at the very least)!
makeFeatureVecsForChrArmFromVcfDiploid.py:123: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  startTime = time.clock()
1-100000 num unmasked snps: 361; unmasked frac: 0.998540
100001-200000 num unmasked snps: 290; unmasked frac: 0.998520
200001-300000 num unmasked snps: 408; unmasked frac: 0.998850
300001-400000 num unmasked snps: 362; unmasked frac: 0.998450
400001-500000 num unmasked snps: 305; unmasked frac: 0.998580
500001-600000 num unmasked snps: 400; unmasked frac: 0.998550
600001-700000 num unmasked snps: 334; unmasked frac: 0.998510
700001-800000 num unmasked snps: 232; unmasked frac: 0.998480
800001-900000 num unmasked snps: 358; unmasked frac: 0.999000
900001-1000000 num unmasked snps: 337; unmasked frac: 0.998700
completed in 1.33844 seconds

@kahlquist-brown can you confirm for me that this works for you?

thanks

ghost commented 4 years ago

This works! Thanks for your help.