batelz closed this issue 1 year ago.
It seems like the output HDF5 file is never produced, so the last steps (adding map positions and allele frequencies to this file) fail because the program cannot find it.
So the error occurs earlier: either when producing the 1240k-filtered VCF or when creating the initial output HDF5 file.
Is the intermediate 1240k VCF produced? It should be at f"./data/vcf.1240k/example_hazelton_chr{ch}.vcf.gz"
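A quick way to answer that is to count the non-header lines in the intermediate file. The helper below is a minimal sketch (the name count_vcf_records is made up for illustration). It assumes the file is bgzip-compressed, which Python's gzip module can read because BGZF is a valid multi-member gzip stream. It returns None if the file is missing and 0 if it contains only a header.

```python
import gzip
import os


def count_vcf_records(path):
    """Return the number of data (non-header) lines in a VCF, or None if missing."""
    if not os.path.exists(path):
        return None
    # .vcf.gz (bgzip) files open with gzip; anything else is read as plain text
    opener = gzip.open if path.endswith(".gz") else open
    n = 0
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):  # skip ## meta lines and the #CHROM line
                n += 1
    return n


ch = 22
path = f"./data/vcf.1240k/example_hazelton_chr{ch}.vcf.gz"
print(path, "->", count_vcf_records(path))
```

A result of 0 means the file was written but no sites survived the filtering step.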
The file is there, but it's empty (only the header, no variant records):
(base):~/ancIBD/data/vcf.1240k$ bcftools view example_hazelton_chr22.vcf.gz
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=01/08/2023 - 11:43:29
##source=GLIMPSE_phase v2.0.0
##contig=<ID=chr22>
##INFO=<ID=RAF,Number=A,Type=Float,Description="ALT allele frequency in the reference panel">
##INFO=<ID=AF,Number=A,Type=Float,Description="ALT allele frequency computed from DS/GP field across target samples">
##INFO=<ID=INFO,Number=A,Type=Float,Description="IMPUTE info quality score for diploid samples">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Phased genotypes">
##FORMAT=<ID=GP,Number=3,Type=Float,Description="Genotype posteriors">
##NMAIN=15
##FPLOIDY=2
##bcftools_mergeVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_mergeCommand=merge -l GLIMPSE_ligate/samples_merge.txt -o merged.vcf.gz; Date=Tue Aug 1 13:03:00 2023
##bcftools_annotateVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_annotateCommand=annotate -x ^FORMAT/GT,FORMAT/GP -o GLIMPSE_ligate/merged_chr22_GT_GP.vcf.gz merged.vcf.gz; Date=Tue Aug 1 13:04:43 2023
##bcftools_viewVersion=1.10.2+htslib-1.10.2-3ubuntu0.1
##bcftools_viewCommand=view -Oz -o ./data/vcf.1240k/example_hazelton_chr22.vcf.gz -T ./data/filters/snps_bcftools_ch22.csv -M2 -v snps ./data/vcf.raw/merged_chr22_GT_GP.vcf.gz; Date=Tue Aug 1 13:28:53 2023
##bcftools_viewCommand=view example_hazelton_chr22.vcf.gz; Date=Mon Aug 21 13:20:39 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S10769.SN S10770.SN
Then I would recommend manually running the bcftools command that creates the intermediate 1240k VCF and troubleshooting what goes wrong (the obvious first thing to check is whether any sites match the SNP filter at all).
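One cheap check before rerunning bcftools is to compare the chromosome names used in the input VCF with those in the SNP filter file; if the two sets don't overlap, -T matches nothing. The sketch below assumes the filter file has the chromosome in its first column, separated by tabs, commas or spaces; the function names are hypothetical.

```python
import gzip
import re


def _open_text(path):
    # Plain-text or gzip/bgzip-compressed input (BGZF is readable by gzip)
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def vcf_chromosomes(vcf_path):
    """CHROM names declared in ##contig lines or used in data lines."""
    names = set()
    with _open_text(vcf_path) as fh:
        for line in fh:
            if line.startswith("##contig=<ID="):
                contig = line[len("##contig=<ID="):].split(",")[0]
                names.add(contig.rstrip(">\r\n"))
            elif line.strip() and not line.startswith("#"):
                names.add(line.split("\t", 1)[0])
    return names


def filter_chromosomes(filter_path):
    """First field of each non-empty, non-comment line (assumed to be CHROM)."""
    names = set()
    with _open_text(filter_path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#"):
                names.add(re.split(r"[\t, ]+", line)[0])
    return names


def check_chrom_naming(vcf_path, filter_path):
    """Return the chromosome names shared by the VCF and the filter file."""
    in_vcf = vcf_chromosomes(vcf_path)
    in_filter = filter_chromosomes(filter_path)
    shared = in_vcf & in_filter
    if not shared:
        print(f"No shared chromosome names: VCF has {sorted(in_vcf)}, "
              f"filter has {sorted(in_filter)[:5]}")
    return shared
```

For example, call check_chrom_naming("./data/vcf.raw/merged_chr22_GT_GP.vcf.gz", "./data/filters/snps_bcftools_ch22.csv") with the paths from the bcftools view command recorded in the header above.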
Indeed, I see from the above that your VCF uses the chr22 notation, but the filter file has 22 only (without the chr prefix). You should transform your VCF accordingly to match the latter notation.
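One way to do that transformation is bcftools annotate with its --rename-chrs option, which takes a map file of whitespace-separated "old_name new_name" pairs, one per line. The sketch below writes such a map for chr1 to chr22 and builds the command. The map file name and the output VCF name are hypothetical; you would then pass the renamed file as in_vcf.

```python
import os
import shutil
import subprocess


def write_chr_rename_map(map_path, chromosomes=range(1, 23)):
    """Write one 'old_name new_name' pair per line, e.g. chr22 -> 22."""
    with open(map_path, "w") as fh:
        for c in chromosomes:
            fh.write(f"chr{c}\t{c}\n")


def rename_chrs_cmd(in_vcf, out_vcf, map_path):
    """bcftools command that renames CHROM values according to the map file."""
    return ["bcftools", "annotate", "--rename-chrs", map_path,
            "-Oz", "-o", out_vcf, in_vcf]


if __name__ == "__main__":
    in_vcf = "./data/vcf.raw/merged_chr22_GT_GP.vcf.gz"
    out_vcf = "./data/vcf.raw/merged_22_GT_GP.vcf.gz"  # hypothetical output name
    map_path = "chr_rename.txt"  # hypothetical map file name
    # Only run when bcftools is installed and the input VCF exists
    if shutil.which("bcftools") and os.path.exists(in_vcf):
        write_chr_rename_map(map_path)
        subprocess.run(rename_chrs_cmd(in_vcf, out_vcf, map_path), check=True)
```

Renaming once at the VCF level keeps the filter files untouched, so the same filters keep working for the example data.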
Hi again. This issue can be merged with #5.
I used GLIMPSE to impute two samples, which produced the following VCF file:
When trying to convert it to HDF5 using
I get the following error:
The directory does exist, as I get no error when running:
os.listdir("./data/hdf5/")
and I've also tried writing the absolute path. Moreover, the example you provided runs perfectly fine, and the only thing I change is the in_vcf argument. So it must be something in the VCF file itself. Any ideas?
Batel