GCF_008632635.1_ASM863263v1_genomic.fna.gz is bgzipped Acinetobacter baumanii reference sequence from ncbi, fastq directory contains simmulated reads generated from this reference (I have double checked - the reads map to the reference with NCBI blastn)
result - vcf files that contain header, but no variants.
version: I've used docker image ontresearch/medaka:v1.11.3 (but tried several other tags and unofficial images with the same result), no GPU support
medaka_haploid_variant -i /keep/c9d4a8f0579887f2060ac1ea5673065a+386/fastq -r /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
[10:24:22 - MdlStrTF] Successfully removed temporary files from /tmp/tmp3l6dxx7m.
Cannot import pyabpoa, some features may not be available.
Cannot import pyabpoa, some features may not be available.
[10:24:24 - MdlStrTF] Successfully removed temporary files from /tmp/tmps2xs_fml.
Creating fai index file /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz.fai
Creating mmi index file /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz.map-ont.mmi
[M::mm_idx_gen::0.143*1.00] collected minimizers
[M::mm_idx_gen::0.203*0.99] sorted minimizers
[M::main::0.239*0.99] loaded/built the index for 2 target sequence(s)
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.248*0.99] distinct minimizers: 722157 (98.17% are singletons); average occurrences: 1.031; average spacing: 5.348; total length: 3980230
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -I 16G -x map-ont -d /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz.map-ont.mmi /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz
[M::main] Real time: 0.256 sec; CPU: 0.254 sec; Peak RSS: 0.041 GB
[M::main::0.046*0.95] loaded/built the index for 2 target sequence(s)
[M::mm_mapopt_update::0.057*0.96] mid_occ = 12
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 2
[M::mm_idx_stat::0.066*0.96] distinct minimizers: 722157 (98.17% are singletons); average occurrences: 1.031; average spacing: 5.348; total length: 3980230
[M::main] Version: 2.26-r1175
[M::main] CMD: minimap2 -x map-ont --secondary=no -L --MD -A 2 -B 4 -O 4,24 -E 2,1 -t 1 -a /var/spool/cwl/c9d4a8f0579887f2060ac1ea5673065a+386/GCF_008632635.1_ASM863263v1_genomic.fna.gz.map-ont.mmi /keep/c9d4a8f0579887f2060ac1ea5673065a+386/fastq
[M::main] Real time: 0.071 sec; CPU: 0.068 sec; Peak RSS: 0.028 GB
Cannot import pyabpoa, some features may not be available.
[10:24:34 - Predict] Setting tensorflow inter/intra-op threads to 1/1.
[10:24:34 - Predict] Processing region(s): NZ_CP043953.1:0-3972439 NZ_CP043954.1:0-7791
[10:24:34 - Predict] Using model: /home/epi2melabs/conda/lib/python3.8/site-packages/medaka/data/r1041_e82_400bps_sup_variant_v4.3.0_model.tar.gz.
[10:24:34 - BAMFile] Creating pool of 16 BAM file sets.
[10:24:34 - Predict] Processing 4 long region(s) with batching.
[10:24:35 - MdlStrTF] Model <keras.engine.sequential.Sequential object at 0x146f2b44ae50>
[10:24:35 - MdlStrTF] loading weights from /tmp/tmplkbiynkf/model/variables/variables (using expect partial)
[10:24:35 - Sampler] Initializing sampler for consensus of region NZ_CP043953.1:0-1000000.
[10:24:35 - Sampler] Initializing sampler for consensus of region NZ_CP043953.1:999000-1999000.
[10:24:35 - PWorker] Running inference for 4.0M draft bases.
[10:24:35 - Sampler] Took 0.14s to make features.
[10:24:35 - Sampler] Initializing sampler for consensus of region NZ_CP043953.1:1998000-2998000.
[10:24:35 - Sampler] Took 0.16s to make features.
[10:24:35 - Sampler] Initializing sampler for consensus of region NZ_CP043953.1:2997000-3972439.
[10:24:35 - Sampler] Took 0.20s to make features.
[10:24:35 - Sampler] Took 0.19s to make features.
[10:24:35 - PWorker] Processed 0 batches
[10:24:35 - PWorker] All done, 0 remainder regions.
[10:24:35 - Predict] Processing 1 short region(s).
[10:24:36 - MdlStrTF] Model <keras.engine.sequential.Sequential object at 0x146f2ae18ee0>
[10:24:36 - MdlStrTF] loading weights from /tmp/tmplkbiynkf/model/variables/variables (using expect partial)
[10:24:36 - Sampler] Initializing sampler for consensus of region NZ_CP043954.1:0-7791.
[10:24:36 - Sampler] Took 0.00s to make features.
[10:24:36 - PWorker] Running inference for 0.0M draft bases.
[10:24:36 - PWorker] Processed 0 batches
[10:24:36 - PWorker] All done, 0 remainder regions.
[10:24:36 - Predict] Finished processing all regions.
[10:24:36 - MdlStrTF] Successfully removed temporary files from /tmp/tmplkbiynkf.
Cannot import pyabpoa, some features may not be available.
[10:24:37 - DataIndx] Loaded 1/1 (100.00%) sample files.
Writing to /tmp/bcftools.lBEdxX
Merging 0 temporary files
Cleaning
Done
Cannot import pyabpoa, some features may not be available.
[10:24:38 - Annotate] Getting chrom coordinates
Thought that it may be caused by lacking pyabpoa package, so I created my own docker image that returned the same vcfs.
Additional context - I know that bam files are also empty:
-rw-r--r-- 1 crunchg dialout 418 Feb 8 10:24 calls_to_ref.bam
-rw-r--r-- 1 crunchg dialout 32 Feb 8 10:24 calls_to_ref.bam.bai
-rw-r--r-- 1 crunchg dialout 6.2K Feb 8 10:24 consensus_probs.hdf
-rw-r--r-- 1 crunchg dialout 1.4K Feb 8 10:24 medaka.annotated.vcf
-rw-r--r-- 1 crunchg dialout 302 Feb 8 10:24 medaka.sorted.vcf
-rw-r--r-- 1 crunchg dialout 250 Feb 8 10:24 medaka.vcf
medaka.annotated.vcf content:
##fileformat=VCFv4.1
##medaka_version=1.11.3
##FILTER=<ID=PASS,Description="All filters passed">
##medaka_version=1.11.3
##FORMAT=<ID=GT,Number=1,Type=String,Description="Medaka genotype.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Medaka genotype quality score">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth of reads at position, calculated from read pileup, capped to ~8000.">
##INFO=<ID=DPS,Number=2,Type=Integer,Description="Depth of reads at position by strand (fwd, rev), calculated from read pileup, capped to ~8000 total.">
##INFO=<ID=DPSP,Number=1,Type=Integer,Description="Depth of reads spanning pos +-25. This is not capped as in the case of DP and DPS.">
##INFO=<ID=SR,Number=.,Type=Integer,Description="Depth of spanning reads by strand which best align to each allele (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.). This is not capped as in the case of DP and DPS.">
##INFO=<ID=AR,Number=2,Type=Integer,Description="Depth of ambiguous spanning reads by strand which align equally well to all alleles (fwd, rev). This is not capped as in the case of DP and DPS.">
##INFO=<ID=SC,Number=.,Type=Integer,Description="Total alignment score to each allele of spanning reads by strand (ref fwd, ref rev, alt1 fwd, alt1 rev, etc.) aligned with parasail: match 5, mismatch -4, open 5, extend 3">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
Command run:
GCF_008632635.1_ASM863263v1_genomic.fna.gz
is bgzipped Acinetobacter baumanii reference sequence from ncbi,fastq
directory contains simmulated reads generated from this reference (I have double checked - the reads map to the reference with NCBI blastn)result - vcf files that contain header, but no variants.
version: I've used docker image
ontresearch/medaka:v1.11.3
(but tried several other tags and unofficial images with the same result), no GPU supportThought that it may be caused by lacking
pyabpoa
package, so I created my own docker image that returned the same vcfs.Additional context - I know that
bam
files are also empty:medaka.annotated.vcf
content: