##fileformat=VCFv4.2
##nanopolish_window=MN908947.3:1-29902
##INFO=<ID=TotalReads,Number=1,Type=Integer,Description="The number of event-space reads used to call the variant">
##INFO=<ID=SupportFraction,Number=1,Type=Float,Description="The fraction of event-space reads that support the variant">
##INFO=<ID=SupportFractionByStrand,Number=2,Type=Float,Description="Fraction of event-space reads that support the variant for each strand">
##INFO=<ID=BaseCalledReadsWithVariant,Number=1,Type=Integer,Description="The number of base-space reads that support the variant">
##INFO=<ID=BaseCalledFraction,Number=1,Type=Float,Description="The fraction of base-space reads that support the variant">
##INFO=<ID=AlleleCount,Number=1,Type=Integer,Description="The inferred number of copies of the allele">
##INFO=<ID=StrandSupport,Number=4,Type=Integer,Description="Number of reads supporting the REF and ALT allele, by strand">
##INFO=<ID=StrandFisherTest,Number=1,Type=Integer,Description="Strand bias fisher test">
##INFO=<ID=SOR,Number=1,Type=Float,Description="StrandOddsRatio test from GATK">
##INFO=<ID=RefContext,Number=1,Type=String,Description="The reference sequence context surrounding the variant call">
##INFO=<ID=Pool,Number=1,Type=String,Description="The pool name">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
This file is detected as a bed file because it contains only header lines and no data records.
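To avoid being surprised by this result, a header-only file can be spotted before it is handed to tataki. The following is a minimal sketch, not tataki's own logic: the helper name `has_data_records` and the sample record are hypothetical, and it only assumes that header lines start with `#`.

```python
def has_data_records(text: str) -> bool:
    """Return True if any non-empty line is not a '#' header line."""
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return True
    return False


# Header-only content, similar to the example file above.
header_only = (
    "##fileformat=VCFv4.2\n"
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
)

# Same header plus one made-up record (hypothetical values, for illustration only).
with_record = header_only + (
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MN908947.3\t100\t.\tA\tG\t.\tPASS\t.\n"
)

print(has_data_records(header_only))
print(has_data_records(with_record))
```

If the check returns `False`, the detected format should be treated with caution, since there are no records for a parser to validate.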
case: gzipped binary files
$ tataki tiny.bam.gz --yaml -v
[2024-07-11T07:26:40Z INFO tataki::module] tataki started
[2024-07-11T07:26:40Z DEBUG tataki::module] Args: Args { input: ["tiny.bam.gz"], output: None, output_format: Csv, yaml: true, cache_dir: None, conf: None, tidy: false, no_decompress: false, num_records: 100000, dry_run: false, verbose: true, quiet: false }
[2024-07-11T07:26:40Z DEBUG tataki::module] Output format: Yaml
[2024-07-11T07:26:40Z INFO tataki::module] Created temporary directory: /tmp/tataki_2024-0711-162640_BgiSCI
[2024-07-11T07:26:40Z INFO tataki::module] Processing input: tiny.bam.gz
[2024-07-11T07:26:40Z DEBUG tataki::source] Provided input is in GZ format
Error: stream did not contain valid UTF-8
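The file is gzipped, but tataki (specifically, its internal Rust GZ decoder) expects the decompressed content to be a plain-text file. The decompressed content here is binary, so reading it fails with the UTF-8 error shown above.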
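The failure mode can be reproduced outside tataki. This is a rough sketch using Python's standard `gzip` module rather than the Rust decoder tataki uses; the payload is a hypothetical stand-in for binary content, not a real BAM file, and `decodes_as_utf8` is a name made up for this example.

```python
import gzip

# Hypothetical stand-in for binary content: not a real BAM file, just bytes
# that are not valid UTF-8.
binary_payload = b"BAM\x01" + bytes([0xFF, 0xFE, 0x00, 0x80])
compressed_binary = gzip.compress(binary_payload)
compressed_text = gzip.compress(b"##fileformat=VCFv4.2\n")


def decodes_as_utf8(gz_bytes: bytes) -> bool:
    """Decompress gzip data and report whether the result is valid UTF-8."""
    raw = gzip.decompress(gz_bytes)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(decodes_as_utf8(compressed_binary))
print(decodes_as_utf8(compressed_text))
```

A check like this, run before invoking tataki, indicates whether a gzipped input is likely to hit the same error.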
case: BGZF
$ tataki SAMPLE_01.pass.vcf.gz --yaml
[2024-07-11T07:36:49Z INFO tataki::module] tataki started
[2024-07-11T07:36:49Z INFO tataki::module] Created temporary directory: /tmp/tataki_2024-0711-163649_HL6qcv
[2024-07-11T07:36:49Z INFO tataki::module] Processing input: SAMPLE_01.pass.vcf.gz
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser empty
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser bam
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser bcf
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser bed
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser cram
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser fasta
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser fastq
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser gff3
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser gtf
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser sam
[2024-07-11T07:36:49Z INFO tataki::parser] Invoking parser vcf
[2024-07-11T07:36:49Z INFO tataki::module] Detected!! vcf
[2024-07-11T07:36:49Z INFO tataki::module] Deleting temporary directory: /tmp/tataki_2024-0711-163649_HL6qcv
SAMPLE_01.pass.vcf.gz:
  id: http://edamontology.org/format_3016
  label: VCF
  decompressed:
    label: null
    id: null
The file SAMPLE_01.pass.vcf.gz looks like an ordinary GZIP file, but it is actually a BGZF (Blocked GNU Zip Format) file. Because its header shows that the content is VCF, tataki reports it as a plain VCF file.
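BGZF can be told apart from plain GZIP by inspecting the first block header. The sketch below follows the BGZF layout described in the SAM specification (a gzip header with the FEXTRA flag set and an extra subfield tagged `BC` of length 2). It is an independent illustration, not the check tataki performs, and `is_bgzf` is a hypothetical helper name.

```python
import gzip
import struct


def is_bgzf(header: bytes) -> bool:
    """Check whether bytes from the start of a file form a BGZF block header."""
    # Fixed gzip header: magic 1f 8b, CM=8 (deflate), FLG with FEXTRA (bit 2) set.
    if len(header) < 12 or header[0:2] != b"\x1f\x8b" or header[2] != 8:
        return False
    if not header[3] & 0x04:
        return False
    # XLEN is a little-endian 16-bit length of the extra field at offset 10.
    (xlen,) = struct.unpack_from("<H", header, 10)
    extra = header[12:12 + xlen]
    # Walk the extra subfields looking for SI1='B' (66), SI2='C' (67), SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == 66 and si2 == 67 and slen == 2:
            return True
        pos += 4 + slen
    return False


# The 28-byte empty BGZF block that marks end-of-file in the SAM specification.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 "
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)

print(is_bgzf(BGZF_EOF))
print(is_bgzf(gzip.compress(b"plain gzip data")))
```

In practice, reading the first few dozen bytes of a `.gz` file and passing them to a check like this is enough to distinguish the two formats.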