jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect ressistance and strain type from WGS data
GNU General Public License v3.0
102 stars 42 forks source link

[E::bcf_hdr_read] Input is not detected as bcf or vcf format #367

Closed schorlton-bugseq closed 4 hours ago

schorlton-bugseq commented 1 month ago

Thanks for all of your work on TBProfiler. Running;

tb-profiler profile --bam rpoB.bam --external_db tbdb/tbdb

This worked on v5, but now on v6.2.1 (installed from conda) getting the below error. My external_db as created with update_db and --match_ref, using the RefSeq H37Rv (NC_000962.3). As it cleans up (and I can't figure out if one can disable this?), I can't inspect the VCFs. Any ideas?

Thanks!

# tb-profiler error report

* OS: linux
* Program version: 6.2.1
* Database version: {'name': 'tbdb', 'commit': 'e6a0040', 'Merge': 'e3c6ce6 539a394', 'Author': 'Jody Phelan <jody.phelan@lshtm.ac.uk>', 'Date': 'Thu Mar 21 14:45:00 2024 +0000'}
* Program call:
\```
{'logging': 'DEBUG', 'read1': None, 'read2': None, 'bam': 'rpoB.bam', 'fasta': None, 'vcf': None, 'platform': 'illumina', 'db': 'tbdb', 'external_db': '/test/tbdb/tbdb', 'prefix': 'tbprofiler', 'csv': False, 'txt': False, 'text_template': None, 'docx': False, 'docx_template': None, 'add_columns': None, 'add_mutation_metadata': False, 'dir': '/test', 'depth': '0,10', 'af': '0,0.1', 'strand': '0,3', 'sv_depth': '0,10', 'sv_af': '0.5,0.9', 'sv_len': '100000,50000', 'mapper': 'bwa', 'caller': 'freebayes', 'calling_params': None, 'kmer_counter': 'kmc', 'coverage_tool': 'samtools', 'suspect': False, 'spoligotype': False, 'update_phylo': False, 'call_whole_genome': False, 'snp_dist': None, 'snp_diff_db': None, 'snp_diff_no_store': False, 'no_trim': False, 'no_coverage_qc': False, 'no_samclip': False, 'no_delly': False, 'no_lineage': False, 'add_variant_annotations': False, 'threads': 8, 'ram': 2, 'delly_vcf': None, 'supplementary_bam': None, 'no_clean': False, 'temp': '.', 'func': <function main_profile at 0x79718d914280>, 'software_name': 'tbprofiler', 'version': '6.2.1', 'tmp_prefix': 'ee6799dc-3f8e-4346-952c-f5e71e8cf5be', 'files_prefix': '/test/ee6799dc-3f8e-4346-952c-f5e71e8cf5be', 'conf': {'snpEff_db': 'Mycobacterium_tuberculosis_h37rv', 'drugs': ['rifampicin', 'isoniazid', 'ethambutol', 'pyrazinamide', 'moxifloxacin', 'levofloxacin', 'bedaquiline', 'delamanid', 'pretomanid', 'linezolid', 'streptomycin', 'amikacin', 'kanamycin', 'capreomycin', 'clofazimine', 'ethionamide', 'para-aminosalicylic_acid', 'cycloserine'], 'tb-profiler-version': '>=6.0.0,<7.0.0', 'version': {'name': 'tbdb', 'commit': 'e6a0040', 'Merge': 'e3c6ce6 539a394', 'Author': 'Jody Phelan <jody.phelan@lshtm.ac.uk>', 'Date': 'Thu Mar 21 14:45:00 2024 +0000'}, 'amplicon': False, 'chromosome_conversion': {'target': ['Chromosome'], 'source': ['NC_000962.3']}, 'files': {'ref': 'tbdb.fasta', 'gff': 'tbdb.gff', 'bed': 'tbdb.bed', 'json_db': 'tbdb.dr.json', 'variables': 'tbdb.variables.json', 'spoligotype_spacers': 'tbdb.spoligotype_spacers.txt', 'spoligotype_annotations': 'tbdb.spoligotype_list.csv', 'bedmask': 'tbdb.mask.bed', 'barcode': 'tbdb.barcode.bed', 'rules': 'tbdb.rules.txt'}, 'ref': '/test/tbdb/tbdb.fasta', 'gff': '/test/tbdb/tbdb.gff', 'bed': '/test/tbdb/tbdb.bed', 'variables': {'snpEff_db': 'Mycobacterium_tuberculosis_h37rv', 'drugs': ['rifampicin', 'isoniazid', 'ethambutol', 'pyrazinamide', 'moxifloxacin', 'levofloxacin', 'bedaquiline', 'delamanid', 'pretomanid', 'linezolid', 'streptomycin', 'amikacin', 'kanamycin', 'capreomycin', 'clofazimine', 'ethionamide', 'para-aminosalicylic_acid', 'cycloserine'], 'tb-profiler-version': '>=6.0.0,<7.0.0', 'version': {'name': 'tbdb', 'commit': 'e6a0040', 'Merge': 'e3c6ce6 539a394', 'Author': 'Jody Phelan <jody.phelan@lshtm.ac.uk>', 'Date': 'Thu Mar 21 14:45:00 2024 +0000'}, 'amplicon': False, 'chromosome_conversion': {'target': ['Chromosome'], 'source': ['NC_000962.3']}, 'files': {'ref': 'tbdb.fasta', 'gff': 'tbdb.gff', 'bed': 'tbdb.bed', 'json_db': 'tbdb.dr.json', 'variables': 'tbdb.variables.json', 'spoligotype_spacers': 'tbdb.spoligotype_spacers.txt', 'spoligotype_annotations': 'tbdb.spoligotype_list.csv', 'bedmask': 'tbdb.mask.bed', 'barcode': 'tbdb.barcode.bed', 'rules': 'tbdb.rules.txt'}}, 'spoligotype_spacers': '/test/tbdb/tbdb.spoligotype_spacers.txt', 'spoligotype_annotations': '/test/tbdb/tbdb.spoligotype_list.csv', 'bedmask': '/test/tbdb/tbdb.mask.bed', 'barcode': '/test/tbdb/tbdb.barcode.bed', 'rules': ['Variant(gene_name=mmpL5,type=lof) inactivates_resistance Variant(gene_name=mmpR5)', 'Variant(gene_name=eis,type=lof) inactivates_resistance Variant(gene_name=eis)'], 'variant_filters': {'depth_hard': 0, 'depth_soft': 10, 'af_hard': 0, 'af_soft': 0.1, 'strand_hard': 0, 'strand_soft': 3, 'sv_depth_hard': 0, 'sv_depth_soft': 10, 'sv_af_hard': 0.5, 'sv_af_soft': 0.9, 'sv_len_hard': 100000, 'sv_len_soft': 50000}}, 'run_delly': True, 'samclip': True, 'coverage_qc': True, 'data_source': 'bam', 'call_lineage': True}
\```
## Traceback:
\```
  File "/opt/conda/envs/test/bin/tb-profiler", line 587, in <module>
    args.func(args)
  File "/opt/conda/envs/test/bin/tb-profiler", line 101, in main_profile
    variants_profile = pp.run_profiler(args)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 192, in run_profiler
    get_vcf_file(args)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 174, in get_vcf_file
    args.vcf = get_vcf_from_bam(args)
  File "/opt/conda/envs/test/lib/python3.10/site-packages/pathogenprofiler/cli.py", line 157, in get_vcf_from_bam
    run_cmd("bcftools concat %s %s | bcftools sort -Oz -o %s" % (vcf_obj.filename,delly_vcf_obj.filename,final_target_vcf_file))
  File "/opt/conda/envs/test/lib/python3.10/site-packages/pathogenprofiler/utils.py", line 486, in run_cmd
    raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,result.stderr.decode()))
\```
## Value:
\```
Command Failed:
/bin/bash -c set -o pipefail; bcftools concat /test/ee6799dc-3f8e-4346-952c-f5e71e8cf5be.short_variants.targets.vcf.gz /test/ee6799dc-3f8e-4346-952c-f5e71e8cf5be.delly.targets.vcf.gz | bcftools sort -Oz -o /test/ee6799dc-3f8e-4346-952c-f5e71e8cf5be.targets.vcf.gz
stderr:
Checking the headers and starting positions of 2 files
[W::bcf_hdr_merge] Trying to combine "GQ" tag definitions of different types
Different sample names in /test/ee6799dc-3f8e-4346-952c-f5e71e8cf5be.delly.targets.vcf.gz. Perhaps "bcftools merge" is what you are looking for?
Writing to /tmp/bcftools.A2ev5A
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning
\```
jodyphelan commented 1 week ago

Hi @schorlton-bugseq

Sorry for the delay in getting back to you on this. You can use --no_clean to stop removal of th intermediate files. This might be a tricky one to debug as it could be a variety of issues that cause failure to make the vcf. Feel free to drop me an email (my email is in the database version dictionary), if you get stuck and I'd be happy to help.

schorlton-bugseq commented 2 days ago

Thanks. The --no_clean helped. I think I finally got this, with two issues:

Thanks again for your consideration!

jodyphelan commented 2 days ago

Thanks for debugging this! Definitely would be useful to add some sanity checks. Will get those in into the next version.

schorlton-bugseq commented 8 hours ago

New challenge with this:

v5 example: (needed to gz it for GitHub so please uncompress): no_rg.bam.gz

docker run --rm -it quay.io/biocontainers/tb-profiler:5.0.1--pyhdfd78af_1

# finds variant
tb-profiler profile --bam no_rg.bam -p no_rg

# add read group
samtools addreplacerg -u -o rg.bam -r 'ID:rg SM:rg' no_rg.bam

# misses variant
tb-profiler profile --bam rg.bam -p rg

Thoughts on what could be causing this?

Thanks again for your help!

jodyphelan commented 7 hours ago

The bam header is a pit picky with whitespace. Try adding the RG with:

samtools addreplacerg -u -o rg.bam -r 'ID:rg\tSM:rg' no_rg.bam
schorlton-bugseq commented 4 hours ago

The bam header is a pit picky with whitespace. Try adding the RG with:

samtools addreplacerg -u -o rg.bam -r 'ID:rg\tSM:rg' no_rg.bam

Brilliant - that worked - thanks!

PS feel free to close this issue if appropriate as I see there is already a commit mentioning it.

jodyphelan commented 4 hours ago

Great thanks for helping debug!