jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

`tb-profiler update_tbdb --match_ref` fails #337

Closed: schorlton-bugseq closed this issue 4 months ago

schorlton-bugseq commented 6 months ago

On v6.2.0, I run: `tb-profiler update_tbdb --match_ref myref.fna`

I get:

ValueError: Command Failed:
/bin/bash -c set -o pipefail; tb-profiler create_db --prefix tbdb --csv mutations.csv --watchlist watchlist.csv --rules rules.txt --match_ref /test/tbdb/myref.fna --load

...

File "pysam/libcfaidx.pyx", line 121, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 153, in pysam.libcfaidx.FastaFile._open
OSError: file `/test/tbdb/myref.fna` not found

It looks like it's looking for the reference in the tbdb directory rather than in the parent directory where it is actually located.

Thanks for your help and tool!

jodyphelan commented 5 months ago

Hi @schorlton-bugseq

Ah, I think you found a bug there. Try giving the full path to your reference file and it should work. I'll patch this in the next release.
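The workaround above amounts to resolving the reference path before it reaches the command, so that it no longer depends on which directory the tool is working in. A minimal sketch, assuming a hypothetical helper name (`build_update_command` is not part of TBProfiler) and using only the `update_tbdb --match_ref` invocation quoted in this thread:

```python
import os
import shlex


def build_update_command(ref_path: str) -> str:
    """Return an update_tbdb command line whose --match_ref path is absolute.

    Hypothetical helper for illustration; it only builds the string and does
    not run tb-profiler.
    """
    # expanduser handles "~/ref.fa"; abspath resolves a relative path against
    # the current working directory, i.e. where the user actually is.
    absolute_ref = os.path.abspath(os.path.expanduser(ref_path))
    # shlex.quote keeps paths containing spaces safe when pasted into a shell.
    return "tb-profiler update_tbdb --match_ref " + shlex.quote(absolute_ref)
```

Calling `build_update_command("myref.fna")` from the directory that contains the reference yields a command that still points at the right file if the tool later works inside `tbdb/`.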

WhalleyT commented 5 months ago

Hi! Supplying the full path does fix that particular error, but I get another error downstream. It is a KeyError raised when the process looks up the header of my reference and cannot find it. For example, this FASTA:

>test
ATGGC

gives the error:

Traceback (most recent call last):
  File "/home/tom/micromamba/bin/tb-profiler", line 583, in <module>
    args.func(args)
  File "/home/tom/micromamba/bin/tb-profiler", line 242, in main_create_db
    pp.create_db(args,extra_files=extra_files)
  File "/home/tom/micromamba/lib/python3.10/site-packages/pathogenprofiler/db.py", line 505, in create_db
    write_bed(
  File "/home/tom/micromamba/lib/python3.10/site-packages/pathogenprofiler/db.py", line 120, in write_bed
    if genome_end > chrom_lengths[gene_info[gene].chrom]:
KeyError: 'test'

This can be rectified by renaming the header to match the original tbprofiler reference (>chromosome).
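The header rename described above can be scripted rather than done by hand. A minimal sketch, assuming a single-sequence FASTA; the function name is hypothetical, and the target name is left as a parameter because the exact header of the bundled reference (spelling and case) should be checked rather than taken from this example:

```python
def rename_single_fasta_header(src_path, dst_path, new_name):
    """Copy a single-sequence FASTA, replacing its header with '>' + new_name.

    Hypothetical helper for illustration; sequence lines are copied unchanged.
    """
    headers_seen = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                headers_seen += 1
                if headers_seen > 1:
                    # Refuse multi-record files: one name cannot fit them all.
                    raise ValueError("expected exactly one sequence in the FASTA")
                dst.write(f">{new_name}\n")
            else:
                dst.write(line)
    if headers_seen == 0:
        raise ValueError("no FASTA header found")
```

For the example in this comment, `rename_single_fasta_header("test.fa", "renamed.fa", "chromosome")` would write `>chromosome` followed by the original sequence lines (the file names here are placeholders).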

Thank you!

jodyphelan commented 5 months ago

Just checking: are you using this reference genome: https://www.ncbi.nlm.nih.gov/nuccore/NC_000962.3?

WhalleyT commented 5 months ago

Yes, it is. It also has the same number of bp as the original TBProfiler reference.
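Both points in this exchange, the sequence name and the sequence length, can be checked directly from the FASTA files. A minimal sketch with a hypothetical helper that maps each record name to its length; it does not reproduce pathogenprofiler's own logic, and it only assumes, from the traceback above, that lengths are looked up by sequence name:

```python
def fasta_lengths(path):
    """Return {sequence_name: length_in_bp} for every record in a FASTA file.

    Hypothetical helper for illustration. The name is the first
    whitespace-delimited token after '>'.
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                fields = line[1:].split()
                name = fields[0] if fields else ""
                lengths[name] = 0
            elif name is not None:
                # Sequence may be wrapped over several lines; sum them.
                lengths[name] += len(line)
    return lengths
```

Comparing `fasta_lengths` for the custom reference and for the reference shipped with the database shows whether the two differ in name, in length, or in both.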

vrennie commented 4 months ago

Hi @jodyphelan I get the same KeyError when trying to use the --match_ref flag

jodyphelan commented 4 months ago

OK, it looks like the issue arises when tb-profiler update_tbdb is run first without --match_ref and then with it. Try removing the tbdb directory that was downloaded, then run tb-profiler update_tbdb --match_ref /path/to/ref.fa and see if that works.

vrennie commented 4 months ago

Hi @jodyphelan, I gave this a try but I ran into the same error.

Thanks

WhalleyT commented 4 months ago

It still caused a (different) error when I ran it, but I was able to get it to work.

My reference is at ~/reference.fa. I ran tb-profiler update_tbdb --match_ref reference.fa --commit <tbdb_commit>, which goes on to create ~/tbdb and then fails with OSError: FileNotFound. I then ran mv reference.fa tbdb and ran the command again, and it seems to work. If I remember correctly, this workaround didn't work when I tried it previously.

jodyphelan commented 4 months ago

Oh yeah, the release version requires the full path to the reference file, but this is fixed in https://github.com/jodyphelan/TBProfiler/commit/1e4c872be8376dc1746a89ca7dc24eb09e4cccd6

WhalleyT commented 4 months ago

Sorry yes you're right, I tried that originally and forgot when I tried it with the new update. It seems to be working now. Thanks :smile:

jodyphelan commented 4 months ago

Great! I'll close this now, but if there are any more related issues feel free to reopen.