jodyphelan / TBProfiler

Profiling tool for Mycobacterium tuberculosis to detect resistance and strain type from WGS data
GNU General Public License v3.0

File not found when running `tb-profiler update_tbdb --match_ref` #278

Closed: aofarrel closed this issue 1 year ago

aofarrel commented 1 year ago

Version: staphb/tbprofiler:4.4.2 docker image

I'm hoping to run tb-profiler on some BAM files, but ran into an error similar to this one. My FASTA file starts with >NC_000962.3, so I figured I need to run tb-profiler update_tbdb --match_ref, but I can't get that working. It seems like it cannot find the FASTA file I'm passing to it?
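Before reaching for --match_ref, it can help to confirm which sequence name(s) the reference FASTA actually uses. A minimal Python sketch follows; the function name fasta_sequence_names is hypothetical and not part of TBProfiler. It only reads the header lines, taking the first whitespace-delimited token after ">" as the name.

```python
from typing import List


def fasta_sequence_names(path: str) -> List[str]:
    """Return the sequence name of every record header in a FASTA file.

    The name is the first whitespace-delimited token after '>', e.g.
    '>NC_000962.3 Mycobacterium tuberculosis H37Rv' -> 'NC_000962.3'.
    """
    names = []
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].strip()
                # An empty header ('>' alone) is recorded as an empty name.
                names.append(header.split()[0] if header else "")
    return names
```

For a file whose only header is >NC_000962.3, fasta_sequence_names("tb_seq.fasta") returns ['NC_000962.3']. The first column of the tb_seq.fasta.fai index produced by samtools faidx lists the same names.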

> ls -lha
drwxr-xr-x  1 root root 4.0K Mar 28 22:03 ..
drwxr-xr-x 14 root root  448 Mar 28 02:24 .git
-rw-r--r--  1 root root   66 Mar 24 01:07 .gitattributes
-rw-r--r--  1 root root  332 Mar 28 21:56 Dockerfile
-rw-r--r--  1 root root  451 Mar 26 20:14 tb_profiler.wdl
-rw-r--r--  1 root root 4.3M Mar 28 22:04 tb_seq.fasta
-rw-r--r--  1 root root   12 Mar 28 22:11 tb_seq.fasta.amb
-rw-r--r--  1 root root   46 Mar 28 22:11 tb_seq.fasta.ann
-rw-r--r--  1 root root 4.3M Mar 28 22:11 tb_seq.fasta.bwt
-rw-r--r--  1 root root   29 Mar 28 22:11 tb_seq.fasta.fai
-rw-r--r--  1 root root 1.1M Mar 28 22:11 tb_seq.fasta.pac
-rw-r--r--  1 root root 2.2M Mar 28 22:11 tb_seq.fasta.sa

.git, .gitattributes, Dockerfile, and tb_profiler.wdl are my own files from my own repo.

> tb-profiler update_tbdb --match_ref tb_seq.fasta
Running command:
set -u pipefail; git clone https://github.com/jodyphelan/tbdb.git

Running command:
set -u pipefail; git checkout master

Running command:
set -u pipefail; git pull

Running command:
set -u pipefail; tb-profiler create_db --prefix tbdb --match_ref tb_seq.fasta --load
Traceback (most recent call last):
  File "/opt/conda/bin/tb-profiler", line 693, in <module>
    args.func(args)
  File "/opt/conda/bin/tb-profiler", line 212, in main_update_tbdb
    pp.run_cmd(f"tb-profiler create_db --prefix {args.prefix} {tmp} --load")
  File "/opt/conda/lib/python3.9/site-packages/pathogenprofiler/utils.py", line 404, in run_cmd
    raise ValueError("Command Failed:\n%s\nstderr:\n%s" % (cmd,stderr.decode()))
ValueError: Command Failed:
set -u pipefail; tb-profiler create_db --prefix tbdb --match_ref tb_seq.fasta --load
stderr:
Traceback (most recent call last):
  File "/opt/conda/bin/tb-profiler", line 693, in <module>
    args.func(args)
  File "/opt/conda/bin/tb-profiler", line 232, in main_create_db
    pp.create_db(args,extra_files=extra_files)
  File "/opt/conda/lib/python3.9/site-packages/pathogenprofiler/db.py", line 584, in create_db
    chrom_conversion = match_ref_chrom_names(args.match_ref,"genome.fasta")
  File "/opt/conda/lib/python3.9/site-packages/pathogenprofiler/db.py", line 549, in match_ref_chrom_names
    source_fa = fa2dict(source)
  File "/opt/conda/lib/python3.9/site-packages/pathogenprofiler/db.py", line 63, in fa2dict
    for l in open(filename):
FileNotFoundError: [Errno 2] No such file or directory: 'tb_seq.fasta'
Cleaning up after failed run
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/opt/conda/bin/tb-profiler", line 33, in cleanup
    del args.conf['json_db']
AttributeError: 'Namespace' object has no attribute 'conf'

Cleaning up after failed run
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/opt/conda/bin/tb-profiler", line 33, in cleanup
    del args.conf['json_db']
AttributeError: 'Namespace' object has no attribute 'conf'

The samtools and bwa commands mentioned here work, so this isn't a matter of the file not being mounted into the Docker image, but tb-profiler update_tbdb --match_ref still fails. Variations such as ./tb_seq.fasta or $(pwd)/tb_seq.fasta don't work either. Am I missing a step? Does my FASTA need to go somewhere specific?

jodyphelan commented 1 year ago

Hi Ash,

This is a bug in that version. It is fixed in the master branch; I just need to make a conda release. After you run the command and see the error, it should have cloned a git repo called tbdb. Go into that directory and run the create_db step manually:

cd tbdb
tb-profiler create_db --prefix tbdb --match_ref tb_seq.fasta --load

Let me know if that works for you.
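The workaround boils down to two steps: change into the cloned tbdb directory, then run create_db there. Below is a minimal Python sketch of the same steps for anyone scripting them (for example inside a WDL task). The function run_create_db is hypothetical and not part of TBProfiler; it only shells out to the create_db command quoted above. One assumption of ours, not confirmed in the thread: it passes the reference as an absolute path, because a bare tb_seq.fasta is resolved relative to the current directory, so after cd tbdb it would refer to tbdb/tb_seq.fasta.

```python
import subprocess
from pathlib import Path
from typing import List


def run_create_db(match_ref: str, db_dir: str = "tbdb", dry_run: bool = True) -> List[str]:
    """Hypothetical wrapper around the manual workaround:
    cd tbdb && tb-profiler create_db --prefix tbdb --match_ref <fasta> --load
    Returns the command list; it is only executed when dry_run is False.
    """
    ref = Path(match_ref)
    # Check the file relative to the directory we start in, before moving into db_dir.
    if not ref.is_file():
        raise FileNotFoundError(f"reference FASTA not found: {ref}")
    # Assumption: an absolute path still points at the same file once the
    # command runs inside db_dir.
    cmd = ["tb-profiler", "create_db", "--prefix", "tbdb",
           "--match_ref", str(ref.resolve()), "--load"]
    if not dry_run:
        # cwd=db_dir plays the role of `cd tbdb` in the workaround.
        subprocess.run(cmd, cwd=db_dir, check=True)
    return cmd
```

With dry_run left at True, the function only checks the path and returns the command, which is handy for confirming the arguments before running it for real.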

aofarrel commented 1 year ago

It seems to be working now! Thank you so much!

(If you got a notification earlier with an error message in samtools index, that was an issue on my part, sorry!)

jodyphelan commented 1 year ago

Great to hear it is working now!