ncbi / fcs

Foreign Contamination Screening caller scripts and documentation

Action report AssertionError #98

Closed mneitzey closed 1 month ago

mneitzey commented 1 month ago

Hello,

I'm screening an annelid genome against the FCS-GX database. The test data ran successfully, and screening of the annelid data appears to work until the action report is generated: the .taxonomy.rpt file is populated, but the .fcs_gx_report.txt is not. The only differences between the test script and the annelid script are the input FASTA and the taxid. Below is the error log from a run with --debug. I would appreciate any suggestions; I didn't see this specific issue reported elsewhere.

Script:

python3 /core/labs/Oneill/mneitzey/Software/fcs.py \
        --image /core/labs/Oneill/mneitzey/Software/fcs-gx-0-5-4.sif \
        screen genome \
        --fasta ../../07c_adapter-trim/fcs-adaptor/clean.fasta \
        --out-dir ./gx_out/ \
        --gx-db /isg/shared/databases/FCS-GX/2.0.0 \
        --tax-id 53621 \
        --debug

Error:

####### ['/app/bin/action_report', '--in=/output-volume//clean.53621.taxonomy.rpt']
Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_nso3v5ys/runfiles/gdh_datasets/apps/fcs_genome/public/action_report.py", line 1862, in <module>
    sys.exit(main())
             ^^^^^^
  File "/tmp/Bazel.runfiles_nso3v5ys/runfiles/gdh_datasets/apps/fcs_genome/public/action_report.py", line 1857, in main
    return action_report(args)
           ^^^^^^^^^^^^^^^^^^^
  File "/tmp/Bazel.runfiles_nso3v5ys/runfiles/gdh_datasets/apps/fcs_genome/public/action_report.py", line 1677, in action_report
    r = Record.from_row(line.rstrip().split("\t"))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/Bazel.runfiles_nso3v5ys/runfiles/gdh_datasets/apps/fcs_genome/public/action_report.py", line 307, in from_row
    row_range     = Record.id_to_range(rec.seq_id)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/Bazel.runfiles_nso3v5ys/runfiles/gdh_datasets/apps/fcs_genome/public/action_report.py", line 256, in id_to_range
    assert tilda_count == 3 # LR994621.1~20258968..20296541~~33292..35019
           ^^^^^^^^^^^^^^^^
AssertionError

-----------------------------------------------------------------------------

Traceback (most recent call last):
  File "/tmp/Bazel.runfiles_clj2f7ib/runfiles/gdh_datasets/apps/fcs_genome/public/run_gx.py", line 1091, in <module>
    main()
  File "/tmp/Bazel.runfiles_clj2f7ib/runfiles/gdh_datasets/apps/fcs_genome/public/run_gx.py", line 1067, in main
    run_classify_taxonomy_and_action_report(args)
  File "/tmp/Bazel.runfiles_clj2f7ib/runfiles/gdh_datasets/apps/fcs_genome/public/run_gx.py", line 734, in run_classify_taxonomy_and_action_report
    run(
  File "/tmp/Bazel.runfiles_clj2f7ib/runfiles/gdh_datasets/apps/fcs_genome/public/run_gx.py", line 719, in run
    subprocess.run(cmd, stdout=out_file, check=True, stderr=sys.stderr)
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/app/bin/action_report', '--in=/output-volume//clean.53621.taxonomy.rpt']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 476, in <module>
    sys.exit(main())
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 465, in main
    gx.run()
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 347, in run
    self.args.func(self)
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 325, in run_screen_mode
    self.run_gx()
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 243, in run_gx
    self.safe_exec(docker_args)
  File "/core/labs/Oneill/mneitzey/Software/fcs.py", line 168, in safe_exec
    subprocess.run(args, shell=False, check=True, text=True, stdout=sys.stdout, stderr=sys.stderr)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['singularity', 'exec', '--bind', '/isg/shared/databases/FCS-GX:/app/db/gxdb/', '--bind', '/core/projects/EBP/conservation/tubeworms/genome_assembly/rifPach/reference/07c_adapter-trim/fcs-adaptor:/sample-volume/', '--bind', '/core/projects/EBP/conservation/tubeworms/genome_assembly/rifPach/reference/09_decontam/fcs-gx/gx_out:/output-volume/', '/core/labs/Oneill/mneitzey/Software/fcs-gx-0-5-4.sif', 'python3', '/app/bin/run_gx', '--fasta', '/sample-volume/clean.fasta', '--out-dir', '/output-volume/', '--gx-db', '/app/db/gxdb/2.0.0', '--tax-id', '53621', '--debug']' returned non-zero exit status 1.

Best, Michelle

etvedte commented 1 month ago

Hi Michelle,

I have a guess as to what's going on. It looks like you are running FCS-GX (fcs.py screen genome) on the cleaned output from FCS-adaptor. Sequence headers in the cleaned output can contain tilde (~) characters, and screen genome is probably failing on them; the assertion that fails in your traceback is a check on the number of tildes in a sequence ID. Can you retry using one of the options below?

  1. Run on your original FASTA (so long as it doesn't have ~ in sequence headers)
  2. Do some sort of sequence renaming on your cleaned FASTA to remove the ~ and trailing coordinates
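To decide between the two options, it may help to check which headers actually contain a tilde. The following is a minimal sketch, not part of FCS; the function name headers_with_tilde and the example IDs are hypothetical, and it assumes a plain-text FASTA in which the sequence ID is the first whitespace-delimited token after ">".

```python
import io


def headers_with_tilde(handle):
    """Return the sequence IDs (first token of each '>' line) that contain '~'."""
    flagged = []
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            if "~" in seq_id:
                flagged.append(seq_id)
    return flagged


# Hypothetical two-record FASTA; the IDs are illustrative only.
example = io.StringIO(
    ">contig_1\nACGT\n"
    ">contig_2~1..500\nGGCC\n"
)
print(headers_with_tilde(example))
```

For a real file, pass an open file handle instead, for example open("clean.fasta"). An empty result would suggest the file can be screened without renaming.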

Eric

mneitzey commented 1 month ago

Renaming the headers worked, thank you! In case anyone else runs into the same problem, I fixed it by running: sed 's/~.*//' clean.fasta > clean_headers.fasta
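For anyone who prefers a script, here is a rough Python sketch of the same renaming. It is not part of FCS, and strip_tilde_suffix is a hypothetical name. Unlike the sed command, which applies to every line, it only edits header lines, and it reports any IDs that become identical after truncation (this can happen if two headers differ only in the part after the tilde), since duplicate IDs may cause problems downstream.

```python
import io


def strip_tilde_suffix(in_handle, out_handle):
    """Copy a FASTA stream, cutting each header line at its first '~'.

    Returns the list of truncated IDs that occur more than once.
    """
    seen = set()
    duplicates = []
    for line in in_handle:
        if line.startswith(">"):
            # Keep everything before the first tilde; drop the rest of the line.
            new_header = line.rstrip("\n").split("~", 1)[0]
            fields = new_header[1:].split()
            seq_id = fields[0] if fields else ""
            if seq_id in seen and seq_id not in duplicates:
                duplicates.append(seq_id)
            seen.add(seq_id)
            out_handle.write(new_header + "\n")
        else:
            # Sequence lines are copied unchanged.
            out_handle.write(line)
    return duplicates


# Hypothetical input; the IDs and coordinates are illustrative only.
src = io.StringIO(">seq1~1..100\nACGT\n>seq1~200..300\nGGCC\n>seq2\nTTAA\n")
dst = io.StringIO()
dups = strip_tilde_suffix(src, dst)
print(dst.getvalue(), end="")
print("duplicate IDs after renaming:", dups)
```

If the duplicate list is not empty, a different renaming scheme (for example, appending a counter) would be needed instead of plain truncation.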

etvedte commented 1 month ago

We will have this sorted in the next release, but thanks for posting your solution.