shandley / hecatomb

hecatomb is a virome analysis pipeline for analysis of Illumina sequence data
MIT License

add-host is failing on mask_host #98

Open beardymcjohnface opened 1 year ago

beardymcjohnface commented 1 year ago

I managed to replicate the issue:

Error in rule mask_host:
    jobid: 1
    input: GCF_010909765.2_sAmbRad1.1.pri_genomic.ndrop.fna, hecatomb.out/processing/temp/yeet.sam.gz
    output: hecatomb.out/processing/temp/yeet.processed.fasta, /home/mike/miniconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/yeet/masked_ref.fa.gz
    log: hecatomb.out/stderr/mask_host.log (check log file(s) for error details)
    conda-env: /home/mike/miniconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/c9790800272e0e3eb4a18b1640a733bc_
    shell:

        bbmask.sh in=GCF_010909765.2_sAmbRad1.1.pri_genomic.ndrop.fna out=hecatomb.out/processing/temp/yeet.processed.fasta             entropy= sam=hecatomb.out/processing/temp/yeet.sam.gz ow=t                                                threads=8 -Xmx32000m &> hecatomb.out/stderr/mask_host.log
        gzip -c hecatomb.out/processing/temp/yeet.processed.fasta > /home/mike/miniconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/yeet/masked_ref.fa.gz

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
Logfile hecatomb.out/stderr/mask_host.log:
================================================================================
java -ea -Xmx32000m -Xms32000m -cp /home/mike/miniconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/c9790800272e0e3eb4a18b1640a733bc_/opt/bbmap-38.90-3/current/ jgi.BBMask in=GCF_010909765.2_sAmbRad1.1.pri_genomic.ndrop.fna out=hecatomb.out/processing/temp/yeet.processed.fasta entropy= sam=hecatomb.out/processing/temp/yeet.sam.gz ow=t threads=8 -Xmx32000m
Executing jgi.BBMask [in=GCF_010909765.2_sAmbRad1.1.pri_genomic.ndrop.fna, out=hecatomb.out/processing/temp/yeet.processed.fasta, entropy=, sam=hecatomb.out/processing/temp/yeet.sam.gz, ow=t, threads=8, -Xmx32000m]

Exception in thread "main" java.lang.NullPointerException: Cannot invoke "String.trim()" because "in" is null
        at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
        at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
        at java.base/java.lang.Float.parseFloat(Float.java:556)
        at jgi.BBMask.<init>(BBMask.java:160)
        at jgi.BBMask.main(BBMask.java:54)
================================================================================
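
Reading the log above: the only argument passed to `bbmask.sh` without a value is `entropy=`, and the trace ends in `Float.parseFloat` receiving a null string, so the rule appears to have rendered an empty entropy setting. Below is a minimal Python sketch for spotting such empty `key=` arguments in a rendered command before it is launched. `find_empty_args` is a hypothetical helper written for illustration; it is not part of hecatomb, Snakemake, or BBTools.

```python
import shlex


def find_empty_args(command: str) -> list[str]:
    """Return the names of key=value arguments whose value is empty."""
    empty = []
    for token in shlex.split(command):
        key, sep, value = token.partition("=")
        # Only tokens of the form "key=" count: "=" present, nothing after it.
        if sep and key and not value:
            empty.append(key)
    return empty


# The bbmask.sh command as printed in the log above (redirection omitted).
cmd = (
    "bbmask.sh in=GCF_010909765.2_sAmbRad1.1.pri_genomic.ndrop.fna "
    "out=hecatomb.out/processing/temp/yeet.processed.fasta "
    "entropy= sam=hecatomb.out/processing/temp/yeet.sam.gz ow=t "
    "threads=8 -Xmx32000m"
)
print(find_empty_args(cmd))  # ['entropy']
```

A check like this only flags the symptom; which config key should have supplied the entropy value is not shown in this thread.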
pengouy commented 10 months ago

Hello, I have encountered the same error and have not been able to solve it. If you have any solutions, it would be very helpful to hear them. Thanks a lot!

pengouy commented 10 months ago

[2023:11:16 13:41:28] Config file hecatomb.out/hecatomb.config.yaml already exists. Using existing config file.
[2023:11:16 13:41:28] Updating config file with new values
[2023:11:16 13:41:28] Writing config file to hecatomb.out/hecatomb.config.yaml
[2023:11:16 13:41:28] ------------------
[2023:11:16 13:41:28] | Runtime config |
[2023:11:16 13:41:28] ------------------

args:
  assembly: cross
  databases: null
  host: human
  hostFa: db/gga.fa
  hostName: gga
  library: paired
  log: hecatomb.out/hecatomb.log
  output: hecatomb.out
  reads: /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/test_data
  search: sensitive
assembly:
  canu: correctedErrorRate=0.16 maxInputCoverage=10000 minInputCoverage=0 corOutCoverage=10000 corMhapSensitivity=high corMinCoverage=0 useGrid=False stopOnLowCoverage=False genomeSize=10M -nanopore
  flye: -g 1g
  megahit: --k-min 45 --k-max 225 --k-step 26 --min-count 2 --min-contig-len 1000
  metaflye: --meta -g 1g --nano-raw
mmseqs:
  filtAAprimary: --min-length 30 -e 1e-3
  filtAAsecondary: --min-length 30 -e 1e-5
  filtNTprimary: --min-length 90 -e 1e-10
  filtNTsecondary: --min-length 90 -e 1e-20
  linclustParams: --kmer-per-seq-scale 0.3 -c 0.8 --cov-mode 1 --min-seq-id 0.97 --alignment-mode 3
  perfAA: --start-sens 1 --sens-steps 3 -s 7 --lca-mode 2 --shuffle 0
  perfAAfast: -s 4.0 --lca-mode 2 --shuffle 0
  perfNT: --start-sens 2 -s 7 --sens-steps 3
  perfNTfast: -s 4.0
  taxIdIgnore: 0 1 2 10239 131567 12429 2759
qc:
  compression: 1
  fastp: --qualified_quality_phred 15 --length_required 90 --cut_tail --cut_tail_window_size 25 --cut_tail_mean_quality 15 --dedup --dup_calc_accuracy 4 --trim_poly_x
resources:
  big:
    cpu: 24
    mem: 64000
    time: 1440
  med:
    cpu: 16
    mem: 32000
    time: 60
  ram:
    cpu: 2
    mem: 16000
  sml:
    time: 10

[2023:11:16 13:41:28] ---------------------
[2023:11:16 13:41:28] | Snakemake command |
[2023:11:16 13:41:28] ---------------------

snakemake -s /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/AddHost.smk --configfile hecatomb.out/hecatomb.config.yaml --jobs 8 --use-conda --conda-prefix /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda --rerun-incomplete --printshellcmds --nolock --show-failed-logs
Config file /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/config.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/dbFiles.yaml is extended by additional config specified via the command line.
Config file /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../config/immutable.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job          count
all              1
mask_host        1
total            2

Select jobs to execute...

[Thu Nov 16 13:41:32 2023]
rule mask_host:
    input: db/gga.fa, hecatomb.out/processing/temp/gga.sam.gz
    output: hecatomb.out/processing/temp/gga.processed.fasta, /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/gga/masked_ref.fa.gz
    log: hecatomb.out/stderr/mask_host.log
    jobid: 1
    reason: Missing output files: /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/gga/masked_ref.fa.gz
    threads: 8
    resources: tmpdir=/tmp, mem_mb=32000, mem_mib=30518

    bbmask.sh in=db/gga.fa out=hecatomb.out/processing/temp/gga.processed.fasta             entropy= sam=hecatomb.out/processing/temp/gga.sam.gz ow=t             threads=8 -Xmx32000m &> hecatomb.out/stderr/mask_host.log
    gzip -c hecatomb.out/processing/temp/gga.processed.fasta > /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/gga/masked_ref.fa.gz

Activating conda environment: ../../anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/2529cd9812bb3d1ebf33334bfdd3a3f8_
[Thu Nov 16 13:41:33 2023]
Error in rule mask_host:
    jobid: 1
    input: db/gga.fa, hecatomb.out/processing/temp/gga.sam.gz
    output: hecatomb.out/processing/temp/gga.processed.fasta, /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/gga/masked_ref.fa.gz
    log: hecatomb.out/stderr/mask_host.log (check log file(s) for error details)
    conda-env: /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/2529cd9812bb3d1ebf33334bfdd3a3f8_
    shell:

    bbmask.sh in=db/gga.fa out=hecatomb.out/processing/temp/gga.processed.fasta             entropy= sam=hecatomb.out/processing/temp/gga.sam.gz ow=t             threads=8 -Xmx32000m &> hecatomb.out/stderr/mask_host.log
    gzip -c hecatomb.out/processing/temp/gga.processed.fasta > /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/workflow/../databases/host/gga/masked_ref.fa.gz

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Logfile hecatomb.out/stderr/mask_host.log:

java -ea -Xmx32000m -Xms32000m -cp /public3/home/sc/anaconda3/envs/hecatomb/lib/python3.10/site-packages/hecatomb/snakemake/conda/2529cd9812bb3d1ebf33334bfdd3a3f8_/opt/bbmap-38.90-3/current/ jgi.BBMask in=db/gga.fa out=hecatomb.out/processing/temp/gga.processed.fasta entropy= sam=hecatomb.out/processing/temp/gga.sam.gz ow=t threads=8 -Xmx32000m
Executing jgi.BBMask [in=db/gga.fa, out=hecatomb.out/processing/temp/gga.processed.fasta, entropy=, sam=hecatomb.out/processing/temp/gga.sam.gz, ow=t, threads=8, -Xmx32000m]

Exception in thread "main" java.lang.NullPointerException
        at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
        at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
        at java.base/java.lang.Float.parseFloat(Float.java:455)
        at jgi.BBMask.<init>(BBMask.java:160)
        at jgi.BBMask.main(BBMask.java:54)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-16T134129.211619.snakemake.log
[2023:11:16 13:41:33] ERROR: Snakemake failed

@shandley Dear developer, I was unable to add a host reference in hecatomb (full log above). Could you please help me with this?

beardymcjohnface commented 10 months ago

I've reworked these rules and they seem to be working. I've dropped BBTools in favour of minimap and BEDtools, and I'm now using the normal RefSeq viral database rather than the shredded version. You can test it like so:

git clone https://github.com/shandley/hecatomb.git
cd hecatomb
git checkout dev
conda create -n hecatombDev python=3.11
conda activate hecatombDev
pip install -e .
cd ..
hecatomb add-host --host yeet --host-fa GCA_000001405.29_GRCh38.p14_genomic.fna --threads 8
hecatomb list-hosts
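
To confirm the last step programmatically, one option is to capture the output of `hecatomb list-hosts` and look for the new host name. The Python sketch below assumes that each host name is printed as its own whitespace-separated token; the actual output format is not shown in this thread, so treat that as an assumption. `host_is_listed` and `check_host` are hypothetical helpers, not part of hecatomb.

```python
import shutil
import subprocess


def host_is_listed(list_hosts_output: str, host_name: str) -> bool:
    """Return True if host_name appears as a whole token in the output."""
    # Assumption: host names are printed as separate whitespace-delimited tokens.
    return host_name in list_hosts_output.split()


def check_host(host_name: str) -> bool:
    """Run `hecatomb list-hosts` and report whether host_name is present."""
    result = subprocess.run(
        ["hecatomb", "list-hosts"], capture_output=True, text=True, check=False
    )
    # Search both streams, since which one carries the listing is not confirmed here.
    return host_is_listed(result.stdout + "\n" + result.stderr, host_name)


if __name__ == "__main__":
    if shutil.which("hecatomb") is None:
        print("hecatomb is not on PATH; activate the hecatombDev environment first")
    else:
        print("yeet listed:", check_host("yeet"))
```

Matching whole tokens rather than substrings avoids a false positive when one host name is a prefix of another.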