nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

snakemake+singularity workflow problem #767

Open Xueliang24 opened 2 years ago

Xueliang24 commented 2 years ago

Are you using the latest release? I am using the latest singularity sif of funannotate.

Describe the bug I want to use a snakemake+singularity workflow to predict fungal genes, so I wrote a script to run it. The script contains clean, sort, mask and predict steps. I ran it step by step to test each one. Whenever the run includes the predict rule (alone or together with the others), snakemake gets stuck at Building DAG of jobs... . The script test.py follows:

SAMPLES, = glob_wildcards("data/{sample}.fasta")

rule all:
    input:
        expand("result/{sample}/{sample}_predict.log", sample=SAMPLES)

rule clean:
    input:
        "data/{sample}.fasta"
    output:
        "result/{sample}/{sample}_clean.fasta"
    log:
        "result/{sample}/{sample}_clean.log"
    singularity:
        "/data/hanzg/04.funannotate/funannotate.sif"
    shell:
        """
        funannotate clean -i {input} -o {output} >{log} 2>&1
        """

rule sort:
    input:
        "result/{sample}/{sample}_clean.fasta"
    output:
        "result/{sample}/{sample}_sort.fasta"
    log:
        "result/{sample}/{sample}_sort.log"
    singularity:
        "/data/hanzg/04.funannotate/funannotate.sif"
    shell:
        """
        funannotate sort -i {input} -o {output} >{log} 2>&1
        """

rule mask:
    input:
        "result/{sample}/{sample}_sort.fasta"
    output:
        "result/{sample}/{sample}_masked.fasta"
    log:
        "result/{sample}/{sample}_masked.log"
    singularity:
        "/data/hanzg/04.funannotate/funannotate.sif"
    shell:
        """
        funannotate mask -i {input} -o {output} >{log} 2>&1
        """

rule predict:
    input:
        "result/{sample}/{sample}_masked.fasta"
    output:
        directory("result/{sample}/")
    log:
        "result/{sample}/{sample}_predict.log"
    singularity:
        "/data/hanzg/04.funannotate/funannotate.sif"
    shell:
        """
        funannotate predict -i {input} -o {output} -s {sample} --name {sample} --optimize_augustus --cpus 20 > {log} 2>&1
        """

What command did you issue? snakemake -s ./test.py --use-singularity --singularity-args " --bind /data/hanzg:/data/hanzg " --cores 20

Logfiles There are no log files, because the run always gets stuck at Building DAG of jobs... .

OS/Install Information

Checking dependencies for 1.8.12

You are running Python v 3.8.13. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.5.2
natsort: 8.1.0
numpy: 1.22.4
pandas: 1.4.3
psutil: 5.9.1
requests: 2.28.1
scikit-learn: 1.1.1
scipy: 1.5.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000024
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
$GENEMARK_PATH=/venv/opt/gmes_petap
All 6 environmental variables are set

Checking external dependencies...
Traceback (most recent call last):
  File "/venv/bin/emapper.py", line 694, in <module>
    args = parse_args(parser)
  File "/venv/bin/emapper.py", line 509, in parse_args
    set_data_path(os.environ["EGGNOG_DATA_DIR"])
  File "/venv/opt/eggnog-mapper/eggnogmapper/common.py", line 77, in set_data_path
    DATA_PATH = existing_dir(data_path)
  File "/venv/opt/eggnog-mapper/eggnogmapper/common.py", line 323, in existing_dir
    raise TypeError('not a valid directory "%s"' %dname)
TypeError: not a valid directory "/opt/eggnog-mapper-data"
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.9.1-internal
kallisto: 0.46.1
mafft: v7.505 (2022/Apr/10)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: pigz 2.6
proteinortho: 6.0.16
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15
signalp: 5.0b
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: emapper.py not installed
ERROR: gmes_petap.pl not installed

Xueliang24 commented 2 years ago

@reslp Could you also help me solve this?

hyphaltip commented 2 years ago

Sounds like eggnog mapper isn’t installed in this image? Also genemark but that shouldn’t be fatal.

reslp commented 2 years ago

Which funannotate image are you using @Xueliang24? The one I created does not include eggnog mapper or GeneMark so @hyphaltip is correct. However I don't think that this would cause snakemake to hang at building the DAG. I think this is a snakemake rather than a funannotate issue.

Xueliang24 commented 2 years ago

@reslp I also think it is a snakemake problem, but I could not find a solution.

Xueliang24 commented 2 years ago

@hyphaltip The missing eggnog mapper and GeneMark don't affect running funannotate clean, sort, mask and predict.

spock commented 2 years ago

@Xueliang24, maybe try adding a more specific output to your predict rule (because that output folder will also be created by other rules), and then expand on that specific output in the all rule. Right now you expand on a log file, which is listed only as the log of the predict rule, not as an output:

rule predict:
...
    output:
         directory("result/{sample}/")
    log:
         "result/{sample}/{sample}_predict.log"
...

And I have a question: how did you build the singularity image? Did you use the funannotate Docker image, or build it yourself some other way?
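
To illustrate the folder overlap mentioned in this comment, here is a minimal Python sketch (not Snakemake's own logic) that checks which file outputs of the other rules fall inside the folder that predict declares as its directory() output. The sample name "A" and the helper outputs_inside_directory are hypothetical and exist only for this illustration.

```python
from pathlib import PurePosixPath


def outputs_inside_directory(dir_output, file_outputs):
    """Return the file outputs that live inside a directory() output."""
    directory = PurePosixPath(dir_output)
    return [f for f in file_outputs if directory in PurePosixPath(f).parents]


# Output patterns copied from the posted rules, filled in for a
# hypothetical sample named "A".
sample = "A"
predict_dir = f"result/{sample}/"
other_outputs = [
    f"result/{sample}/{sample}_clean.fasta",   # rule clean
    f"result/{sample}/{sample}_sort.fasta",    # rule sort
    f"result/{sample}/{sample}_masked.fasta",  # rule mask
]

# Files from clean, sort and mask that sit inside predict's output folder.
overlapping = outputs_inside_directory(predict_dir, other_outputs)
print(overlapping)
```

All three files are reported as lying inside result/A/, so that folder is not unique to the predict rule. Whether this overlap is what stalls the DAG building is not confirmed in this thread.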

Xueliang24 commented 2 years ago

@spock I pulled the latest singularity image from the author. After adding SignalP and setting up the database, I built it again, so it is based on the latest singularity image.

About the specific 'output', could you describe it in more detail or give an example? The other rules output files, not folders; for them that folder is just part of the path.
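
A sketch of what a more specific output could look like, under several assumptions and untested against this image. The flag file predict.done and the predict subfolder are hypothetical names, chosen so that the predict rule no longer declares result/{sample}/ itself as its output. The wildcard constraint assumes sample names never contain a slash. Inside shell, the wildcard value is written as {wildcards.sample}. The clean, sort and mask rules would stay as posted.

```snakemake
SAMPLES, = glob_wildcards("data/{sample}.fasta")

# Assumption: sample names contain no "/", so the wildcard in the rules
# cannot stretch across subdirectories.
wildcard_constraints:
    sample="[^/]+"

rule all:
    input:
        # Request a file that the predict rule lists as an output.
        expand("result/{sample}/predict.done", sample=SAMPLES)

rule predict:
    input:
        "result/{sample}/{sample}_masked.fasta"
    output:
        # Hypothetical flag file; touch() creates it after the shell command succeeds.
        touch("result/{sample}/predict.done")
    params:
        # Hypothetical subfolder, separate from the files written by clean, sort and mask.
        outdir="result/{sample}/predict"
    log:
        "result/{sample}/{sample}_predict.log"
    singularity:
        "/data/hanzg/04.funannotate/funannotate.sif"
    shell:
        """
        funannotate predict -i {input} -o {params.outdir} -s {wildcards.sample} --name {wildcards.sample} --optimize_augustus --cpus 20 > {log} 2>&1
        """
```

A flag file is used here instead of one of funannotate's own result files so that the sketch does not have to guess how predict names its outputs. A dry run (snakemake -n) is a cheap way to check whether the DAG now builds before starting the real jobs.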