nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

Issue during funannotate train step with PASA pipeline #836

Open AlWa1 opened 2 years ago

AlWa1 commented 2 years ago

Are you using the latest release? Yes, installed funannotate via conda (v1.7.4).

Describe the bug While the test run (funannotate test -t all --cpus 12) ran without problems, I run into trouble in the training step for my genome assembly. Tracing the failure back to the PASA log points to a problem accessing the PASA binary ("Error, pasa binary [] isn't executable or couldn't be found."), although $PASAHOME is set correctly and which pasa points to the right location. I tried reinstalling the newest release of the PASA pipeline manually, but this did not help.

What command did you issue?

funannotate train -i Lyc_v1.cleaned.sorted.masked.fa -o fun \
    --single /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep1_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep2_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep3_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep1_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep2_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep3_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/trifolium_trimmed.fq.gz \
    /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/plantago_trimmed.fq.gz \
    --cpus 14 --jaccard_clip

Logfiles

General funannotate log file:

CMD ERROR: /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/Launch_PASA_pipeline.pl -c /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/pasa/alignAssembly.txt -r -C -R -g /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/trinity.alignments.gff3 -T -t /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/trinity.fasta.clean -u /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 14 --ALIGNERS blat --trans_gtf /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2/fun/training/funannotate_train.stringtie.gtf

PASA log file:

Thread 8 terminated abnormally: Error, pasa binary [] isn't executable or couldn't be found. at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 66 thread 8.
    CDNA::PASA_alignment_assembler::_init(CDNA::PASA_alignment_assembler=HASH(0x2af7500e0f98)) called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 52 thread 8
    CDNA::PASA_alignment_assembler::new("CDNA::PASA_alignment_assembler") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 179 thread 8
    main::assemble_transcripts_on_scaffold("scaffold_109") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 8
    eval {...} called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 8

Thread 5 terminated abnormally: Error, pasa binary [] isn't executable or couldn't be found. at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 66 thread 5.
    CDNA::PASA_alignment_assembler::_init(CDNA::PASA_alignment_assembler=HASH(0x2af7440ef450)) called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 52 thread 5
    CDNA::PASA_alignment_assembler::new("CDNA::PASA_alignment_assembler") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 179 thread 5
    main::assemble_transcripts_on_scaffold("scaffold_106") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 5
    eval {...} called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 5

ERROR, thread 5 exited with error Error, pasa binary [] isn't executable or couldn't be found. at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 66 thread 5.
    CDNA::PASA_alignment_assembler::_init(CDNA::PASA_alignment_assembler=HASH(0x2af7440ef450)) called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 52 thread 5
    CDNA::PASA_alignment_assembler::new("CDNA::PASA_alignment_assembler") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 179 thread 5
    main::assemble_transcripts_on_scaffold("scaffold_106") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 5
    eval {...} called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 5

ERROR, thread 8 exited with error Error, pasa binary [] isn't executable or couldn't be found. at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 66 thread 8.
    CDNA::PASA_alignment_assembler::_init(CDNA::PASA_alignment_assembler=HASH(0x2af7500e0f98)) called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/PerlLib/CDNA/PASA_alignment_assembler.pm line 52 thread 8
    CDNA::PASA_alignment_assembler::new("CDNA::PASA_alignment_assembler") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 179 thread 8
    main::assemble_transcripts_on_scaffold("scaffold_109") called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 8
    eval {...} called at /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/scripts/assemble_clusters.dbi line 135 thread 8

OS/Install Information Checking dependencies for 1.8.13

You are running Python v 3.8.13. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.4.3
natsort: 8.2.0
numpy: 1.23.3
pandas: 1.5.0
psutil: 5.9.2
requests: 2.28.1
scikit-learn: 1.1.2
scipy: 1.9.1
seaborn: 0.12.0
All 11 python packages installed

You are running Perl v b'5.032001'. Now checking perl modules...
Carp: 1.50
Clone: 0.42
DBD::SQLite: 1.70
DBD::mysql: 4.046
DBI: 1.643
DB_File: 1.855
Data::Dumper: 2.183
File::Basename: 2.85
File::Which: 1.24
Getopt::Long: 2.52
Hash::Merge: 0.302
JSON: 4.09
LWP::UserAgent: 6.67
Logger::Simple: 2.0
POSIX: 1.94
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.13
Tie::File: 1.06
URI::Escape: 5.12
YAML: 1.30
local::lib: 2.000029
threads: 2.21
threads::shared: 1.61
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/rds/user/aw932/hpc-work/databases/funannotate_db
$PASAHOME=/rds/user/aw932/hpc-work/software/miniconda3_V2/envs/funannotate/opt/pasa-2.5.2
$TRINITY_HOME=/rds/user/aw932/hpc-work/software/miniconda3_V2/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/rds/user/aw932/hpc-work/software/miniconda3_V2/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/rds/user/aw932/hpc-work/software/miniconda3_V2/envs/funannotate/config/
$GENEMARK_PATH=/home/aw932/rds/hpc-work/software/gmes
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.5.2
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.4.0
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2021-08-25
gmes_petap.pl: 4.69_lic
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 17.0.3-internal
kallisto: 0.46.1
mafft: v7.508 (2022/Sep/07)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
proteinortho: 6.1.1
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.15.1
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 39
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: [0.003s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /cgroup-sl/cpuset.
ERROR: emapper.py not installed
ERROR: pigz not installed

hyphaltip commented 2 years ago

Can you find the pasa exe in the path? Is it executable and compiled properly?

AlWa1 commented 2 years ago

Yes, running it from the path or simply typing pasa on the command line starts the software properly and gives the following output:

Usage: pasa inputFile [opts]

options: 

-F fuzzlength (bp to discount at alignment termini during pairwise
   compatibility checks.  (default: 20)
-a illustrate incoming alignments only.  No assembly performed.
-v verbose

The error seems to come from the _init step of the PASA_alignment_assembler.pm Perl module (see below). I tried replacing my $pasa_bin = `which pasa`; with my $pasa_bin = `/rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2/bin/pasa`; but this did not help either. Shouldn't the error message (Error, pasa binary [] isn't executable or couldn't be found.) contain the path to PASA in the brackets? I wonder whether the assignment to that variable is failing altogether. In a related post (which unfortunately did not mention the solution), somebody suggested that a software environment problem might cause this, but I am more of a biologist than a computer scientist and cannot really make sense of it.

sub _init {
    my $self = shift;
    $self->{incoming_alignments} = []; #these are the alignments to be assembled.
    $self->{assemblies} = []; #contains list of all singletons and assemblies.
    $self->{fuzzlength} = $FUZZLENGTH;  #default setting.

    my $pasa_bin = `which pasa`;
    $pasa_bin =~ s/\s//g;

    unless (-x $pasa_bin) {
        confess "Error, pasa binary [$pasa_bin] isn't executable or couldn't be found.";
    }

    $self->{pasa_bin} = $pasa_bin;
}
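The empty brackets in the message follow directly from this code: $pasa_bin is interpolated into the error text, so `which pasa` printed nothing to stdout inside the process that ran the check. Note also that Perl backticks execute a command and capture its stdout, so wrapping the full path to pasa in backticks runs the binary rather than assigning the path as a string. A minimal shell sketch of the same check, meant to be run inside the batch job right after the environment is activated, could look like the following. The helper name check_bin is hypothetical, and command -v is used here as a stand-in for which.

```shell
#!/bin/bash
# Hypothetical helper (not part of PASA or funannotate): resolve a command
# name through PATH and apply the same "is it executable" test as _init.
check_bin() {
    local name="$1"
    local resolved
    # command -v stands in for `which`; strip whitespace as the Perl code does.
    resolved="$(command -v "$name" 2>/dev/null | tr -d '[:space:]')"
    if [ -n "$resolved" ] && [ -x "$resolved" ]; then
        echo "OK: $name -> $resolved"
        return 0
    fi
    echo "FAIL: $name resolved to [$resolved]"
    return 1
}

# Report on pasa; on failure, show the PATH this shell actually sees.
check_bin pasa || echo "PATH in this shell: $PATH"
```

If this reports FAIL inside the SLURM job but OK in an interactive shell, the two shells are most likely seeing different PATH values.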

As a bit of background, I am running the script on an HPC cluster with RHEL7/SLURM CPU nodes using the following script.sh (see below). $PASAHOME and 'which pasa' give me the right path in both my conda base and conda funannotate environments.

#!/bin/bash -l

#SBATCH -J funannotate_trainandpredict
#SBATCH -D /rds/project/ss2123/rds-ss2123-team_seb_storage/projects/20221108_Lyc1trainandpredict2
#SBATCH -o 20221107_funannotate_Lyc1trainandpredict2.log
#SBATCH -A SL3-CPU
#SBATCH -c 14              # max 32 CPUs
#SBATCH -p skylake-himem 
#SBATCH -t 36:00:00            
#SBATCH --mem-per-cpu=12G

source activate funannotate

funannotate check --show-versions

funannotate clean -i /rds/project/ss2123/rds-ss2123-team_seb_storage/data/genomes/Lyc_v1/Lyc_spec.scaffolds.fa --minlen 1000 -o Lyc_v1.cleaned.fa
funannotate sort -i Lyc_v1.cleaned.fa -b scaffold -o Lyc_v1.cleaned.sorted.fa
funannotate mask -i Lyc_v1.cleaned.sorted.fa --cpus 14 -o Lyc_v1.cleaned.sorted.masked.fa

funannotate train -i Lyc_v1.cleaned.sorted.masked.fa -o fun \
    --single /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep1_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep2_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/L_rep3_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep1_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep2_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/H_rep3_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/trifolium_trimmed.fq.gz /rds/project/ss2123/rds-ss2123-team_seb_storage/data/RNAseq_raw_data/trimmed/plantago_trimmed.fq.gz \
    --cpus 14

conda deactivate

hyphaltip commented 2 years ago

Does which pasa work in your submit script after you do source activate funannotate? On our system we use Unix modules and have to either install PASA into the conda env or do the module load of PASA afterwards, e.g. https://github.com/ucr-hpcc/hpcc_modules/blob/main/funannotate/1.8

Or is your PASA installed into the funannotate conda env? /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2 looks like a separate install. Are you updating your PATH to include it?

The version check is reporting /rds/user/aw932/hpc-work/software/miniconda3_V2/envs/funannotate/opt/pasa-2.5.2 for PASAHOME, while you are pointing to /rds/project/ss2123/rds-ss2123-team_seb_storage/software/PASApipeline-v2.5.2 in your message above?
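One way to make such a mismatch visible is to compare $PASAHOME with the location of the pasa binary that PATH actually resolves. The sketch below is only an illustration: binary_under_home is a hypothetical helper, and a mismatch is not necessarily an error, since a conda environment may expose the binary from its own bin directory rather than from under $PASAHOME.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the binary path lies under the given
# home directory, fail otherwise (or if the home directory is empty).
binary_under_home() {
    local home="${1%/}"   # drop one trailing slash, if present
    local bin="$2"
    [ -n "$home" ] || return 1
    case "$bin" in
        "$home"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Compare the pasa found on PATH with the PASAHOME seen by this shell.
resolved="$(command -v pasa || true)"
if binary_under_home "${PASAHOME:-}" "$resolved"; then
    echo "pasa [$resolved] is under PASAHOME [${PASAHOME:-}]"
else
    echo "pasa [$resolved] is NOT under PASAHOME [${PASAHOME:-}]"
fi
```

Running this both interactively and inside the batch script shows whether the two contexts agree on which PASA install is in use.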

AlWa1 commented 2 years ago

Sorry for the confusion with the different paths mentioned in my posts. Those outputs came from different approaches (using either the PASA from my conda env or a separately installed PASA, in each case changing my PATH and PASAHOME variables accordingly). In each case, which pasa gave me the right output (even when run after source activate funannotate). However, the funannotate test script currently doesn't work anymore, so I am trying to set it up from scratch (and am now running into a completely different error, "ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory?", after a pip installation of funannotate from master).
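For the shared-library error, ldd can list which libraries the dynamic loader fails to resolve for a given binary. The following is a small sketch assuming a Linux system with ldd available; filter_missing is a hypothetical helper that only filters ldd's output.

```shell
#!/bin/bash
# Hypothetical helper: read `ldd` output on stdin and print the names of
# libraries that the dynamic loader could not resolve.
filter_missing() {
    awk '/not found/ { print $1 }'
}

# Inspect pslCDnaFilter only if it is on PATH in this shell.
bin="$(command -v pslCDnaFilter || true)"
if [ -n "$bin" ]; then
    ldd "$bin" | filter_missing
else
    echo "pslCDnaFilter is not on PATH in this shell"
fi
```

Any library name printed here is one the current environment does not provide in a location the loader searches.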

hyphaltip commented 2 years ago

Yep, I commented on that bug. There is not much else I can recommend at this time. If I have development time in December I'll try to work through these problems. @nextgenusfs is working on funannotate v2, which may remove some of these dependencies.