nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

PASA failure when running as job on cluster (but will run okay interactively) #882

Open DaRinker opened 1 year ago

DaRinker commented 1 year ago

Are you using the latest release? Yes, running the Docker image via Singularity.

Describe the bug: I am trying to annotate a large number of genomes on a local cluster, using the Singularity image with a supporting .sh script. The command funannotate train regularly fails when run as a job on the cluster. The error is always:

[Mar 14 10:24 PM]: Running PASA alignment step using 24,150 transcripts
[Mar 15 07:09 AM]: CMD ERROR: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /path/to/my/assembly/fasta/funannotate.out/training/pasa/alignAssembly.txt -r -C -R -g /path/to/my/assembly/fasta/funannotate.out/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /path/to/my/assembly/fasta/funannotate.out/training/trinity.alignments.gff3 -T -t /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta.clean -u /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 20 --ALIGNERS blat --trans_gtf /path/to/my/assembly/fasta/funannotate.out/training/funannotate_train.stringtie.gtf

HOWEVER, if I run the same command (below) interactively, it completes as it should. The failures occur across all compute nodes. I'm wondering whether other processes on the compute nodes can interfere with funannotate. I'm also wondering whether there is a problem with the way I'm specifying the number of cores. Is there something about the PASA step specifically that might consistently be "calling me out" on this?
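
One way to probe the core-count question is to compare the number of CPUs the job can actually use with the number passed to --cpus. The sketch below is only a guess at what might be worth checking, not a confirmed cause. `effective_cpus` is a hypothetical helper; it relies on `nproc` from GNU coreutils, which reports the CPUs available to the current process rather than the node total.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap a requested thread count at what the job may use.
# nproc (GNU coreutils) reports CPUs available to the current process, which
# can be lower than the node total when the scheduler restricts CPU affinity.
effective_cpus() {
  local requested="$1"
  local available
  available="$(nproc)"
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Example: report what a request for 20 threads would resolve to on this host.
echo "requested=20 usable=$(effective_cpus 20) visible=$(nproc --all)"
```

If the "usable" figure inside the batch job is lower than 20 while the interactive session reports 20 or more, that would be one difference between the two contexts worth following up.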

What command did you issue?

funannotate train -i ${genomefasta%.fa*}.funsorted.masked.fasta \
-o funannotate.out \
-l ${rnaseqreadspath}/${fwdrnaseq} \
-r ${rnaseqreadspath}/${revrnaseq} \
--cpus 20

Logfiles

$ cat funannotate-train.log 
[03/14/23 19:56:33]: /venv/bin/funannotate train -i DTO4.secondpolish.funsorted.masked.fasta -o funannotate.out -l /path/to/my/illumina.rna.reads/treatment1_1_R1.fastq.gz -r /path/to/my/illumina.rna.reads/treatment1_1_R2.fastq.gz --cpus 20

[03/14/23 19:56:33]: OS: Debian GNU/Linux 10, 256 cores, ~ 1056 GB RAM. Python: 3.8.12
[03/14/23 19:56:33]: Running 1.8.14
[03/14/23 19:56:37]: fasta version=36.3.8g path=/venv/bin/fasta
[03/14/23 19:56:37]: minimap2 version=2.24-r1122 path=/venv/bin/minimap2
[03/14/23 19:56:37]: hisat2 version=2.2.1 path=/venv/bin/hisat2
[03/14/23 19:56:37]: hisat2-build version=NA path=/venv/bin/hisat2-build
[03/14/23 19:56:37]: Trinity version=2.8.5 path=/venv/bin/Trinity
[03/14/23 19:56:37]: java version=11.0.8-internal path=/venv/bin/java
[03/14/23 19:56:37]: kallisto version=0.46.1 path=/venv/bin/kallisto
[03/14/23 19:56:37]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl version=NA path=/venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl
[03/14/23 19:56:37]: /venv/opt/pasa-2.4.1/bin/seqclean version=NA path=/venv/opt/pasa-2.4.1/bin/seqclean
[03/14/23 19:56:37]: trimmomatic version=0.39 path=/venv/bin/trimmomatic
[03/14/23 19:56:37]: minimap2 version=2.24-r1122 path=/venv/bin/minimap2
[03/14/23 19:56:37]: blat version=BLAT v35 path=/venv/bin/blat
[03/14/23 19:56:41]: Input reads: ('/path/to/my/illumina.rna.reads/treatment1_1_R1.fastq.gz', '/path/to/my/illumina.rna.reads/treatment1_1_R2.fastq.gz', None)
[03/14/23 19:56:41]: Adapter and Quality trimming PE reads with Trimmomatic
[03/14/23 19:56:41]: trimmomatic PE -threads 20 -phred33 /path/to/my/illumina.rna.reads/treatment1_1_R1.fastq.gz /path/to/my/illumina.rna.reads/treatment1_1_R2.fastq.gz funannotate.out/training/trimmomatic/trimmed_left.fastq funannotate.out/training/trimmomatic/trimmed_left.unpaired.fastq funannotate.out/training/trimmomatic/trimmed_right.fastq funannotate.out/training/trimmomatic/trimmed_right.unpaired.fastq ILLUMINACLIP:/venv/lib/python3.8/site-packages/funannotate/config/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25
[03/14/23 20:24:22]: TrimmomaticPE: Started with arguments:
 -threads 20 -phred33 /path/to/my/illumina.rna.reads/treatment1_1_R1.fastq.gz /path/to/my/illumina.rna.reads/treatment1_1_R2.fastq.gz funannotate.out/training/trimmomatic/trimmed_left.fastq funannotate.out/training/trimmomatic/trimmed_left.unpaired.fastq funannotate.out/training/trimmomatic/trimmed_right.fastq funannotate.out/training/trimmomatic/trimmed_right.unpaired.fastq ILLUMINACLIP:/venv/lib/python3.8/site-packages/funannotate/config/TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:5 LEADING:5 TRAILING:5 MINLEN:25
Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 53915196 Both Surviving: 49304442 (91.45%) Forward Only Surviving: 4517227 (8.38%) Reverse Only Surviving: 0 (0.00%) Dropped: 93527 (0.17%)
TrimmomaticPE: Completed successfully

[03/14/23 20:24:22]: pigz -f -p 20 funannotate.out/training/trimmomatic/trimmed_left.fastq
[03/14/23 20:27:15]: pigz -f -p 20 funannotate.out/training/trimmomatic/trimmed_left.unpaired.fastq
[03/14/23 20:27:24]: pigz -f -p 20 funannotate.out/training/trimmomatic/trimmed_right.fastq
[03/14/23 20:29:36]: pigz -f -p 20 funannotate.out/training/trimmomatic/trimmed_right.unpaired.fastq
[03/14/23 20:29:36]: Quality trimmed reads: ('funannotate.out/training/trimmomatic/trimmed_left.fastq.gz', 'funannotate.out/training/trimmomatic/trimmed_right.fastq.gz', None)
[03/14/23 20:29:36]: FASTQ headers seem compatible with Trinity
[03/14/23 20:29:36]: Running read normalization with Trinity
[03/14/23 20:29:36]: /venv/opt/trinity-2.8.5/util/insilico_read_normalization.pl --PARALLEL_STATS --JM 50G --min_cov 5 --max_cov 50 --seqType fq --output funannotate.out/training/normalize --CPU 20 --pairs_together --left funannotate.out/training/trimmomatic/trimmed_left.fastq.gz --right funannotate.out/training/trimmomatic/trimmed_right.fastq.gz
[03/14/23 21:23:43]: Normalized reads: ('funannotate.out/training/normalize/left.norm.fq', 'funannotate.out/training/normalize/right.norm.fq', None)
[03/14/23 21:23:43]: Long reads: (None, None, None)
[03/14/23 21:23:43]: Long reads FASTA format: (None, None, None)
[03/14/23 21:23:43]: Long SeqCleaned reads: (None, None, None)
[03/14/23 22:23:45]: Running StringTie on Hisat2 coordsorted BAM
[03/14/23 22:23:45]: stringtie -p 20 funannotate.out/training/hisat2.coordSorted.bam
[03/14/23 22:24:00]: Removing poly-A sequences from trinity transcripts using seqclean
[03/14/23 22:24:00]: /venv/opt/pasa-2.4.1/bin/seqclean trinity.fasta -c 16
[03/14/23 22:24:09]: seqclean running options: 
seqclean trinity.fasta -c 16
 Standard log file: seqcl_trinity.fasta.log
 Error log file:    err_seqcl_trinity.fasta.log
 Using 16 CPUs for cleaning
-= Rebuilding trinity.fasta cdb index =-
 Launching actual cleaning process:
 psx -p 16  -n 1000  -i trinity.fasta -d cleaning -C '/path/to/my/assembly/fasta/trinity.fasta:ANLMS100:::11:0' -c '/venv/opt/pasa-2.4.1/bin/seqclean.psx'
Collecting cleaning reports

**************************************************
Sequences analyzed:     24150
-----------------------------------
                   valid:     24150  (220 trimmed)
                 trashed:         0
**************************************************
Output file containing only valid and trimmed sequences: trinity.fasta.clean
For trimming and trashing details see cleaning report  : trinity.fasta.cln
--------------------------------------------------
seqclean (trinity.fasta) finished on machine cn1615
 in /path/to/my/assembly/fasta, without a detectable error.

[03/14/23 22:24:10]: minimap2 -ax splice -t 20 --cs -u b -G 3000 funannotate.out/training/genome.fasta funannotate.out/training/trinity.fasta.clean | samtools sort --reference funannotate.out/training/genome.fasta -@ 4 -o funannotate.out/training/trinity.alignments.bam -
[03/14/23 22:24:15]: Converting transcript alignments to GFF3 format
[03/14/23 22:24:16]: Converting Trinity transcript alignments to GFF3 format
[03/14/23 22:24:17]: Running PASA alignment step using 24,150 transcripts
[03/14/23 22:24:17]: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /path/to/my/assembly/fasta/funannotate.out/training/pasa/alignAssembly.txt -r -C -R -g /path/to/my/assembly/fasta/funannotate.out/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /path/to/my/assembly/fasta/funannotate.out/training/trinity.alignments.gff3 -T -t /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta.clean -u /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 20 --ALIGNERS blat --trans_gtf /path/to/my/assembly/fasta/funannotate.out/training/funannotate_train.stringtie.gtf
[03/15/23 07:09:22]: CMD ERROR: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /path/to/my/assembly/fasta/funannotate.out/training/pasa/alignAssembly.txt -r -C -R -g /path/to/my/assembly/fasta/funannotate.out/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /path/to/my/assembly/fasta/funannotate.out/training/trinity.alignments.gff3 -T -t /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta.clean -u /path/to/my/assembly/fasta/funannotate.out/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 20 --ALIGNERS blat --trans_gtf /path/to/my/assembly/fasta/funannotate.out/training/funannotate_train.stringtie.gtf
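
The funannotate log above records only that the PASA command failed, not why. A minimal sketch for locating more detail follows, assuming PASA or funannotate left additional log files somewhere under the output directory. The helper name `find_error_logs` and the `*.log` pattern are illustrative assumptions, not documented funannotate behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list *.log files under a directory that mention
# "error" (case-insensitive), so the failing step can be inspected by hand.
find_error_logs() {
  local dir="$1"
  # -r recurse, -i ignore case, -l print matching file names only
  grep -ril --include='*.log' 'error' "$dir" 2>/dev/null | sort
}

# Example (directory taken from the -o option used in this report):
# find_error_logs funannotate.out
```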

OS/Install Information

Singularity> funannotate check --show-versions
-------------------------------------------------------
Checking dependencies for 1.8.14
-------------------------------------------------------
You are running Python v 3.8.12. Now checking python packages...
biopython: 1.79
goatools: 1.2.3
matplotlib: 3.7.0
natsort: 8.2.0
numpy: 1.22.4
pandas: 1.5.3
psutil: 5.9.4
requests: 2.28.2
scikit-learn: 1.1.1
scipy: 1.5.3
seaborn: 0.12.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
local::lib: 2.000029
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/opt/databases
$PASAHOME=/venv/opt/pasa-2.4.1
$TRINITYHOME=/venv/opt/trinity-2.8.5
$EVM_HOME=/venv/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/usr/share/augustus/config
    ERROR: GENEMARK_PATH not set. export GENEMARK_PATH=/path/to/dir
-------------------------------------------------------
Checking external dependencies...
    ERROR: pslDnaFiler found but error running: pslCDnaFilter: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.2
bamtools: bamtools 2.5.2
bedtools: bedtools v2.30.0
blat: BLAT v35
diamond: 2.0.15
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: 36.3.8g
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.8-internal
kallisto: 0.46.1
mafft: v7.515 (2023/Jan/15)
makeblastdb: makeblastdb 2.2.31+
minimap2: 2.24-r1122
pigz: 2.6
proteinortho: 6.0.16
salmon: salmon 0.14.1
samtools: samtools 1.12
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 40
tbl2asn: 25.8
tblastn: tblastn 2.2.31+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
    ERROR: emapper.py not installed
    ERROR: gmes_petap.pl not installed
    ERROR: pslCDnaFilter not installed
    ERROR: signalp not installed
Singularity>

DaRinker commented 1 year ago

UPDATE: I re-ran my script (using whatever pre-existing output funannotate could find) and everything seemed to work just fine (??).

Starting singularity...
Funannotate-prepped genome file found..
##########
Running command:

funannotate train -i DTO4.secondpolish.funsorted.masked.fasta -o funannotate.out -l /path/to/my/illumina.rna.reads/treatment1_R1.fastq.gz -r /path/to/my/illumina.rna.reads/treatment1_R2.fastq.gz --cpus 20
#########
[Mar 15 07:53 AM]: OS: Debian GNU/Linux 10, 256 cores, ~ 792 GB RAM. Python: 3.8.12
[Mar 15 07:53 AM]: Running 1.8.14
[Mar 15 07:53 AM]: 24,150 existing Trinity results found: funannotate.out/training/trinity.fasta
[Mar 15 07:53 AM]: Removing poly-A sequences from trinity transcripts using seqclean
[Mar 15 07:53 AM]: Existing SeqClean output found: funannotate.out/training/funannotate.out/training/trinity.fasta.clean
[Mar 15 07:53 AM]: Existing BAM alignments found: funannotate.out/training/trinity.alignments.bam, funannotate.out/training/transcript.alignments.bam
[Mar 15 07:53 AM]: Existing PASA assemblies found: funannotate.out/training/pasa/DTO4_secondpolish_funsorted_masked_pasa.assemblies.fasta
[Mar 15 07:53 AM]: PASA assigned 24,256 transcripts to 16,567 loci (genes)
[Mar 15 07:53 AM]: Getting PASA models for training with TransDecoder
[Mar 15 08:03 AM]: PASA finished. PASAweb accessible via: localhost:port/cgi-bin/index.cgi?db=/path/to/my/assembly/fasta/funannotate.out/training/pasa/DTO4_secondpolish_funsorted_masked_pasa
[Mar 15 08:03 AM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Mar 15 08:03 AM]: Building Kallisto index
[Mar 15 08:04 AM]: Mapping reads using pseudoalignment in Kallisto
[Mar 15 08:10 AM]: Parsing expression value results. Keeping best transcript at each locus.
[Mar 15 08:11 AM]: Wrote 8,927 PASA gene models
[Mar 15 08:11 AM]: PASA database name: DTO4.secondpolish.funsorted.masked
[Mar 15 08:11 AM]: Trinity/PASA has completed, you are now ready to run funanotate predict, for example:

  funannotate predict -i DTO4.secondpolish.funsorted.masked.fasta \
            -o funannotate.out -s "DTO4.secondpolish.funsorted.masked" --cpus 20

-------------------------------------------------------
-------------------------------------------------------
##########
Running command:

funannotate predict -i DTO4.secondpolish.funsorted.masked.fasta --species "my_species" --isolate DTO4 --transcript_evidence funannotate.out/training/funannotate_train.trinity-GG.fasta --rna_bam funannotate.out/training/funannotate_train.coordSorted.bam --pasa_gff funannotate.out/training/funannotate_train.pasa.gff3 --out funannotate.out
#########
-------------------------------------------------------
[Mar 15 08:11 AM]: OS: Debian GNU/Linux 10, 256 cores, ~ 792 GB RAM. Python: 3.8.12
[Mar 15 08:11 AM]: Running funannotate v1.8.14
[Mar 15 08:11 AM]: GeneMark not found and $GENEMARK_PATH environmental variable missing. Will skip GeneMark ab-initio prediction.
[Mar 15 08:11 AM]: Found training files, will re-use these files:
  --stringtie funannotate.out/training/funannotate_train.stringtie.gtf
  --transcript_alignments funannotate.out/training/funannotate_train.transcripts.gff3
[Mar 15 08:11 AM]: Parsed training data, run ab-initio gene predictors as follows:
  Program        Training-Method
  augustus       pasa           
  codingquarry   rna-bam        
  glimmerhmm     pasa           
  snap           pasa           
[Mar 15 08:11 AM]: Loading genome assembly and parsing soft-masked repetitive sequences
[Mar 15 08:11 AM]: Genome loaded: 24 scaffolds; 31,699,275 bp; 2.56% repeats masked
[Mar 15 08:11 AM]: Parsed 16,285 transcript alignments from: funannotate.out/training/funannotate_train.transcripts.gff3
[Mar 15 08:12 AM]: Aligning 7,865 unique transcripts [not found in exising alignments] with minimap2
[Mar 15 08:12 AM]: Mapped 0 of these transcripts to the genome
[Mar 15 08:12 AM]: Creating transcript EVM alignments and Augustus transcripts hintsfile
[Mar 15 08:12 AM]: Extracting hints from RNA-seq BAM file using bam2hints
[Mar 15 08:12 AM]: Mapping 555,918 proteins to genome using diamond and exonerate

Could the problem be with my shell script? This is how I have it set up:

######################################

cat <<EOF
##########
Running command:

funannotate train -i ${genomefasta%.fa*}.funsorted.masked.fasta \
-o funannotate.out \
-l ${rnaseqreadspath}/${fwdrnaseq} \
-r ${rnaseqreadspath}/${revrnaseq} \
--cpus 20
#########
EOF

funannotate train -i ${genomefasta%.fa*}.funsorted.masked.fasta \
-o funannotate.out \
-l ${rnaseqreadspath}/${fwdrnaseq} \
-r ${rnaseqreadspath}/${revrnaseq} \
--cpus 20

if [ $? -eq 0 ] 
then 
  echo "Successfully completed 'funannotate train' on RNAseq data" >&2
else 
  echo "Could not complete 'funannotate train'. Exiting..." >&2
  exit 1
fi

cat <<EOF
##########
Running command:

funannotate predict -i ${genomefasta%.fa*}.funsorted.masked.fasta \
--species "aspergillus_fischeri" --isolate ${sample} \
--transcript_evidence funannotate.out/training/funannotate_train.trinity-GG.fasta \
--rna_bam funannotate.out/training/funannotate_train.coordSorted.bam \
--pasa_gff funannotate.out/training/funannotate_train.pasa.gff3 \
--out funannotate.out
#########
EOF

funannotate predict -i ${genomefasta%.fa*}.funsorted.masked.fasta \
--species "aspergillus_fischeri" --isolate ${sample} \
--transcript_evidence funannotate.out/training/funannotate_train.trinity-GG.fasta \
--rna_bam funannotate.out/training/funannotate_train.coordSorted.bam \
--pasa_gff funannotate.out/training/funannotate_train.pasa.gff3 \
--out funannotate.out

hyphaltip commented 1 year ago

You are running the SQLite version to generate the db; it might be that the path loaded is different when you run interactively vs. direct?

It might be good to test again with a clean folder if you want to make sure this won't come back on your next annotation.
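
One way to test the suggestion that the loaded path differs is to capture the environment in both contexts and compare the results. This is a minimal sketch; `snapshot_env` and the output file names are hypothetical, and it assumes the relevant difference shows up in environment variables such as PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sorted snapshot of the environment to a file.
snapshot_env() {
  local outfile="$1"
  env | sort > "$outfile"
}

# Run once in each context, then compare, for example:
#   snapshot_env env.interactive.txt   # inside the interactive container shell
#   snapshot_env env.batch.txt         # inside the cluster job script
#   diff env.interactive.txt env.batch.txt
```

Sorting the output keeps the diff limited to variables that really differ rather than to ordering.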

DaRinker commented 1 year ago

might be that the path loaded is diff when you run interactively vs direct?

Interesting. I would have guessed that the same version would be used, since both are run inside the container. I'm not exactly sure how to remedy this.

DaRinker commented 1 year ago

As an update, this is still a problem for me. However, I have since discovered that I can also get the annotation to complete simply by resubmitting the same job to the cluster.

For example, this is my failed job:

[Mar 18 09:12 AM]: Running StringTie on Hisat2 coordsorted BAM
[Mar 18 09:12 AM]: Removing poly-A sequences from trinity transcripts using seqclean
[Mar 18 09:13 AM]: Converting transcript alignments to GFF3 format
[Mar 18 09:13 AM]: Converting Trinity transcript alignments to GFF3 format
[Mar 18 09:13 AM]: Running PASA alignment step using 28,314 transcripts
[Mar 18 11:41 PM]: CMD ERROR: /venv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c path/to/assembly/DTO1/pilonresults/funannotate.out/training/pasa/alignAssembly.txt -r -C -R -g path/to/assembly/DTO1/pilonresults/funannotate.out/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS path/to/assembly/DTO1/pilonresults/funannotate.out/training/trinity.alignments.gff3 -T -t path/to/assembly/DTO1/pilonresults/funannotate.out/training/trinity.fasta.clean -u path/to/assembly/DTO1/pilonresults/funannotate.out/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 3000 --CPU 20 --ALIGNERS blat --trans_gtf path/to/assembly/DTO1/pilonresults/funannotate.out/training/funannotate_train.stringtie.gtf

Which I can then relaunch:

[Mar 19 09:17 AM]: OS: Debian GNU/Linux 10, 256 cores, ~ 1056 GB RAM. Python: 3.8.12
[Mar 19 09:17 AM]: Running 1.8.14
[Mar 19 09:17 AM]: 28,314 existing Trinity results found: funannotate.out/training/trinity.fasta
[Mar 19 09:17 AM]: Removing poly-A sequences from trinity transcripts using seqclean
[Mar 19 09:17 AM]: Existing SeqClean output found: funannotate.out/training/funannotate.out/training/trinity.fasta.clean
[Mar 19 09:17 AM]: Existing BAM alignments found: funannotate.out/training/trinity.alignments.bam, funannotate.out/training/transcript.alignments.bam
[Mar 19 09:17 AM]: Existing PASA assemblies found: funannotate.out/training/pasa/DTO1_secondpolish_funsorted_masked_pasa.assemblies.fasta
[Mar 19 09:17 AM]: PASA assigned 25,684 transcripts to 13,715 loci (genes)
[Mar 19 09:17 AM]: Getting PASA models for training with TransDecoder
[Mar 19 09:30 AM]: PASA finished. PASAweb accessible via: localhost:port/cgi-bin/index.cgi?db=path/to/assembly/DTO1/pilonresults/funannotate.out/training/pasa/DTO1_secondpolish_funsorted_masked_pasa
[Mar 19 09:30 AM]: Using Kallisto TPM data to determine which PASA gene models to select at each locus
[Mar 19 09:30 AM]: Building Kallisto index
[Mar 19 09:32 AM]: Mapping reads using pseudoalignment in Kallisto
[Mar 19 09:34 AM]: Parsing expression value results. Keeping best transcript at each locus.

So my workflow now is: 1) submit the job; 2) wait for the job to crash with the PASA error documented above; 3) resubmit the same job script; 4) wait for the job to complete successfully.
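
The manual resubmission described above could be sketched as a small retry wrapper inside the job script. This is a workaround under stated assumptions, not a fix: `retry` is a hypothetical helper, and the approach relies on funannotate reusing existing output on the second pass, as the relaunch logs in this thread show. It does nothing to explain why the first PASA run fails.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX times, stopping at first success.
retry() {
  local max="$1"
  shift
  local attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    echo "Attempt ${attempt}/${max} failed: $*" >&2
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (genome.fasta is a placeholder; read options omitted for brevity):
# retry 2 funannotate train -i genome.fasta -o funannotate.out --cpus 20
```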

However, I am now concerned that the PASA output used in the second attempt (25,684 transcripts) doesn't match the input (28,314 transcripts) from the first (failed) attempt. I had been assuming that the number of transcripts "assigned" should be a subset of the starting number, so I was okay with the discrepancy, but now I feel like I might be assuming too much.
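
A quick way to put numbers on this concern is to count the records in the input and output FASTA files directly. The sketch below uses a hypothetical helper, `count_fasta_records`. It only counts header lines, so it cannot show whether the PASA output from the failed run is complete. Note that in the earlier DTO4 run the log reports 24,150 input transcripts but 24,256 "assigned", so in that run the assigned figure was larger than the input.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count sequence records in a FASTA file by counting
# header lines (lines that start with ">").
count_fasta_records() {
  grep -c '^>' "$1"
}

# Example, using file names that appear in the logs above:
# count_fasta_records funannotate.out/training/trinity.fasta
# count_fasta_records funannotate.out/training/pasa/DTO1_secondpolish_funsorted_masked_pasa.assemblies.fasta
```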