nextgenusfs / funannotate

Eukaryotic Genome Annotation Pipeline
http://funannotate.readthedocs.io
BSD 2-Clause "Simplified" License

PASA silently failing? #413

Closed devonorourke closed 3 years ago

devonorourke commented 4 years ago

Hi Jon, I wanted to change the max_intron_len parameter in the training/prediction steps from the default setting (3000) to a larger value (25000). However, when I ran the exact same set of commands that had previously completed in about 6 hours, the new job had been going for about 18 hours when I checked top and noticed that nothing was happening under the hood. I saw in at least one other thread that others have changed this setting, and nobody raised an issue about it. What troubleshooting steps would you advise to investigate? Thanks

devonorourke commented 4 years ago

The specific step that runs and then hangs is this one:

/scratch/dro49/conda/envs/funenv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl \
-c /scratch/dro49/mysework/annotation/funruns/SEfun1/training/pasa/alignAssembly.txt \
-r -C -R -g /scratch/dro49/mysework/annotation/funruns/SEfun1/training/genome.fasta \
--IMPORT_CUSTOM_ALIGNMENTS /scratch/dro49/mysework/annotation/funruns/SEfun1/training/trinity.alignments.gff3 \
-T -t /scratch/dro49/mysework/annotation/funruns/SEfun1/training/trinity.fasta.clean \
-u /scratch/dro49/mysework/annotation/funruns/SEfun1/training/trinity.fasta --stringent_alignment_overlap 30.0 \
--TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 10000 --CPU 12 --ALIGNERS blat --trans_gtf /scratch/dro49/mysework/annotation/funruns/SEfun1/training/funannotate_train.stringtie.gtf

Here's the weird part: Switching up the max_intron_len parameter to 10,000 worked for one of the two bat genomes. It completed through the entire train.py script. The above command, when it completes successfully, takes about 6 hours. However, the job that is failing (I think) is taking more than 18 hours and hasn't proceeded to the next phase, which in my .log file looks something like:

[some date]: PASA assigned ...

I don't see that message in the job that is hanging, and when I look at the node the job is running on, it seems like lots of processes are still open but not really doing much:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
103791 dro49     20   0 14144 1908  976 R  0.7  0.0   0:00.05 top
 63309 dro49     20   0     0    0    0 Z  0.0  0.0   0:00.68 pigz <defunct>
 66916 dro49     20   0     0    0    0 Z  0.0  0.0   0:00.67 pigz <defunct>
 70536 dro49     20   0  116m 6180  804 S  0.0  0.0   0:00.06 perl
 80452 dro49     20   0  103m 1376 1136 S  0.0  0.0   0:00.00 sh
 80453 dro49     20   0 3473m 2.3g  764 S  0.0  1.9   3:59.87 perl
103761 dro49     20   0  126m 2816  892 S  0.0  0.0   0:00.00 sshd
103762 dro49     20   0  104m 1964 1460 S  0.0  0.0   0:00.00 bash
126843 dro49     20   0  103m 1624 1244 S  0.0  0.0   0:00.05 slurm_script
127494 dro49     20   0  155m  29m  948 S  0.0  0.0   0:44.91 funannotate
127864 dro49     20   0 19452  920  624 S  0.0  0.0   0:00.00 pigz
127865 dro49     20   0 19452  920  624 S  0.0  0.0   0:00.00 pigz
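
A listing like the one above can be summarized with a small helper. This is a minimal sketch, not part of funannotate or PASA: `count_defunct` is a hypothetical function name, and it assumes `ps`-style input with the process state in the second column. Defunct (state `Z`) entries are children that have exited but have not yet been reaped by their parent; on their own they do not prove the pipeline is hung.

```shell
# Count zombie (defunct) processes in `ps`-style output read from stdin.
# Assumes the process state is in the second column, as produced by
# `ps -o pid,stat,etime,comm`. The header line is ignored because
# "STAT" does not start with "Z".
count_defunct() {
    awk '$2 ~ /^Z/ { n++ } END { print n + 0 }'
}
```

On the compute node this could be run as `ps -u "$USER" -o pid,stat,etime,comm | count_defunct`. Comparing the `TIME+` or CPU-time columns across two snapshots taken a few minutes apart is probably a more direct way to tell whether the remaining `perl` process is still doing work.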

nextgenusfs commented 4 years ago

So you are saying it seems that PASA is getting stuck? Is anything different about these two species (config names, RNA-seq data, coverage, etc.)? Could it have run out of memory? It's hard to imagine how the same settings would result in one stalling and the other completing.

devonorourke commented 4 years ago

It hopefully is some dumb thing on my part. The job that PASA is getting stuck on is the bat genome I've already completed an initial run with, so definitely not anything wrong with fasta/fastq header names. I haven't picked up any OOM events from our cluster in any log files. For the moment, I'm just running the command that it didn't finish directly and will check out the log file for any hints there (hopefully something will come of this):

/scratch/dro49/conda/envs/funenv/opt/pasa-2.4.1/Launch_PASA_pipeline.pl -c /scratch/dro49/myluwork/annotation/fun2/funR2/training/pasa/alignAssembly.txt -r -C -R -g /scratch/dro49/myluwork/annotation/fun2/funR2/training/genome.fasta --IMPORT_CUSTOM_ALIGNMENTS /scratch/dro49/myluwork/annotation/fun2/funR2/training/trinity.alignments.gff3 -T -t /scratch/dro49/myluwork/annotation/fun2/funR2/training/trinity.fasta.clean -u /scratch/dro49/myluwork/annotation/fun2/funR2/training/trinity.fasta --stringent_alignment_overlap 30.0 --TRANSDECODER --ALT_SPLICE --MAX_INTRON_LENGTH 10000 --CPU 24 --ALIGNERS blat --trans_gtf /scratch/dro49/myluwork/annotation/fun2/funR2/training/funannotate_train.stringtie.gtf

Thanks for all the replies!

nextgenusfs commented 4 years ago

You are running in a new folder, correct? I.e., it's not trying to overwrite or add to an existing SQLite database?
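
One way to check this before relaunching is to read the database path out of the PASA config and see whether a file is already there. This is a rough sketch under an assumption: that the `alignAssembly.txt` passed with `-c` contains a `DATABASE=` line holding the SQLite file path. The function names `pasa_db_path` and `check_existing_db` are hypothetical helpers, not part of PASA or funannotate.

```shell
# Print the value of the first DATABASE= line in a PASA config file.
# ASSUMPTION: the config stores the SQLite path under the DATABASE key.
pasa_db_path() {
    sed -n 's/^DATABASE=//p' "$1" | head -n 1
}

# Report whether a file already exists at the configured database path.
check_existing_db() {
    local db
    db=$(pasa_db_path "$1")
    if [ -n "$db" ] && [ -e "$db" ]; then
        echo "existing database: $db"
    else
        echo "no existing database found"
    fi
}
```

For example, `check_existing_db training/pasa/alignAssembly.txt`, run from the funannotate output folder, would show whether a previous run left a database behind.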

devonorourke commented 4 years ago

I've tried both iterations: first using an existing folder where I deleted just the existing pasa directory, and then also just rerunning 'train.py' anew. I'm on a third iteration now, where I'm running it with a bit more memory. It's still adding data to the "_building.ascii_illustrations.out" file within the pasa directory (about 2 hours after the PASA-related steps started). I'll keep you posted. This is a weird one for sure...
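
Watching whether an output file is still growing, as described above, can be scripted. The helper below is a minimal sketch; `is_growing` is a hypothetical name, and the default interval of 60 seconds is an arbitrary choice.

```shell
# Return 0 if FILE grew in size during INTERVAL seconds, 1 if it did not,
# and 2 if the file could not be read.
is_growing() {
    local file=$1 interval=${2:-60}
    local before after
    before=$(wc -c < "$file") || return 2
    sleep "$interval"
    after=$(wc -c < "$file") || return 2
    [ "$after" -gt "$before" ]
}
```

Calling it on the file in the pasa directory with a longer interval, for example `is_growing path/to/file 300 && echo "still writing"`, gives a quick signal of whether PASA is making progress. A file that has stopped growing is only a hint, since a step may simply be writing elsewhere.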

-- Devon O'Rourke Postdoctoral researcher, Northern Arizona University Lab of Jeffrey T. Foster - https://fozlab.weebly.com/ twitter: @thesciencedork

devonorourke commented 4 years ago

The latest iteration finished completely, and I am now 100% confused. This is one of those things I'm 99.99% confident was a user error on my side, but I can't find any piece to point to that was the cause of it. Sorry to raise this issue, but in my estimation you can close it. I'm sorry to have bothered raising it in the first place, but perhaps if someone else comes across a similar behavior this post will at least offer some guidance: just delete and retry! (maybe bad guidance, but guidance nonetheless!)

Thanks Jon