wurmlab / flo

Same species annotation lift over pipeline.

Error when blat begins #36

Open 14zac2 opened 3 years ago

14zac2 commented 3 years ago

Hello!

I am experiencing an error with Flo that I was hoping you might be able to help me with. Rake aborts when blat begins, I believe. I searched for this issue and found it has happened for a few others, and noticed that one piece of advice was to update parallel. I did that through a conda environment and am now using GNU parallel 20201122, which I realize is still GNU parallel. Nonetheless, I am getting the following error:

mkdir run
cp /path/genomic.fa run/source.fa
cp /path/target.fa
faToTwoBit run/source.fa run/source.2bit
faToTwoBit run/target.fa run/target.2bit
twoBitInfo run/source.2bit stdout | sort -k2nr > run/source.sizes
twoBitInfo run/target.2bit stdout | sort -k2nr > run/target.sizes
faSplit sequence run/target.fa 20 run/chunk_
parallel --joblog run/joblog.faSplit -j 20 -a run/joblst.faSplit
29164 pieces of 29164 written
26770 pieces of 26770 written
23287 pieces of 23287 written
25387 pieces of 25387 written
25525 pieces of 25525 written
25448 pieces of 25448 written
25555 pieces of 25555 written
26474 pieces of 26474 written
25992 pieces of 25992 written
27046 pieces of 27046 written
26153 pieces of 26153 written
26728 pieces of 26728 written
26526 pieces of 26526 written
27621 pieces of 27621 written
26588 pieces of 26588 written
26266 pieces of 26266 written
26387 pieces of 26387 written
25897 pieces of 25897 written
27300 pieces of 27300 written
25692 pieces of 25692 written
parallel --joblog run/joblog.blat -j 20 -a run/joblst.blat
Loaded 2510587379 letters in 14543 sequences
Searched 116386128 bases in 23287 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127615767 bases in 25555 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 127103007 bases in 25448 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132315417 bases in 26474 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132646279 bases in 26588 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 129744868 bases in 25992 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 134829202 bases in 27046 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130548735 bases in 26153 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126148018 bases in 25387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 133429151 bases in 26728 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 145757029 bases in 29164 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 137556497 bases in 27621 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 131717084 bases in 26387 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 130661225 bases in 26266 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 128121430 bases in 25692 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 126631904 bases in 25525 sequences
Loaded 2510587379 letters in 14543 sequences
Searched 132127474 bases in 26770 sequences
rake aborted!
Command failed with status (3): [parallel --joblog run/joblog.blat -j 20 -a...]
/home/user/bin/flo/Rakefile:161:in `parallel'
/home/user/bin/flo/Rakefile:107:in `block in <top (required)>'
/home/user/bin/flo/Rakefile:37:in `block in <top (required)>'
Tasks: TOP => run/liftover.chn
(See full trace by running task with --trace)

My job log looks like this:

Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
18 : 1606275293.989 45.953 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
17 : 1606275293.975 46.491 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_10.fa run/chunk_10.fa.psl
16 : 1606275293.972 872.307 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
5 : 1606275293.949 19148.322 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
12 : 1606275293.963 20725.125 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_14.fa run/chunk_14.fa.psl
7 : 1606275293.953 22003.258 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_15.fa run/chunk_15.fa.psl
2 : 1606275293.943 23220.606 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_18.fa run/chunk_18.fa.psl
11 : 1606275293.961 23502.038 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_04.fa run/chunk_04.fa.psl
8 : 1606275293.955 23821.197 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_16.fa run/chunk_16.fa.psl
3 : 1606275293.945 24157.292 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_05.fa run/chunk_05.fa.psl
13 : 1606275293.966 24351.732 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_01.fa run/chunk_01.fa.psl
1 : 1606275293.941 24630.828 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_07.fa run/chunk_07.fa.psl
14 : 1606275293.968 24792.986 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_02.fa run/chunk_02.fa.psl
4 : 1606275293.947 25343.822 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_17.fa run/chunk_17.fa.psl
6 : 1606275293.951 25997.266 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_08.fa run/chunk_08.fa.psl
19 : 1606275293.993 26497.434 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_00.fa run/chunk_00.fa.psl
15 : 1606275293.970 26656.599 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_09.fa run/chunk_09.fa.psl
20 : 1606275294.002 27909.093 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_03.fa run/chunk_03.fa.psl
9 : 1606275293.957 28015.902 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_11.fa run/chunk_11.fa.psl
10 : 1606275293.959 28659.995 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_12.fa run/chunk_12.fa.psl
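A job log like this can be scanned for jobs that did not finish cleanly. The sketch below is only an illustration, not part of flo. It assumes the column order shown in the log header (Seq, Host, Starttime, JobRuntime, Send, Receive, Exitval, Signal, Command), and the names `parse_joblog` and `failed_jobs` are made up for this example. The sample rows are copied from the log above.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    seq: int
    runtime: float
    exitval: int
    signal: int
    command: str


def parse_joblog(text: str) -> List[Job]:
    """Parse the text of a GNU parallel --joblog file (hypothetical helper)."""
    jobs = []
    for line in text.splitlines():
        # Split on any whitespace; the ninth field (Command) keeps its spaces.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[0] == "Seq":
            continue  # skip the header and blank or malformed lines
        seq, _host, _start, runtime, _send, _recv, exitval, signal, command = fields
        jobs.append(Job(int(seq), float(runtime), int(exitval), int(signal), command))
    return jobs


def failed_jobs(jobs: List[Job]) -> List[Job]:
    """Return jobs with a non-zero exit value or that were ended by a signal."""
    return [job for job in jobs if job.exitval != 0 or job.signal != 0]


SAMPLE = """Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
18 : 1606275293.989 45.953 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_06.fa run/chunk_06.fa.psl
16 : 1606275293.972 872.307 0 0 0 9 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_13.fa run/chunk_13.fa.psl
5 : 1606275293.949 19148.322 0 89 0 0 blat -noHead -fastMap -tileSize=12 -minIdentity=95 run/source.fa run/chunk_19.fa run/chunk_19.fa.psl
"""

failed = failed_jobs(parse_joblog(SAMPLE))
for job in failed:
    print(f"job {job.seq}: exit={job.exitval} signal={job.signal} cmd={job.command}")
```

In the full log, three rows (chunk_06, chunk_10 and chunk_13) have Signal 9, which is SIGKILL. That count is consistent with the status (3) reported by rake, since GNU parallel's exit status gives the number of failed jobs. On Linux, SIGKILL is often what the kernel's out-of-memory killer sends, so memory pressure is one possible cause, though the log alone does not prove it.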

Do you have any thoughts as to what might be going on?

Thanks so much, Zoe

yeban commented 3 years ago

I am not sure. How big is the target genome? How many sequences do you have in the target assembly? Are any of the chunks (run/chunk_xx.fa files) empty? Does it work if you set the :processes: setting in flo_opts.yaml to 1?
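The empty-chunk question can be checked with a few lines of Python. This is a rough sketch rather than anything flo provides: `count_fasta_records` and `empty_chunks` are hypothetical helper names, and the demo builds two tiny made-up chunk files in a temporary directory instead of reading a real run/ directory.

```python
import tempfile
from pathlib import Path
from typing import List


def count_fasta_records(path: Path) -> int:
    """Count FASTA records by counting header lines that start with '>'."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def empty_chunks(run_dir: Path) -> List[str]:
    """Return the names of chunk_*.fa files that contain no records."""
    return [
        path.name
        for path in sorted(run_dir.glob("chunk_*.fa"))
        if count_fasta_records(path) == 0
    ]


# Demo on made-up files; point run_dir at the real run/ directory instead.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp)
    (run_dir / "chunk_00.fa").write_text(">seq1\nACGT\n>seq2\nGGCC\n")
    (run_dir / "chunk_01.fa").write_text("")
    counts = {p.name: count_fasta_records(p) for p in sorted(run_dir.glob("chunk_*.fa"))}
    empties = empty_chunks(run_dir)

print(counts)
print(empties)
```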

14zac2 commented 3 years ago

My target genome is 2.5 G and is made up of 3123 contigs. What I have done in the meantime is transported my work over to a computer cluster. Flo seems to get a little bit further but is still crashing. Right now, I am getting node failures - I thought this, at first, could have been from a faulty node in the cluster, but different nodes each crash at the same spot so I think it is something else.

What happened was that I began by running flo with a GFF file and got an error like this one:

{earlier stuff}
Processing NW_015365746.1
Processing NW_015365747.1
Processing NW_015365748.1
Processing NW_015365749.1
Processing NW_015365750.1
Processing NW_015365751.1
mkdir run/genomic_transcripts_long
liftOver -gff /path/genomic_transcripts_long.gff run/liftover.chn run/genomic_transcripts_long/lifted.gff3 run/genomic_transcripts_long/unlifted.gff3
Reading liftover chains
Mapping coordinates
WARNING: -gff is not recommended.
Use 'ldHgGene -out=' and then 'liftOver -genePred '
/path/flo/gff_recover.rb run/genomic_transcripts_long/lifted.gff3 2> run/genomic_transcripts_long/lifted_cleanup.log | gt gff3 -tidy -sort -addids -retainids - > run/genomic_transcripts_long/lifted_cleaned.gff 2>> run/genomic_transcripts_long/lifted_cleanup.log
rake aborted!
Command failed with status (1): [/path/flo/gff_recover.rb run/GCF_001...]
/path/flo/Rakefile:60:in `block (2 levels) in <top (required)>'
/path/flo/Rakefile:40:in `each'
/path/flo/Rakefile:40:in `block in <top (required)>'
/path/gems/gems/rake-13.0.1/exe/rake:27:in `<top (required)>'
Tasks: TOP => default
(See full trace by running task with --trace)

I assumed this error indicated that I needed to use a gtf file instead of a gff. So I converted genomic_transcripts_long.gff to genomic_transcripts_long.gtf and am now not getting as far with flo. My system is crashing here:

{earlier stuff}
14072 pieces of 14072 written
14212 pieces of 14212 written
parallel --joblog run/joblog.blat -j 48 -a run/joblst.blat

All of my chunk_xx.fa files are populated, but my joblog.blat is empty. I am wondering if my system is running out of memory? I think next I will try to run flo on two nodes with the same number of parallel processors. Do you have any recommendations? I'm just confused as to where and why the job is failing on me.

Also, with flo failing multiple times, is there any way to place checkpoints in partial runs so that one doesn't have to start from scratch?

Thanks again, Zoe

14zac2 commented 3 years ago

Sorry, I think I found an identical error in this other issue: https://github.com/wurmlab/flo/issues/15. I posted there, as I followed the advice mentioned in it. Happy to close this issue then.

14zac2 commented 3 years ago

Hi @yeban - I heeded your advice to avoid spamming another issue thread, and reopened my original one.

You mentioned that the error "the multi-feature with ID "cds-XP_015332030.1" on line 301782 in file "processed.gff" has a different strand than its counterpart on line 301780 (possible in rare cases)" is new. I ran gt on the original file to see if the issue was coming from it. Part of the problem was the same: there were certain mRNA features that had a Parent=gene description which, of course, pointed to a gene that didn't exist because I had filtered the genes out. However, once I removed the pesky Parent=gene descriptors, there were no further warnings or errors. I can run flo again on the fixed original file to see if it completes successfully.
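For anyone hitting the same thing, dangling Parent references of this kind can be listed before running flo. The following is a minimal sketch, assuming a standard nine-column, tab-separated GFF3 file with `ID=` and `Parent=` attributes in column 9. The function names and the three sample features are invented for illustration and do not come from the real annotation.

```python
from typing import Dict, List, Tuple


def parse_attributes(column9: str) -> Dict[str, str]:
    """Split a GFF3 attribute column such as 'ID=a;Parent=b' into a dict."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def dangling_parents(gff_text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, feature ID, parent ID) for parents that are never defined."""
    known_ids = set()
    references = []
    for line_no, line in enumerate(gff_text.splitlines(), start=1):
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            known_ids.add(attrs["ID"])
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                references.append((line_no, attrs.get("ID", "."), parent))
    # Check after reading everything, so parents defined later still count.
    return [ref for ref in references if ref[2] not in known_ids]


# Made-up example: the mRNA points at a gene that was filtered out.
rows = [
    ["chr1", "example", "mRNA", "100", "900", ".", "+", ".", "ID=rna-1;Parent=gene-1"],
    ["chr1", "example", "exon", "100", "400", ".", "+", ".", "ID=exon-1;Parent=rna-1"],
    ["chr1", "example", "CDS", "150", "400", ".", "+", "0", "ID=cds-1;Parent=rna-1"],
]
SAMPLE = "\n".join("\t".join(row) for row in rows)

dangling = dangling_parents(SAMPLE)
print(dangling)
```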

As for why this problem occurred with my original files, I'm not sure. I was lifting over a RefSeq annotation from NCBI and filtered the gff file the following way:

gt gff3 -tidy -sort -addids -retainids original_genomic.gff > genomic_sorted_and_tidied.gff
~/flo/gff_longest_transcripts.rb genomic_sorted_and_tidied.gff > genomic_transcripts_sorted_and_tidied_long.gff

The result didn't seem to contain anything other than mRNA, CDS, or exon features, but perhaps there was some sort of bug in one of these preprocessing steps.

yeban commented 3 years ago

Hi @14zac2 - thanks for following up.

Do I understand correctly that you had to remove the Parent=gene descriptor after running the gff_longest_transcripts.rb script? I think this is something the script should take care of automatically, so there might be a bug to fix here.

Did you get a chance to run flo on the fixed original file to see if it produced the "the multi-feature with ID "cds-XP_015332030.1" on line 301782 in file "processed.gff" has a different strand than its counterpart on line 301780 (possible in rare cases)" warning again? If it does, that would suggest that liftOver is mapping different exons of a transcript to different strands. I think the gff_recover.rb script should then remove such annotations.
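To see whether a lifted file contains such cases, one option is to group feature lines by ID and report any ID that appears on more than one strand. This is only a sketch of the check, not what gff_recover.rb does. It assumes tab-separated GFF3 with the strand in column 7 and an `ID=` attribute in column 9, and the sample lines and IDs (`cds-A`, `cds-B`) are invented.

```python
from collections import defaultdict
from typing import List


def mixed_strand_ids(gff_text: str) -> List[str]:
    """Return IDs whose feature lines sit on more than one strand."""
    strands_by_id = defaultdict(set)
    for line in gff_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        for pair in cols[8].split(";"):
            if pair.startswith("ID="):
                strands_by_id[pair[3:]].add(cols[6])  # column 7 is the strand
    return sorted(fid for fid, strands in strands_by_id.items() if len(strands) > 1)


# Made-up example: the two parts of cds-A ended up on opposite strands.
rows = [
    ["chr1", "example", "CDS", "100", "200", ".", "+", "0", "ID=cds-A;Parent=rna-A"],
    ["chr1", "example", "CDS", "300", "400", ".", "-", "2", "ID=cds-A;Parent=rna-A"],
    ["chr1", "example", "CDS", "500", "600", ".", "+", "0", "ID=cds-B;Parent=rna-B"],
    ["chr1", "example", "CDS", "700", "800", ".", "+", "1", "ID=cds-B;Parent=rna-B"],
]
SAMPLE = "\n".join("\t".join(row) for row in rows)

mixed = mixed_strand_ids(SAMPLE)
print(mixed)
```

Running the same check on the file before and after liftOver would show whether the mixed strands are already in the input or only appear after lifting.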