hillerlab / make_lastz_chains

Portable solution to generate genome alignment chains using lastz
MIT License

Error in the step 'lastz' #16

Closed: DiegoSafian closed this issue 11 months ago

DiegoSafian commented 1 year ago

Hi,

I have been running make_lastz_chains with several species (all closely related species, >50 MYA divergence) and it works great. However, for 3 species only, I get an error in the last jobs of the 'lastz' step (after >90% of the jobs have completed). I was wondering if you could identify the error for these particular species. (I am running everything with default settings.)

g_multiradiatus_picta.txt

Thanks in advance, Diego

MichaelHiller commented 1 year ago

There is no specific error message explaining why these jobs die. The most likely reason is excessive runtime or running out of memory, caused by incomplete genome masking. Did you run RepeatModeler + RepeatMasker on your genomes? Can you provide more memory?
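A quick way to sanity-check masking is to measure how much of each genome is soft-masked. The sketch below assumes the genomes are soft-masked (repeats written in lowercase, as RepeatMasker output is commonly converted to) and only counts A/C/G/T, ignoring N and other IUPAC codes. The function names are hypothetical helpers, not part of make_lastz_chains. An unexpectedly low masked fraction for one species compared to its relatives would be a hint that masking is incomplete.

```python
import gzip
import sys
from typing import Iterator, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) records from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    name, parts = None, []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(parts)
                name, parts = (line[1:].split() or [""])[0], []  # keep the ID only
            elif line:
                parts.append(line)
    if name is not None:
        yield name, "".join(parts)


def count_masked(seq: str) -> Tuple[int, int]:
    """Return (soft-masked, unmasked) A/C/G/T counts; N and IUPAC codes are ignored."""
    masked = sum(seq.count(b) for b in "acgt")
    unmasked = sum(seq.count(b) for b in "ACGT")
    return masked, unmasked


def soft_masked_fraction(seq: str) -> float:
    """Fraction of A/C/G/T bases that are lowercase (soft-masked)."""
    masked, unmasked = count_masked(seq)
    total = masked + unmasked
    return masked / total if total else 0.0


if __name__ == "__main__" and len(sys.argv) > 1:
    masked_total = unmasked_total = 0
    for _, seq in read_fasta(sys.argv[1]):
        m, u = count_masked(seq)
        masked_total += m
        unmasked_total += u
    total = masked_total + unmasked_total
    if total:
        print(f"soft-masked: {masked_total / total:.1%} of {total} A/C/G/T bases")
    else:
        print("no A/C/G/T bases found")
```

Saved as a script, it takes one FASTA path (plain or .gz) as its argument. It loads one sequence at a time, so peak memory is roughly the size of the largest chromosome or scaffold.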

@kirilenkobm Is there a way to get more info on the crashed jobs?

DiegoSafian commented 1 year ago

I checked, and the genomes are masked. I can provide more memory. I am running on a cluster (Slurm) with #SBATCH --mem=0 on one node. Should I increase --chaining_memory to 100000?

MichaelHiller commented 1 year ago

Based on the log file, I think you never get to the chaining step, because some of the lastz jobs fail (chaining happens downstream, so increasing the chaining memory won't help here). This typically happens when lastz 'sees' too many seeds from repeats.
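To see why unmasked repeats hurt so much, consider a toy seeding model: a k-mer occurring c times in the target and c times in the query produces c × c seed hits, so hits grow roughly with the square of the copy number. The sketch below uses plain contiguous k-mers as a stand-in; lastz's real seeding is more sophisticated (e.g. spaced seeds plus further filtering), and the function names and the 20 bp repeat unit are hypothetical. Skipping lowercase k-mers mimics how soft-masked repeats are kept out of seeding.

```python
from collections import Counter


def kmer_counts(seq: str, k: int, skip_lowercase: bool) -> Counter:
    """Count k-mers; optionally skip any k-mer touching a lowercase (soft-masked) base."""
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if skip_lowercase and any(c.islower() for c in word):
            continue
        counts[word.upper()] += 1
    return counts


def count_seed_hits(target: str, query: str, k: int = 12,
                    skip_lowercase: bool = True) -> int:
    """Number of (target position, query position) pairs sharing an identical k-mer."""
    t = kmer_counts(target, k, skip_lowercase)
    q = kmer_counts(query, k, skip_lowercase)
    return sum(n * q[word] for word, n in t.items() if word in q)


if __name__ == "__main__":
    unit = "ACGTTGCAAGCTTAGGCTAC"  # hypothetical 20 bp repeat unit
    for copies in (1, 10, 100):
        genome = unit * copies
        print(copies,
              count_seed_hits(genome, genome),                  # repeat left unmasked
              count_seed_hits(genome.lower(), genome.lower()))  # repeat soft-masked
```

Going from 1 to 100 copies multiplies the unmasked hit count by far more than 100, while the soft-masked column stays at zero. In a real alignment job, each of those hits has to be extended and scored, which is where runtime and memory go.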

If you have already run RepeatModeler (a de novo RepeatModeler run is important) and masked your genomes, you could try reducing the chunk size (e.g. set --seq1_chunk 50000000 --seq2_chunk 10000000). This gives you more, but smaller, jobs.
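The effect of the chunk sizes can be estimated with simple arithmetic. This sketch assumes one lastz job per (target chunk, query chunk) pair and ignores any overlap between adjacent chunks or bundling of small scaffolds that the pipeline may do; the helper names, the 1 Gb genome sizes, and the larger comparison setting are hypothetical and are not the tool's defaults.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def estimate_lastz_jobs(target_len: int, query_len: int,
                        seq1_chunk: int, seq2_chunk: int) -> int:
    """Rough job count, assuming one lastz job per (target chunk, query chunk) pair."""
    if min(target_len, query_len, seq1_chunk, seq2_chunk) <= 0:
        raise ValueError("genome lengths and chunk sizes must be positive")
    return ceil_div(target_len, seq1_chunk) * ceil_div(query_len, seq2_chunk)


def max_pair_space(seq1_chunk: int, seq2_chunk: int) -> int:
    """Upper bound on the target x query search space handled by a single job."""
    return seq1_chunk * seq2_chunk


if __name__ == "__main__":
    genome = 1_000_000_000  # hypothetical 1 Gb target and query
    for s1, s2 in [(200_000_000, 50_000_000), (50_000_000, 10_000_000)]:
        jobs = estimate_lastz_jobs(genome, genome, s1, s2)
        print(f"--seq1_chunk {s1} --seq2_chunk {s2}: ~{jobs} jobs, "
              f"<= {max_pair_space(s1, s2):.2e} bp^2 per job")
```

With these made-up numbers, the smaller setting yields 2000 instead of 100 jobs (20× more), each with a 20× smaller search space. Total repeat-driven work does not shrink, but it is spread over jobs that are each far less likely to hit a runtime or memory limit.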

DiegoSafian commented 1 year ago

Thanks! It is now running smoothly. As you suggested, I reduced the chunk size, and it worked pretty well. Thanks again!