hillerlab / make_lastz_chains

Portable solution to generate genome alignment chains using lastz
MIT License

alignment is taking too long #19

Closed: gunjanpandey closed this issue 11 months ago

gunjanpandey commented 1 year ago

Could you please tell me whether something is wrong with this command? It is still running after 4 days. It started on a cluster with a total of around 500 single-core jobs, and around 125 of them are still running.

make_chains.py --project_dir OUTPUT --executor slurm --cluster_parameters '-A AB-123456' --executor_partition defq --force_def chicken bird chicken.2bit bird.2bit

MichaelHiller commented 1 year ago

Unmasked repeats are the main problem.

Did you run RepeatModeler 2.0 to get a repeat library and then softmask your query with this library? Same for the reference genome.
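A quick way to see whether an assembly is soft-masked at all is to measure the share of lowercase bases. The sketch below is only an illustration: it assumes a plain FASTA input (a .2bit file would first need converting back to FASTA, for example with UCSC's twoBitToFa), and the function name `softmasked_fraction` is made up for this example.

```python
import io

def softmasked_fraction(fasta_lines):
    """Return the fraction of A/C/G/T bases that are lowercase (soft-masked)."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and FASTA headers
        for base in line:
            if base in "acgt":
                masked += 1
                total += 1
            elif base in "ACGT":
                total += 1
            # N and other ambiguity codes are left out of the count
    return masked / total if total else 0.0

# Tiny in-memory example; for a real assembly, pass an open file handle instead.
demo = io.StringIO(">chr1\nACGTacgtNNNN\nACGTACGT\n")
print(f"soft-masked fraction: {softmasked_fraction(demo):.2f}")
```

A value close to zero suggests the genome was never masked. What counts as a reasonable fraction depends on the species, so treat the number as a sanity check only.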

Alternatively, try reducing the chunk sizes to get more but smaller jobs.
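To get a feel for how chunk size trades job count against job length, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which every reference chunk is aligned against every query chunk, and it ignores chromosome boundaries, chunk overlap and any job bundling the pipeline may do. The function name and the sizes are hypothetical, not the tool's defaults; check `make_chains.py --help` for the actual chunk options.

```python
def estimate_job_count(ref_size, query_size, ref_chunk, query_chunk):
    """Rough number of lastz jobs if each reference chunk is paired with each query chunk."""
    ref_chunks = -(-ref_size // ref_chunk)        # integer ceiling division
    query_chunks = -(-query_size // query_chunk)
    return ref_chunks * query_chunks

# Hypothetical 1 Gb genomes; chunk sizes are illustrative only.
genome = 1_000_000_000
print(estimate_job_count(genome, genome, 100_000_000, 50_000_000))  # 10 * 20 chunks
print(estimate_job_count(genome, genome, 50_000_000, 25_000_000))   # 20 * 40 chunks
```

Under this model, halving both chunk sizes gives four times as many jobs, each covering about a quarter of the sequence pairs, so a single repeat-heavy region blocks less work.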

gunjanpandey commented 1 year ago

Thanks Michael,

I did not mask the genomes. I will try again after masking. Could you please tell me whether these steps are correct for preparing the input for TOGA?

MichaelHiller commented 1 year ago

Almost correct. After RepeatModeler 2.0, please soft-mask both assemblies with RepeatMasker (we use rmblastn). Then produce the alignment chains. Then run TOGA.
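The masking part of that order could be sketched roughly as follows. This only builds and prints the command lines, it does not run anything. The file names, database name and thread count are placeholders, and the flags should be checked against the installed RepeatModeler, RepeatMasker and UCSC tool versions before use; this is not necessarily the exact invocation the authors use.

```python
import shlex

def masking_commands(fasta, db_name, threads=8):
    """Build (not run) commands for de novo repeat modeling and soft-masking."""
    library = f"{db_name}-families.fa"  # consensus library written by RepeatModeler
    return [
        ["BuildDatabase", "-name", db_name, fasta],
        ["RepeatModeler", "-database", db_name],
        # -xsmall soft-masks (lowercase) instead of replacing repeats with N
        ["RepeatMasker", "-engine", "rmblast", "-xsmall",
         "-pa", str(threads), "-lib", library, fasta],
        # RepeatMasker writes the masked sequence next to the input as <fasta>.masked
        ["faToTwoBit", f"{fasta}.masked", f"{db_name}.2bit"],
    ]

# Placeholder names; repeat the same steps for the query assembly.
for cmd in masking_commands("chicken.fa", "chicken"):
    print(shlex.join(cmd))
```

The resulting soft-masked .2bit files would then go into make_chains.py, and the chains from that run into TOGA.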

kirilenkobm commented 11 months ago

Please see https://github.com/hillerlab/make_lastz_chains/issues/10, so that we keep only one open performance-related issue.