waveygang / wfmash

base-accurate DNA sequence alignments using WFA and mashmap3
MIT License

Parameters for large plant genome #202

Open danielle-khost opened 10 months ago

danielle-khost commented 10 months ago

Hi there! I am looking for some feedback on using wfmash to align some large (>6 Gb) plant genomes that are super repeat-dense, around 90%. I was able to make an alignment between species successfully (though it took quite a bit of RAM, around 500 GB); however, even for my best assembly I was only able to align around 600 Mb of sequence. Are there any parameters or settings you could recommend to improve the alignment? So far I have experimented with the -s and -c settings. Oddly, increasing -s resulted in less aligned sequence than the default. Increasing the -c parameter to 100k seemed to give me my best result, though I am not sure if setting it that high is a good idea?

Thanks for any help you can give! These genomes are quite cumbersome, so it might be that this alignment is the best I can manage :)

-Danielle

ekg commented 10 months ago

Hi @danielle-khost! What alignment parameters did you give wfmash?

I would try this: wfmash -p 70 -m to get just the mappings, without base-level alignment. See how much sequence length they cover. You can take a look at these and either re-run the mapping with different settings or feed them back into wfmash with -i.
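
One way to check how much length the mappings cover is to merge overlapping intervals per sequence and sum them. The following is only a minimal sketch in Python, assuming the mappings were saved as a standard tab-separated PAF file (query name, start and end in columns 1, 3 and 4; target name, start and end in columns 6, 8 and 9). The function name covered_bases and the file name passed on the command line are hypothetical, not part of wfmash.

```python
import sys
from collections import defaultdict


def covered_bases(paf_lines, use_target=False):
    """Count distinct bases covered by mappings in PAF records.

    Overlapping mappings on the same sequence are merged so that
    repeated regions are not counted more than once.
    """
    intervals = defaultdict(list)
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns; skip anything else
            continue
        if use_target:
            name, start, end = fields[5], int(fields[7]), int(fields[8])
        else:
            name, start, end = fields[0], int(fields[2]), int(fields[3])
        intervals[name].append((start, end))

    total = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start > cur_end:
                # Gap between mappings: close the current merged interval.
                total += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                # Overlapping or adjacent: extend the current interval.
                cur_end = max(cur_end, end)
        total += cur_end - cur_start
    return total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python paf_coverage.py mappings.paf
    with open(sys.argv[1]) as handle:
        records = handle.readlines()
    print("query bases covered: ", covered_bases(records))
    print("target bases covered:", covered_bases(records, use_target=True))
```

Comparing the covered length against the total assembly size gives a rough idea of whether the mapping step is sensitive enough before spending time and memory on alignment.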

Once the mappings make sense and are sufficiently sensitive, you can go on to align them.