mummer4 / mummer

Mummer alignment tool
Artistic License 2.0

speedup 'delta-filter' on very large nucmer delta inputs #127

Open splaisan opened 4 years ago

splaisan commented 4 years ago

Is it possible to split my very large nucmer.delta file into chunks, filter them separately, and merge the results back? I compared two assemblies of 700 Mb and the .delta file is 39 GB. The command below has now been running for three days on a single CPU. Thanks in advance.

delta-filter -1 -i 100 -l 1000 nucmer.delta
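As a rough illustration of the split-and-merge idea asked about here, the following Python sketch streams a delta file and distributes its records into chunk files by reference sequence, then concatenates filtered chunks back together. It assumes the usual delta layout (two header lines, then records that each start with a `>` line); the function names `split_delta` and `merge_delta` and the chunk file naming are hypothetical. It is also only an assumption that chunked filtering gives the same result as filtering the whole file: that seems plausible for per-alignment switches such as -i and -l, but -1 compares alignments against each other, so its output on chunks may differ.

```python
from typing import Dict, List


def split_delta(src: str, prefix: str, n_chunks: int) -> List[str]:
    """Stream a delta file into n_chunks files, one reference per chunk at most.

    Hypothetical helper: all records sharing a reference ID go to the same chunk.
    """
    paths = [f"{prefix}.{i}.delta" for i in range(n_chunks)]
    outs = [open(p, "w") for p in paths]
    try:
        with open(src) as handle:
            # Line 1: input file paths; line 2: NUCMER or PROMER.
            header = handle.readline() + handle.readline()
            for out in outs:
                out.write(header)
            ref_to_chunk: Dict[str, int] = {}
            idx = None
            for line in handle:
                if line.startswith(">"):
                    # Record header: ">refID qryID refLen qryLen"
                    ref_id = line[1:].split()[0]
                    # New references are assigned to chunks round-robin.
                    idx = ref_to_chunk.setdefault(ref_id, len(ref_to_chunk) % n_chunks)
                if idx is not None:
                    outs[idx].write(line)
    finally:
        for out in outs:
            out.close()
    return paths


def merge_delta(paths: List[str], dest: str) -> None:
    """Concatenate chunk files, keeping the two header lines only once."""
    with open(dest, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as handle:
                header = handle.readline() + handle.readline()
                if i == 0:
                    out.write(header)
                for line in handle:
                    out.write(line)
```

Each chunk could then be passed to delta-filter in a separate process and the filtered outputs handed to `merge_delta`. Only one line is held in memory at a time, which matters for an input of this size.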

tiramisutes commented 1 year ago

The same problem. Is there a good solution?

6YuHao commented 3 months ago

The same problem. Is there a good solution?

GGSonoda commented 4 weeks ago

Try running with -1 after the other filters:

delta-filter -i 100 -l 1000 -1 nucmer.delta

This way the 1-1 filter will run after the identity and length filters.

splaisan commented 4 weeks ago

Thanks @GGSonoda, but how does this split the data and speed up the process? I do not understand the logic of the command.

GGSonoda commented 4 weeks ago

EDIT: Sorry, this is not a way to split the data, but it may help the command run faster.

If I understood correctly, delta-filter applies the filters in the order in which the switches are given. From its help text:

Reads a delta alignment file from either nucmer or promer and filters the alignments based on the command-line switches, leaving only the desired alignments which are output to stdout in the same delta format as the input. For multiple switches, order of operations is as follows: -i -l -u -q -r -g -m -1. If an alignment is excluded by a preceding operation, it will be ignored by the succeeding operations.

TBH I'm not sure whether that order is applied by default or whether you have to give the switches in that order yourself. It may also help to filter by uniqueness (-u): for me, -u 30 cut the size of the delta file in half, so running -1 afterwards may be faster. Hope it helps.
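One way to read the help text quoted above is that the order of operations is fixed regardless of how the switches are typed. The small Python sketch below models only that reading; it is not delta-filter's code, and the name `effective_order` is hypothetical. Under this assumption, the command from the original post and the reordered one would apply the same sequence of filters.

```python
# Order of operations as quoted from the delta-filter help text in this thread.
DOCUMENTED_ORDER = ["-i", "-l", "-u", "-q", "-r", "-g", "-m", "-1"]


def effective_order(switches):
    """Return the switches in the order the help text says they are applied."""
    unknown = [s for s in switches if s not in DOCUMENTED_ORDER]
    if unknown:
        raise ValueError(f"not in the documented list: {unknown}")
    return sorted(set(switches), key=DOCUMENTED_ORDER.index)


print(effective_order(["-1", "-i", "-l"]))  # original post: ['-i', '-l', '-1']
print(effective_order(["-i", "-l", "-1"]))  # reordered:     ['-i', '-l', '-1']
```

If this reading holds, reordering the switches alone would not change the run time, whereas adding a filter that is applied earlier in the list (such as -u) could still shrink the input seen by -1.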

splaisan commented 4 weeks ago

thanks for clarifying, I guess I will need to try :-)