grke / burp

burp - backup and restore program
http://burp.grke.net
Other
483 stars 77 forks source link

Protocol2 checksums: try skipping min block size minus window size #467

Closed grke closed 2 years ago

grke commented 8 years ago

As talked about here: https://moinakg.wordpress.com/tag/rabin-fingerprint/

Try skipping min block size minus window size before starting the rabin algorithm.

grke commented 8 years ago

The fingerprints will be different and everyone will have to start their protocol2 backups from scratch, but it might be worth it.

Sherlock221B commented 7 years ago

Hi Graham,

You can optimize your code by avoiding using the modulus operator. For each block, in the worst case, you are using the operator 4096 times : if( blk->length >= rconf.blk_min && (blk->length == rconf.blk_max || (win->checksum % rconf.blk_avg) == rconf.prime))

This condition is true "(win->checksum % 5000) == 3" if checksum is equal to 3 or 5003 or 10003 or 15003 ... So you can already exclude all values ​​that do not end with the number 3

To optimize your code you could use this condition : if( blk->length >= rconf.blk_min && (blk->length == rconf.blk_max || ((win->checksum & 1) && (win->checksum & 2) && !(win->checksum & 4) && (win->checksum % rconf.blk_avg) == rconf.prime)))

It is not pretty but it should accelerate the function.

Best regards

Sherlock221B commented 7 years ago

Another idea: I think you do not have to calculate the win->checksum as long as blk->length < (rconf.blk_min - rconf.win_size).

Normally, you should be able to use this code :

static int blk_read(void)
{
    unsigned char c;

    for(; gcp<gbuf_end; gcp++)
    {
        c=(unsigned char)*gcp;

        blk->fingerprint = (blk->fingerprint * rconf.prime) + c;
        if(blk->length >= (rconf.blk_min - rconf.win_size))
        {
            win->checksum    = (win->checksum    * rconf.prime) + c
                       - (win->data[win->pos] * rconf.multiplier);
            win->data[win->pos] = c;

            win->pos++;

            if(win->pos == rconf.win_size)
                win->pos=0;
        }

        if(blk->data)
            blk->data[blk->length] = c;
        blk->length++;

        if( blk->length >= rconf.blk_min
         && (blk->length == rconf.blk_max
          || ((win->checksum & 1) && (win->checksum & 2) && !(win->checksum & 4) && (win->checksum % rconf.blk_avg) == rconf.prime)))
        {
            gcp++;
            return 1;
        }
    }
    return 0;
}

Best regards

grke commented 7 years ago

Thanks for your ideas. I have run the first one through all my automatic tests, and it passes.

I don't have a quick way of proving a speed increase right now, but I will try to set something up.

grke commented 2 years ago

Sorry, I am discontinuing work on the protocol 2, which has never been production ready, and it is removed in the latest release of burp.