moinakg / pcompress

A Parallelized Data Deduplication and Compression utility
http://moinakg.github.com/pcompress/
GNU Lesser General Public License v3.0
277 stars 34 forks source link

May be mistake, may be not...it is not clear, please explain.. #47

Open v50110 opened 9 years ago

v50110 commented 9 years ago

In rabin/rabin_dedup.c between line 187 and 208 https://github.com/moinakg/pcompress/blob/master/rabin/rabin_dedup.c#L187 variables 'pow' and 'poly_pow' holds exactly the same value = (power of RAB_POLYNOMIAL_CONST) mod (POLY_MASK+1). This value is 5 bytes in size (restricted by POLY_MASK). 'poly_pow' defined as uint64_t but 'pow' only as 'unsigned int' (which AFAIK 32bit and can hold only 4 bytes) and for it oversizing occurs already on 3rd iteration: ((153191^3) mod 0x10000000000) = 0xA3D1AE8E77. It's by design and I don't understand something or indeed used wrong type for 'pow'?

moinakg commented 9 years ago

The pow and poly_pow are by design. poly_pow is computed only once whereas pow is computed each time based on FP_POLY and value of j. pow can overflow. However thanks for noticing that POLY_MASK is 5 bytes. I do not remember now whether I had intended it to be 4 bytes or 5 bytes. I will have to experiment and check this one.

v50110 commented 9 years ago

Thanks for quick answer! I noticed one more strange thing.. In main chunking loop line 650 in rabin_dedup.c cur_roll_checksum -= out[pushed_out]; Since cur_roll_checksum and out[pushed_out] both defined as uint64_t It is assumed that cur_roll_checksum > out[pushed_out]. And that would be always true if we subtracted the absolute values. But we subtract modules and if val1 > val2 it does not mean that (val1 mod M) > (val2 mod M) too. Thus we have another arithmetic overflow and it indeed occurs. Unfortunately GCC does shows nor error or even warning, simply assigns some value into cur_roll_checksum and goes ahead, but other compilers stops on this and complain about hard error because cur_roll_checksum sometimes takes a negative value.