lasarohedvig opened this issue 4 years ago
Are you referring to the example at rolling_hash/chunking_with_mb_hash.c? This example does assume that there is a minimum chunk size. You don't have to call rolling_hash2_reset(), but the example assumes that the minimum chunk size is larger than the window size, so any computation spent finding matches up to the minimum would be wasted. For this reason it skips ahead to the minimum-size offset and calls _reset(), which should be less work when min > window width.
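To make the skip-and-reset idea concrete, here is a minimal, self-contained sketch. It does not use isa-l_crypto: `struct toy_rh`, `toy_reset`, `toy_run` and `toy_chunk` are hypothetical stand-ins built on a simple additive rolling hash. The sketch assumes all input lives in one contiguous array and that the minimum chunk size is at least the window width `w`. Each chunk skips `min_chunk` bytes and seeds the window from the `w` bytes just before the minimum offset, so no hashing is spent on bytes that can never be a boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the isa-l_crypto API. */
enum { TOY_RET_HIT = 0, TOY_RET_MAX = 1 };

struct toy_rh {
    uint32_t w;    /* window width in bytes */
    uint32_t hash; /* sum of the w bytes preceding the next input byte */
};

/* Seed the window from the w bytes that sit just before the next buffer. */
static void toy_reset(struct toy_rh *s, const uint8_t *init_bytes)
{
    s->hash = 0;
    for (uint32_t i = 0; i < s->w; i++)
        s->hash += init_bytes[i];
}

/* Scan up to max_len bytes; the w bytes before buf must be readable.
 * On a hit, *offset is one past the matching byte. */
static int toy_run(struct toy_rh *s, const uint8_t *buf, uint32_t max_len,
                   uint32_t mask, uint32_t trigger, uint32_t *offset)
{
    for (uint32_t i = 0; i < max_len; i++) {
        s->hash += buf[i];            /* byte entering the window */
        s->hash -= *(buf + i - s->w); /* byte leaving the window */
        if ((s->hash & mask) == trigger) {
            *offset = i + 1;
            return TOY_RET_HIT;
        }
    }
    *offset = max_len; /* toy convention only; toy_chunk does not rely on it */
    return TOY_RET_MAX;
}

/* Split data[0..len) into chunks, writing chunk end offsets to cuts[].
 * Skips min_chunk bytes per chunk and resets instead of hashing them. */
static size_t toy_chunk(const uint8_t *data, size_t len, uint32_t w,
                        uint32_t min_chunk, uint32_t max_chunk,
                        uint32_t mask, uint32_t trigger,
                        size_t *cuts, size_t max_cuts)
{
    assert(min_chunk >= w && max_chunk > min_chunk);
    struct toy_rh s = { .w = w, .hash = 0 };
    size_t pos = 0, n = 0;
    while (pos < len && n < max_cuts) {
        size_t remaining = len - pos;
        if (remaining <= min_chunk) { /* tail shorter than the minimum */
            cuts[n++] = len;
            break;
        }
        const uint8_t *p = data + pos;
        toy_reset(&s, p + min_chunk - w); /* window ends at the min offset */
        uint32_t scan = (uint32_t)((remaining < max_chunk ? remaining : max_chunk)
                                   - min_chunk);
        uint32_t off;
        int ret = toy_run(&s, p + min_chunk, scan, mask, trigger, &off);
        /* On MAX, cut at the scan limit ourselves instead of reading off. */
        pos += min_chunk + (ret == TOY_RET_HIT ? off : scan);
        cuts[n++] = pos;
    }
    return n;
}
```

The reset reads only `w` bytes, so seeding costs O(w) per chunk rather than O(min_chunk), which is the saving described above when min > window width.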
The value FINGERPRINT_RET_OTHER is reserved for error conditions, but so far no internal condition sets it.
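Even though nothing sets FINGERPRINT_RET_OTHER today, a caller can still treat it defensively as an error instead of silently ignoring it. The sketch below is hypothetical: `TOY_RET_*` stand in for the library's FINGERPRINT_RET_* constants and `toy_chunk_len` is an illustrative helper, not part of the library. On MAX it cuts at the scan limit rather than reading `offset`, which stays correct whichever offset convention a given version uses for MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for FINGERPRINT_RET_HIT / _MAX / _OTHER. */
enum { TOY_RET_HIT = 0, TOY_RET_MAX = 1, TOY_RET_OTHER = 2 };

/* Chunk length to emit after one scan, or -1 on an error return.
 * skipped:  bytes skipped before the scan started (e.g. the minimum size)
 * scan_len: max_len that was passed to the scan
 * offset:   offset reported by the scan (one past the match on a hit) */
static long toy_chunk_len(int ret, uint32_t skipped, uint32_t scan_len,
                          uint32_t offset)
{
    switch (ret) {
    case TOY_RET_HIT:
        assert(offset >= 1 && offset <= scan_len);
        return (long)skipped + (long)offset;
    case TOY_RET_MAX:
        /* Cut at the limit ourselves; offset is not consulted. */
        return (long)skipped + (long)scan_len;
    case TOY_RET_OTHER: /* reserved for errors; nothing sets it yet */
    default:
        return -1;
    }
}
```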
I think you have the definition correct. On a hit, offset is set to the position just past the byte that caused the match (ptr + 1), which is the next offset to check, so the function can be re-run from there without incrementing. When max_len is reached, it returns the max offset - 1.
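The re-run behaviour can be checked with a small self-contained sketch. Again this is not the library: `struct toy_rh`, `toy_reset`, `toy_run` and `toy_find_all` are hypothetical, use an additive rolling hash, and follow the HIT convention described above (offset is one past the matching byte). After each hit, scanning resumes at `buf + offset` with no reset and no extra increment, because the hash state carries over between calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the isa-l_crypto API. */
enum { TOY_RET_HIT = 0, TOY_RET_MAX = 1 };

struct toy_rh {
    uint32_t w;    /* window width in bytes */
    uint32_t hash; /* sum of the w bytes preceding the next input byte */
};

static void toy_reset(struct toy_rh *s, const uint8_t *init_bytes)
{
    s->hash = 0;
    for (uint32_t i = 0; i < s->w; i++)
        s->hash += init_bytes[i];
}

/* The w bytes before buf must be readable; on a hit *offset = match + 1. */
static int toy_run(struct toy_rh *s, const uint8_t *buf, uint32_t max_len,
                   uint32_t mask, uint32_t trigger, uint32_t *offset)
{
    for (uint32_t i = 0; i < max_len; i++) {
        s->hash += buf[i];            /* byte entering the window */
        s->hash -= *(buf + i - s->w); /* byte leaving the window */
        if ((s->hash & mask) == trigger) {
            *offset = i + 1;
            return TOY_RET_HIT;
        }
    }
    *offset = max_len; /* toy convention only */
    return TOY_RET_MAX;
}

/* Collect every boundary by re-running from buf + offset after each hit.
 * hits[] receives absolute offsets one past each matching byte. */
static size_t toy_find_all(const uint8_t *data, size_t len, uint32_t w,
                           uint32_t mask, uint32_t trigger,
                           size_t *hits, size_t max_hits)
{
    assert(len >= w);
    struct toy_rh s = { .w = w, .hash = 0 };
    toy_reset(&s, data); /* window = data[0..w) */
    size_t base = w;     /* next byte to feed */
    size_t n = 0;
    while (base < len && n < max_hits) {
        uint32_t off;
        int ret = toy_run(&s, data + base, (uint32_t)(len - base),
                          mask, trigger, &off);
        if (ret != TOY_RET_HIT)
            break;            /* MAX: every remaining byte was scanned */
        hits[n++] = base + off;
        base += off;          /* off already points past the match */
    }
    return n;
}
```

Comparing `toy_find_all` against a brute-force scan of every window is an easy way to confirm that resuming from `buf + offset` neither skips nor double-counts a position.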
Thanks for clarifying that reset() is just a shortcut to skip bytes, that a return of MAX does set the offset, and that OTHER is unused for now. I believe this should be put into the documentation.
Thanks @lasarohedvig for posting. I'll look into clarifying the documentation.
Hi, all. Could you help me clarify a few details regarding the use of the rolling_hash2_run function?

What is the value of the offset if the function returns FINGERPRINT_RET_MAX? The documentation says it is set only in case of a HIT, but the example chunking code ignores the result of the function, so I need to assume that offset is the same as if a chunk boundary had been found at the last byte of the buffer, that is, it points to the last byte of the buffer + 1. Is this correct?

What does it mean for rolling_hash2_run to return FINGERPRINT_RET_OTHER? Why does the example not consider this return value?

Do I need to call rolling_hash2_reset before every rolling_hash2_run call? Or can I use the following flow?

Regards