Open nbarrientos opened 5 years ago
Thanks a lot for reporting the issue @nbarrientos. The memory behavior is to some extent explainable, as 17000 * 17000 * 4.0 / 1024 / 1024 / 1024 ≈ 1.07 GiB. A note should be added to the README that the library is not suitable for computing diffs of arrays larger than about 10000 elements.
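The figure above is just the full n × m DP matrix at four bytes per cell; a quick sanity check of that estimate:

```ruby
# Estimated size of a full n x m LCS DP matrix with 4-byte cells,
# for the 17000-element arrays from this report.
n = m = 17_000
gib = n * m * 4.0 / 1024 / 1024 / 1024
puts format('%.2f GiB', gib)
```

(The 1.07 above comes from truncating rather than rounding 1.0766….)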
As far as I know, in genetics there are a lot of clever sequence-alignment algorithms that avoid this kind of space explosion, but I am unaware of simple algorithms that do not resort to external storage.
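For reference, the LCS *length* can be computed with the textbook two-row DP in O(min(n, m)) space; recovering the subsequence itself in linear space additionally needs Hirschberg's divide-and-conquer. A minimal Ruby sketch of the two-row version (not hashdiff's implementation):

```ruby
# LCS length via the classic DP, keeping only two rows of the matrix,
# so memory is O(min(n, m)) instead of O(n * m).
def lcs_length(a, b)
  a, b = b, a if a.length < b.length   # make b the shorter sequence
  prev = Array.new(b.length + 1, 0)
  curr = Array.new(b.length + 1, 0)
  a.each do |x|
    (1..b.length).each do |j|
      curr[j] = x == b[j - 1] ? prev[j - 1] + 1 : [prev[j], curr[j - 1]].max
    end
    prev, curr = curr, prev            # roll the rows
  end
  prev[b.length]
end

puts lcs_length('ABCBDAB'.chars, 'BDCABA'.chars)  # => 4
```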
Hi @liufengyun!
Thanks for replying.
Sorry, when I read the documentation of the library I did indeed see that the complexity of the algorithm was O(n²), but I naively assumed that it was the time complexity. I should have read further about the space complexity :smile:
For the fun of it, I allowed a problem of input size 10000 (diffing two arrays of 10000 one-byte elements) to consume as much memory as it needed until it finished. Here's the memory footprint of the process:
Same plot but increasing the size to 15000:
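Not part of the original report, but for anyone who wants to reproduce plots like these: one crude way to sample the numbers is to read the process's own resident set size from `/proc` (Linux-only; the helper name is hypothetical):

```ruby
# Hypothetical helper: read this process's resident set size in KiB
# from /proc/self/status. Linux-only.
def rss_kb
  File.read('/proc/self/status')[/VmRSS:\s*(\d+)/, 1].to_i
end

before = rss_kb
data = Array.new(2_000_000) { |i| i }  # allocate a few tens of MiB
after = rss_kb
puts "RSS grew by roughly #{after - before} KiB"
```

Sampling `rss_kb` from a background thread while the diff runs gives a time series like the plots above.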
I agree that adding a note clarifying this is a good idea, especially considering that LCS is enabled by default.
Bonne journée! :wave:
@nbarrientos Cool benchmarks 👍 Merci beaucoup et bonne journée!
Hi,
When debugging an issue with octocatalog-diff we narrowed the problem down to HashDiff being unable to compare a couple of arrays when LCS is used. Here's a reproducer:
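(The original snippet didn't survive the copy; a sketch along the same lines, assuming two arrays of one-byte strings at the sizes mentioned in this thread and a recent release of the gem, where the module is spelled `Hashdiff` — the exact original array contents are an assumption:)

```ruby
begin
  require 'hashdiff'
rescue LoadError
  warn 'hashdiff gem not installed (gem install hashdiff)'
  exit
end

# Two large arrays of one-byte strings that differ slightly
# (sizes taken from the numbers quoted in this thread).
a = Array.new(17_000) { |i| (i % 256).chr }
b = a.rotate(1)

# LCS is on by default, so this builds an O(n * m) structure and eats
# well over 1 GiB of RSS; guarded so merely loading the file stays cheap.
# (Recent releases also accept use_lcs: false to skip LCS entirely.)
Hashdiff.diff(a, b) if ENV['RUN_REPRODUCER']
```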
When comparing those two arrays using LCS (which is on by default) the application starts to eat up memory very quickly with 100% CPU usage until, in our case, the process is killed by the kernel's OOM killer as the test machine does not have any swap:
Yes, that's >1GiB!
I understand that LCS is O(n²), but I'm not sure this is the expected behaviour, as the array size is moderate.