Closed andreasabel closed 4 years ago
I have a simple fix here: https://github.com/skogsbaer/HTF/pull/87
The fix just restricts the length of the input strings.
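Roughly, the idea is the following sketch (illustration only; the actual change is in the PR, and maxLen and diffBounded are made-up names):

-- Sketch only: cap the length of the strings handed to the diff
-- algorithm so that pathological inputs stay bounded in cost.
-- maxLen and diffBounded are illustrative names, not taken from the PR.
import Data.Algorithm.Diff (Diff, getDiff)

maxLen :: Int
maxLen = 100000

diffBounded :: String -> String -> [Diff Char]
diffBounded xs ys = getDiff (take maxLen xs) (take maxLen ys)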
With this, I get:
./Diff 1000
Total execution time: 5490ms
  10,814,412,072 bytes allocated in the heap
   7,258,721,136 bytes copied during GC
     707,046,280 bytes maximum residency (15 sample(s))
      10,879,464 bytes maximum slop
             674 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10342 colls,     0 par    2.177s   2.189s     0.0002s    0.0018s
  Gen  1        15 colls,     0 par    1.819s   2.481s     0.1654s    0.5047s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    1.471s  (  1.713s elapsed)
  GC      time    3.996s  (  4.670s elapsed)
  EXIT    time    0.000s  (  0.005s elapsed)
  Total   time    5.468s  (  6.390s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    7,350,942,162 bytes per MUT second

  Productivity  26.9% of total user, 26.8% of total elapsed
Still not brilliant, but that should suffice in practice.
After adding a longer test case to the BNFC test suite, I was surprised to see the test suite hanging. I then discovered that 13 GB had been allocated (as much as my machine can offer). I traced the problem down to HTF, and further to the diff algorithm used here, Data.Algorithm.Diff. It does not seem to scale to long strings with a substantial number of differences.
However, when a test fails, I would be happy to see only the first couple of differences in long strings; if there are many differences, something fundamental must be wrong anyway.
I wonder whether assertEqual or the test runner could be configured to limit the number of differences shown and to speed up the process. However, I also have doubts about the quality of the library in use, Data.Algorithm.Diff. It seems to use reverse, making it impossible to get only the first n differences without computing all of them; I illustrate this with a small sketch further below.

To get a feel for the memory consumption of this diff algorithm, here is a protocol:
That is already 4GB for strings of length 6M, and the memory consumption is linear!
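This is what I mean regarding reverse: one would hope that laziness makes it cheap to look at only a prefix of the diff, but since getDiff apparently builds its result via reverse, even something like the following sketch (firstChanges is a made-up helper, not HTF code) still computes the entire diff first:

import Data.Algorithm.Diff

-- Sketch: try to look at only the first n changes of a diff.
-- If getDiff reverses its accumulated result internally, the whole
-- diff is computed before the first element becomes available, so
-- taking a prefix does not save any work.
isChange :: Diff a -> Bool
isChange (Both _ _) = False
isChange _          = True

firstChanges :: Int -> String -> String -> [Diff Char]
firstChanges n xs ys = take n (filter isChange (getDiff xs ys))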
I attach my benchmark program if you want to experiment further:
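(The attachment itself is not reproduced here. A benchmark along these lines could look like the sketch below; this is a reconstruction for illustration, not the attached program, and the input shape and scaling factor are made up. Compile with -rtsopts and run it as ./Diff 1000 +RTS -s to obtain statistics like the ones quoted above.)

-- Sketch of a benchmark in the spirit of the attached program
-- (illustration only, not the actual attachment).
-- It diffs two long strings that differ in every 8th character,
-- the kind of input on which Data.Algorithm.Diff blows up.
module Main where

import Data.Algorithm.Diff (getDiff)
import System.Environment (getArgs)

main :: IO ()
main = do
  [arg] <- getArgs
  let n  = 1000 * read arg             -- scaling factor is an assumption
      xs = take n (cycle "abcdefgh")
      ys = take n (cycle "abcdefgX")
  print (length (getDiff xs ys))       -- force the whole diff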