Closed nrnrnr closed 11 years ago
So it looks like it was actually a 30% slow down. I must have mixed up my numbers earlier.
I tried switching back to the original array type: no change. I then tried computing `preceders` with pattern matching instead of array lookup, and it dropped down to a 17.5% slowdown. Perhaps it wasn't being memoized?
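For anyone following along, here is a rough sketch of the two ways of computing `preceders` being compared. Everything in it is an assumption for illustration: the `StateLabel` type, the three-state layout, and the particular predecessor lists are hypothetical and not taken from the repository.

```haskell
module Main where

import Data.Array (Array, listArray, (!))
import Data.Ix (Ix)

-- Hypothetical state labels; the real code may use different ones.
data StateLabel = Mat | Ins | Del
  deriving (Eq, Ord, Show, Ix)

-- Array-lookup version: the table is a top-level constant, so it is
-- built once and shared, and each call pays for an index operation.
precedersArr :: Array StateLabel [StateLabel]
precedersArr = listArray (Mat, Del) [[Mat, Ins, Del], [Mat, Ins], [Mat, Del]]

precedersByLookup :: StateLabel -> [StateLabel]
precedersByLookup s = precedersArr ! s

-- Pattern-matching version: each equation returns a constant list,
-- with no bounds check or index arithmetic.
precedersByMatch :: StateLabel -> [StateLabel]
precedersByMatch Mat = [Mat, Ins, Del]
precedersByMatch Ins = [Mat, Ins]
precedersByMatch Del = [Mat, Del]

main :: IO ()
main = mapM_ (\s -> print (s, precedersByLookup s, precedersByMatch s)) [Mat, Ins, Del]
```

Both versions return the same lists; whether the array in the real code was being rebuilt on each call (that is, not memoized) is exactly the open question, and this sketch does not settle it.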
I've started an experiments log that has more details on the above: https://github.com/ndaniels/MRFy-ICFP/blob/master/EXPERIMENTS.
So I guess the next stop is the profiler. Joy.
> So I guess the next stop is the profiler. Joy.
Try valgrind first.
N
So I already ran a profile. The biggest difference I can see (where `preceders` is computed with pattern matching) is in the number of allocations. The profile for the revised Viterbi claims
total alloc = 10,389,907,248 bytes (excludes profiling overheads)
while the profile for the ICFP viterbi has
total alloc = 7,098,475,376 bytes (excludes profiling overheads)
I'm having trouble gleaning anything else useful from the profile.
That's good actually. Is there a differencer that will identify the cost center of the allocations?
Space leak in `scoreOnly`?
OK, here are the big allocation centers. I don't know where the anonymous lambda is:
COST CENTRE   MODULE        no.  entries   %time %alloc  %time %alloc  ticks  bytes
                                           (individual)  (inherited)
scoreOnly     ViterbiThree  343         0   70.4   80.8   99.3   98.1   8402  8390467280
transition    ViterbiThree  474  13077246    5.6    5.0    9.3    5.0    669   519805048
/!/           Constants     522  10172718    2.5    3.1    2.5    3.1    301   325526976
/!/           Constants     495   9050521    4.5    2.8    4.5    2.8    534   289616672
bitransition  ViterbiThree  513  10172718    1.6    2.3    2.4    2.3    192   244018080
emission      ViterbiThree  494  12666675    2.6    2.1    8.3    4.9    311   217212504
scoreOnly.\   ViterbiThree  536  23249964    2.6    2.0    2.8    2.0    310   202666800
OK, so I don't know why the `transition` function is doing a huge amount of allocation, but it was one of our candidates for elimination. So that's comforting. The question is not why is that slow but why is the other one fast?
The corresponding "old" function is `transScoreNode`. It does almost no allocation. Why not?
OK, observation: in the old code, whenever `transScoreNode` is called it is an argument to `Scored`, which says it is strict. Whereas the `child` function passed to `hoViterbi` is not strict. So there's a huge memory leak right there.
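To make the strictness point concrete, here is a minimal sketch of how a strict constructor field differs from a lazy one when a score is accumulated in a loop. It is not the project's code: `Scored` below is a hypothetical stand-in with an assumed strict `Double` field, and `LazyScored`, `step`, `strictRun`, and `lazyRun` are invented for the illustration.

```haskell
module Main where

import Data.List (foldl')

-- Hypothetical stand-in for the old code's wrapper: the bang makes the
-- score field strict, so it is evaluated when the constructor is built.
data Scored a = Scored !Double a

-- Hypothetical lazy counterpart: the score field may hold a thunk.
data LazyScored a = LazyScored Double a

scoreOf :: Scored a -> Double
scoreOf (Scored s _) = s

lazyScoreOf :: LazyScored a -> Double
lazyScoreOf (LazyScored s _) = s

-- Toy stand-in for adding one transition score to a running total.
step :: Double -> Int -> Double
step acc i = acc + 0.5 * fromIntegral i

-- foldl' forces each accumulator to weak head normal form; because the
-- field is strict, that also evaluates the score at every iteration.
strictRun :: Int -> Scored ()
strictRun n = foldl' next (Scored 0 ()) [1 .. n]
  where next (Scored s x) i = Scored (step s i) x

-- Here foldl' forces only the constructor, so the score field grows
-- into a chain of n unevaluated thunks until someone demands it.
lazyRun :: Int -> LazyScored ()
lazyRun n = foldl' next (LazyScored 0 ()) [1 .. n]
  where next (LazyScored s x) i = LazyScored (step s i) x

main :: IO ()
main = do
  print (scoreOf (strictRun 100000))
  print (lazyScoreOf (lazyRun 100000))
```

Both runs compute the same number; the difference is only in when the work happens and how much is held on the heap in the meantime. If the `child` function's result is never forced inside `hoViterbi`, something like the second pattern could be what the profile is showing, though that is a guess until it is measured.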
> The corresponding "old" function is `transScoreNode`. It does almost no allocation. Why not?
Hmm. According to the profile, it looks like it's doing plenty of allocation?
transScoreNode Viterbi 7.3 4.3 866 304980360
Which is comparable with
transition ViterbiThree 5.6 5.0 669 519805048
The ratio I measure is over 4 to 1:
> =116148032/519805048
0.22344537138854
Andrew has solved this, we believe. See #9
The first thing to do is probably to revert to the identical array type used for the original `Viterbi.hs`. Just to eliminate a possibly gratuitous source of variation.