Diagnose 15% slowdown on ViterbiThree.hs

ndaniels / MRFy

Structural Motifs Using Random Fields, stochastic search implementation in Haskell

12 stars 1 forks source link

Diagnose 15% slowdown on ViterbiThree.hs #3

Closed nrnrnr closed 11 years ago

nrnrnr commented 11 years ago

The first thing to do is probably to revert to the identical array type used for the original Viterbi.hs. Just to eliminate a possibly gratuitous source of variation.

BurntSushi commented 11 years ago

So it looks like it was actually a 30% slow down. I must have mixed up my numbers earlier.

I tried switching back to the original array type: no change. I then tried computing preceders with pattern matching instead of array lookup, and it dropped down to a 17.5% slowdown. Perhaps it wasn't being memoized?

I've started an experiments log that has more details on the above: https://github.com/ndaniels/MRFy-ICFP/blob/master/EXPERIMENTS.

BurntSushi commented 11 years ago

So I guess the next stop is the profiler. Joy.

nrnrnr commented 11 years ago

So I guess the next stop is the profiler. Joy.

Try valgrind first.

BurntSushi commented 11 years ago

So I already ran a profile. The biggest difference I can see (where preceders is computed with pattern matching) is in the number of allocations. The profile for the revised Viterbi claims

total alloc = 10,389,907,248 bytes (excludes profiling overheads)

while the profile for the ICFP viterbi has

total alloc = 7,098,475,376 bytes (excludes profiling overheads)

BurntSushi commented 11 years ago

I'm having trouble gleaning anything else useful from the profile.

nrnrnr commented 11 years ago

That's good actually. Is there a differencer that will identify the cost center of the allocations?

nrnrnr commented 11 years ago

space leak in scoreOnly?

nrnrnr commented 11 years ago

OK, here are the big allocation centers. I don't know where the anonymous lambda is:

     scoreOnly                    ViterbiThree                       343           0   70.4   80.8    99.3   98.1   8402 8390467280
      transition                  ViterbiThree                       474    13077246    5.6    5.0     9.3    5.0    669 519805048
      /!/                         Constants                          522    10172718    2.5    3.1     2.5    3.1    301 325526976
       /!/                        Constants                          495     9050521    4.5    2.8     4.5    2.8    534 289616672
      bitransition                ViterbiThree                       513    10172718    1.6    2.3     2.4    2.3    192 244018080
      emission                    ViterbiThree                       494    12666675    2.6    2.1     8.3    4.9    311 217212504
      scoreOnly.\                 ViterbiThree                       536    23249964    2.6    2.0     2.8    2.0    310 202666800

nrnrnr commented 11 years ago

OK, so I don't know why the transition function is doing a huge amount of allocation, but it was one of our candidates for elimination. So that's comforting. The questino is not why is that slow but why is the other one fast?

nrnrnr commented 11 years ago

The corresponding "old" function is transScoreNode. It does almost no allocation. Why not?

nrnrnr commented 11 years ago

OK, observation: in the old code, whenever transScoreNode is called it is an argument to Scored, which says it is strict.

nrnrnr commented 11 years ago

Whereas the child function passed to hoViterbi is not strict. So there's a huge memory leak right there.

BurntSushi commented 11 years ago

The corresponding "old" function is transScoreNode. It does almost no allocation. Why not?

Hmm. According to the profile, it looks like it's doing plenty of allocation?

transScoreNode Viterbi 7.3 4.3 866 304980360

Which is comparable with

transition ViterbiThree 5.6 5.0 669 519805048

nrnrnr commented 11 years ago

The ratio I measure is over 4 to 1:

> =116148032/519805048
0.22344537138854

nrnrnr commented 11 years ago

Andrew has solved this, we believe. See #9