Minimize needless copies in Puma

axch commented 8 years ago

As Taylor noted in code review yesterday, Puma is full of accidental use of C++ copy constructors in places where passing objects (including asymptotically large objects like vectors and sets) by constant reference would do. Going through and changing all such places to be constant references should both reduce incidence of bugs involving unexpected (non-)mutation, and improve performance. No measurements have yet been done, but anecdotally the performance gains could be quite substantial.

Marking blocked on #429 because if I were to do this I would want to do a style cleanup first (80-character line limit!).

axch commented 8 years ago

Some data on how helpful this is likely to be. I profiled the command

venture puma -e
  "assume x = normal(0, 1); 
   observe normal(x, 1) = 2; 
   infer mh(default, one, 300000);"

and, if I am eyeballing the results correctly

Shared pointers seem to cost around 50% in aggregate
- Could actually be the copies, since many of those vectors will carry shared pointers to venture value objects.
Object creation (new, malloc) around 10%
Dynamic casting around 10-15%
Maintaining the scopes table is also surprisingly expensive (12%?)

Based on this, I estimate a 2x improvement on this workload, and corresponding amplification of the value obtained from fixing other hot spots.

axch commented 8 years ago

From conversation with @riastradh-probcomp:

Shared pointers may be more expensive than they look to the profiler, because they imply memory barriers and inter-core communication (even if the "application" is running single-threaded).
It is possible to break this change down into a mechanical global stage which makes judgement-requiring manipulations local. To wit:
- For each copied parameter T foo in each function signature, change the signature to const T & foo_ref, and add to the implementation T foo = foo_ref;. This conserves the semantics of the program, since it just makes the copy explicit; but now, eliminating those copies is a local operation within the body of each individual function.

riastradh-probcomp commented 8 years ago

Partial work to do this is on branch 20160527-riastradh-pumacopy.

Problem: C++ makes it difficult to usefully change the signature of a virtual member function with a default implementation in an abstract base classes, because the actual name of the function includes its type, so changing the type merely renames it, and any derived classes defining member functions with the same nominal name just stop overriding it.

For example, the PSP class has a virtual member function double logDensity(VentureValuePtr value, shared_ptr<Args> args) with a default implementation returning zero. Many derived classes of PSP override this logDensity function. If we change it to double logDensity(const VentureValuePtr &, const shared_ptr <Args> & args), then the derived classes no longer override it -- which would be hunky-dory if the compiler reported this as an error, but instead callers using the virtual member function of PSP now get the default implementation instead of the intended overridden one.

In contrast, the virtual member function VentureValuePtr simulate(shared_ptr<Args> args, gsl_rng * rng) of PSP has no default implementation, and is marked purely virtual with =0, so any derived class failing to override it is reported immediately by the compiler -- which makes it much easier to change the signature to VentureValuePtr simulate(const shared_ptr<Args> & args, gsl_rng * rng) and reliably update all the derived classes trying to override it.

(C++11 allows a member function of a derived class to be tagged override, so that the compiler rejects it if it doesn't actually override a member function of a base class. But we're not using C++11.)

The obvoius next step would be to split the default implementations of all these methods into separate base classes that can be used in combination with PSP to avoid default implementations in PSP itself. The same goes for various other abstract base classes: VentureValue, DB, Trace, &c.

axch commented 8 years ago

For the particular case of PSP, Lite has a better thought-out class hierarchy, with a handful of base classes with appropriate names that supply appropriate default implementations.

Is it possible to instead grep for all those override definitions and change them in tandem, one method of the hierarchy at a time?

riastradh-probcomp commented 8 years ago

I'm not confident in the outcome of grep -- I'm much more confident in getting the compiler to double-check my work at compile-time.

I looked at lite a little bit and found it not totally obvious how the base classes are organized and what behaviour the various options suppress -- I'll need some help to confidently identify the correct semantic partitioning.

For example, puma's PSP::canAbsorb has a default method that returns false, which presumably suppresses logDensity, logDensityOfData, incorporate, and unincorporate, which in turn sounds like what we call a 'likelihood-free (P)SP'. However, lite's PSP.canAbsorb is not implemented, rather than default false, and lite's LikelihoodFreePSP overrides only canAbsorb (to return false), not logDensity, logDensityOfData, incorporate, or unincorporate.

Here's the partitioning I think is intended, along with some names for what the base classes that could supply default virtual methods are called:

[ ] canAbsorb -- If false, called likelihood-free. If true, called ???.
- logDensity
- logDensityOfData
- incorporate
- unincorporate
[ ] childrenCanAAA -- If false, called ???. If true, called deterministic-maker-aaaugh?
- getAAALKernel
[ ] canEnumerateValues -- If false, called ???. If true, called ???.
- enumerateValues
[ ] hasDeltaKernel -- If false, called ???. If true, called ???.
- getDeltaKernel
[ ] isContinuous -- If false, called ???. If true, called ???.
- getSupportLowerBound
- getSupportUpperBound

axch commented 8 years ago

Not quite.

incorporate and unincorporate are always on, and do nothing by default. Else, they are expected to adjust the auxiliary state (to wit, every possible call to incorporate is a generator of an implicit Abelian group that acts on the auxiliary state, and calling unincorporate is the inverse). If these are nontrivial, the SP is called "[exchangeably] coupled" but, in the current system, they are always called.
canAbsorb enables logDensity. If false, called likelihood-free. No special name currently adopted for the true case, since most "simple" distributions can absorb. Note: at least in Lite, an SP may be able to absorb changes to some arguments but not others.
childrenCanAAA enables getAAALKernel, the default one of which turns around and calls logDensityOfData. If true, this is called an AAAMaker (which may be further specialized to a deterministic one if it is, indeed, deterministic -- the collapsed ones will tend to be, but the uncollapsed ones will not). False is the default state, so no special name currently.
Correct about canEnumerateValues and enumerateValues. The discrete SPs with finite support are the ones that turn this on, but that's not reflected in code nomenclature at present.
- Note: This is used by enumerative Gibbs only. There is a tempting analogue which would support quadrature Gibbs sampling for low-D continuous SPs.
Correct about hasDeltaKernel. No particular names. This code path is a vestige of not having proper inference programming. Not sure whether it's technically used, but even if it is, it's quite long in the tooth. However, there is no good replacement for it on the radar.
Correct about isContinuous. This is used only to check whether slice sampling is applicable, and the two enabled methods are used only for slice sampling. I am annoyed that the SP interface had to be adjusted to enable slice sampling, but, again, there is no apparent replacement.

riastradh-probcomp commented 8 years ago

The interim change I proposed requires each PSP to explicitly say 'no, I'm not a continuous PSP' or 'no, I don't have a delta kernel'. An alternative would be to use dynamic_cast and run-time type information (RTTI) with different base classes for each additional set of functionality a PSP might have. Something like this:

class PSP {
public:
    virtual VentureValuePtr simulate(...) const =0;
};

class AssessablePSP: public virtual PSP {
public:
    virtual double logDensity(...) const =0;
    virtual double logDensityOfData(...) const =0;
};

class MyPSP: public PSP, public AssessablePSP {
public:
    VentureValuePtr simulate(...) { ... }
    double logDensity(...) { ... }
    double logDensityOfData(...) { ... }
};

Usage:

PSP * x = ...;
AssessablePSP * x_assess;
if ((x_assess = dynamic_cast<AssessablePSP *>(x))) {
    .... x_assess->logDensity(...) ...
} else {
    // non-assessable, fall back to whatever else
}

This still requires organizing the extra sets of functionality into additional base classes, but it does not require changing every PSP to list yes-or-no for every extra set of functionality it might have.

axch commented 8 years ago

Looks reasonable.

riastradh-probcomp commented 8 years ago

Complication with the AssessablePSP proposal above: the decision of whether the PSP is assessable depends not on the PSP itself but also on some extra parameters to canAbsorb.

We could instead do something like this:

struct PSP {
    virtual Assessor * canAbsorb(trace, appNode, node);
    ...
};

PSP * x = ...;
Assessor * a = x->canAbsorb(trace, appNode, node);
if (a) {
    ... a->logDensity(...) ...
} else {
    // non-assessable, fall back to whatever else
}

(This API is also safer in that you can't accidentally call logDensity if you didn't already call canAbsorb.)

axch commented 8 years ago

I worry about this complicating the PSP-implementor-facing interface. Perhaps that can be alleviated by the convention of returning this from canAbsorb and having the PSP itself implement the Assessor interface in the assessable case.

riastradh-probcomp commented 8 years ago

Yes, that will certainly work -- that's even what I had in mind.

riastradh-probcomp commented 8 years ago

Bleh. Not every call to logDensity is immediately preceded by canAbsorb. E.g.,

https://github.com/probcomp/Venturecxx/blob/0aa99cac3726b8af66a7dc945fd8eccc09bb7d50/backend/new_cxx/src/detach.cxx#L84

probcomp / Venturecxx

Minimize needless copies in Puma #507