probcomp / Venturecxx

Primary implementation of the Venture probabilistic programming system
http://probcomp.csail.mit.edu/venture/
GNU General Public License v3.0
28 stars 6 forks source link

Minimize needless copies in Puma #507

Open axch opened 8 years ago

axch commented 8 years ago

As Taylor noted in code review yesterday, Puma is full of accidental use of C++ copy constructors in places where passing objects (including asymptotically large objects like vectors and sets) by constant reference would do. Going through and changing all such places to be constant references should both reduce incidence of bugs involving unexpected (non-)mutation, and improve performance. No measurements have yet been done, but anecdotally the performance gains could be quite substantial.

Marking blocked on #429 because if I were to do this I would want to do a style cleanup first (80-character line limit!).

axch commented 8 years ago

Some data on how helpful this is likely to be. I profiled the command

venture puma -e
  "assume x = normal(0, 1); 
   observe normal(x, 1) = 2; 
   infer mh(default, one, 300000);"

and, if I am eyeballing the results correctly

Based on this, I estimate a 2x improvement on this workload, and corresponding amplification of the value obtained from fixing other hot spots.

axch commented 8 years ago

From conversation with @riastradh-probcomp:

riastradh-probcomp commented 8 years ago

Partial work to do this is on branch 20160527-riastradh-pumacopy.

Problem: C++ makes it difficult to usefully change the signature of a virtual member function with a default implementation in an abstract base classes, because the actual name of the function includes its type, so changing the type merely renames it, and any derived classes defining member functions with the same nominal name just stop overriding it.

For example, the PSP class has a virtual member function double logDensity(VentureValuePtr value, shared_ptr<Args> args) with a default implementation returning zero. Many derived classes of PSP override this logDensity function. If we change it to double logDensity(const VentureValuePtr &, const shared_ptr <Args> & args), then the derived classes no longer override it -- which would be hunky-dory if the compiler reported this as an error, but instead callers using the virtual member function of PSP now get the default implementation instead of the intended overridden one.

In contrast, the virtual member function VentureValuePtr simulate(shared_ptr<Args> args, gsl_rng * rng) of PSP has no default implementation, and is marked purely virtual with =0, so any derived class failing to override it is reported immediately by the compiler -- which makes it much easier to change the signature to VentureValuePtr simulate(const shared_ptr<Args> & args, gsl_rng * rng) and reliably update all the derived classes trying to override it.

(C++11 allows a member function of a derived class to be tagged override, so that the compiler rejects it if it doesn't actually override a member function of a base class. But we're not using C++11.)

The obvoius next step would be to split the default implementations of all these methods into separate base classes that can be used in combination with PSP to avoid default implementations in PSP itself. The same goes for various other abstract base classes: VentureValue, DB, Trace, &c.

axch commented 8 years ago

For the particular case of PSP, Lite has a better thought-out class hierarchy, with a handful of base classes with appropriate names that supply appropriate default implementations.

Is it possible to instead grep for all those override definitions and change them in tandem, one method of the hierarchy at a time?

riastradh-probcomp commented 8 years ago

I'm not confident in the outcome of grep -- I'm much more confident in getting the compiler to double-check my work at compile-time.

I looked at lite a little bit and found it not totally obvious how the base classes are organized and what behaviour the various options suppress -- I'll need some help to confidently identify the correct semantic partitioning.

For example, puma's PSP::canAbsorb has a default method that returns false, which presumably suppresses logDensity, logDensityOfData, incorporate, and unincorporate, which in turn sounds like what we call a 'likelihood-free (P)SP'. However, lite's PSP.canAbsorb is not implemented, rather than default false, and lite's LikelihoodFreePSP overrides only canAbsorb (to return false), not logDensity, logDensityOfData, incorporate, or unincorporate.

Here's the partitioning I think is intended, along with some names for what the base classes that could supply default virtual methods are called:

axch commented 8 years ago

Not quite.

riastradh-probcomp commented 8 years ago

The interim change I proposed requires each PSP to explicitly say 'no, I'm not a continuous PSP' or 'no, I don't have a delta kernel'. An alternative would be to use dynamic_cast and run-time type information (RTTI) with different base classes for each additional set of functionality a PSP might have. Something like this:

class PSP {
public:
    virtual VentureValuePtr simulate(...) const =0;
};

class AssessablePSP: public virtual PSP {
public:
    virtual double logDensity(...) const =0;
    virtual double logDensityOfData(...) const =0;
};

class MyPSP: public PSP, public AssessablePSP {
public:
    VentureValuePtr simulate(...) { ... }
    double logDensity(...) { ... }
    double logDensityOfData(...) { ... }
};

Usage:

PSP * x = ...;
AssessablePSP * x_assess;
if ((x_assess = dynamic_cast<AssessablePSP *>(x))) {
    .... x_assess->logDensity(...) ...
} else {
    // non-assessable, fall back to whatever else
}

This still requires organizing the extra sets of functionality into additional base classes, but it does not require changing every PSP to list yes-or-no for every extra set of functionality it might have.

axch commented 8 years ago

Looks reasonable.

riastradh-probcomp commented 8 years ago

Complication with the AssessablePSP proposal above: the decision of whether the PSP is assessable depends not on the PSP itself but also on some extra parameters to canAbsorb.

We could instead do something like this:

struct PSP {
    virtual Assessor * canAbsorb(trace, appNode, node);
    ...
};

PSP * x = ...;
Assessor * a = x->canAbsorb(trace, appNode, node);
if (a) {
    ... a->logDensity(...) ...
} else {
    // non-assessable, fall back to whatever else
}

(This API is also safer in that you can't accidentally call logDensity if you didn't already call canAbsorb.)

axch commented 8 years ago

I worry about this complicating the PSP-implementor-facing interface. Perhaps that can be alleviated by the convention of returning this from canAbsorb and having the PSP itself implement the Assessor interface in the assessable case.

riastradh-probcomp commented 8 years ago

Yes, that will certainly work -- that's even what I had in mind.

riastradh-probcomp commented 8 years ago

Bleh. Not every call to logDensity is immediately preceded by canAbsorb. E.g.,

https://github.com/probcomp/Venturecxx/blob/0aa99cac3726b8af66a7dc945fd8eccc09bb7d50/backend/new_cxx/src/detach.cxx#L84