probcomp / cgpm

Library of composable generative population models which serve as the modeling and inference backend of BayesDB.
Apache License 2.0

vscgpm: hooked instance of VsCGpm is incompatible with Engine(multiprocess=True) #215

Open fsaad opened 7 years ago

fsaad commented 7 years ago

engine.compose_cgpm([vscgpm], multiprocess=True) fails with:

E           RuntimeError: Subprocess failed: Traceback (most recent call last):
E             File "/scratch/fsaad/cgpm/cgpm/utils/parallel_map.py", line 55, in process_input
E               outq_wr.send((i, ok, fx))
E           PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

fsaad commented 7 years ago

Further investigation reveals that the Venturecxx ripl is not picklable. The issue is that parallel_map pickles objects when passing them between the parent and worker processes, as the outq_wr.send call in the traceback shows.
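The failure mode can be reproduced without Venture. The following is a minimal sketch, assuming a hypothetical FakeRipl class that simply holds a function attribute the way the ripl appears to; it is not the real venture.ripl.ripl.Ripl, and the helper name try_pickle is also made up for illustration.

```python
import pickle


class FakeRipl(object):
    """Hypothetical stand-in for an object that stores a function attribute."""

    def __init__(self):
        # A lambda has no importable name, so pickle cannot serialize it
        # by reference.
        self.callback = lambda x: x + 1


def try_pickle(obj):
    """Return True if obj can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies across Python versions.
        return False


print(try_pickle([1, 2, 3]))    # plain data pickles fine
print(try_pickle(FakeRipl()))   # the function attribute blocks pickling
```

Any object sent over a multiprocessing pipe goes through the same pickling step, so an object like this would fail in the worker's send call in the same way.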

This issue does not arise with sklearn-based cgpms such as RandomForest and LinearRegression, since those sklearn predictor objects are picklable.

One possibility is to explicitly convert the hooked cgpm to its JSON metadata format and have the worker process deserialize the cgpm. However, the performance hit of explicit deserialization and repopulating the trace could be significant. Consider profiling this approach.
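A rough sketch of the metadata round trip, assuming a hypothetical ToyCgpm class with to_metadata and from_metadata methods; the real VsCGpm metadata layout and the cost of repopulating its trace are not modeled here. Only JSON strings would cross the process boundary, so the non-picklable member never has to be pickled.

```python
import json


class ToyCgpm(object):
    """Hypothetical stand-in for a hooked cgpm; not the real VsCGpm."""

    def __init__(self, source, data):
        self.source = source
        self.data = list(data)
        # Non-picklable member, playing the role of the ripl.
        self.program = lambda x: x * 2

    def to_metadata(self):
        # Keep only JSON-friendly state; the function is rebuilt on load.
        return {'source': self.source, 'data': self.data}

    @classmethod
    def from_metadata(cls, metadata):
        return cls(metadata['source'], metadata['data'])


def worker_task(blob):
    """Would run in the worker: rebuild from JSON, do work, return JSON."""
    model = ToyCgpm.from_metadata(json.loads(blob))
    model.data.append(model.program(len(model.data)))
    return json.dumps(model.to_metadata())


# The parent serializes before dispatch and rebuilds from the reply.
blob = json.dumps(ToyCgpm('toy source', [1, 2]).to_metadata())
result = ToyCgpm.from_metadata(json.loads(worker_task(blob)))
print(result.data)
```

Wrapping worker_task in a timer for a realistic model would give a first estimate of the deserialization overhead mentioned above.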

An alternative is to patch the venture.ripl.ripl.Ripl class to be picklable.
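One common way to make a class picklable is to define __getstate__ and __setstate__ so that non-picklable members are dropped on save and rebuilt on load. The sketch below uses a hypothetical ToyRipl class and a made-up _build_callbacks helper; whether the real Ripl's internal state can be rebuilt this way is an open question.

```python
import pickle


class ToyRipl(object):
    """Hypothetical stand-in; not the real venture.ripl.ripl.Ripl."""

    def __init__(self, seed):
        self.seed = seed
        self._build_callbacks()

    def _build_callbacks(self):
        # Function attribute that pickle cannot serialize directly.
        self.callback = lambda x: x + self.seed

    def __getstate__(self):
        # Copy the instance dict and drop the non-picklable entry.
        state = self.__dict__.copy()
        del state['callback']
        return state

    def __setstate__(self, state):
        # Restore plain state, then recreate the dropped function.
        self.__dict__.update(state)
        self._build_callbacks()


restored = pickle.loads(pickle.dumps(ToyRipl(3)))
print(restored.callback(4))
```

This keeps the pickling logic inside the class, so parallel_map would not need to change, at the cost of rebuilding the dropped state on every unpickle.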