opencobra / cobrapy

COBRApy is a package for constraint-based modeling of metabolic networks.
http://opencobra.github.io/cobrapy/
GNU General Public License v2.0

Speed up serialization #124

Closed phantomas1234 closed 9 years ago

phantomas1234 commented 10 years ago

Why is

pickle.loads(pickle.dumps(iJO))  # (1 loops, best of 3: 1.88 s per loop)

slower than

json_model = from_json(to_json(iJO))  # (1 loops, best of 3: 380 ms per loop)

Could one define __getstate__ and __setstate__ in cobra.core.Model.Model to use to_json and from_json? Speeding up serialization will help a lot with parallelizing cobra calculations.
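A minimal sketch of what this proposal could look like, using a toy class rather than the real cobra.core.Model.Model. ToyModel, its fields, and its to_json/from_json methods are hypothetical stand-ins; cobrapy's actual to_json and from_json are module-level functions with their own format. The point is only to show pickle being routed through a fixed JSON representation:

```python
import json


class ToyModel:
    """Hypothetical stand-in for a model class; not cobra.core.Model.Model."""

    def __init__(self, id, reactions=None):
        self.id = id
        # reaction id -> (lower bound, upper bound)
        self.reactions = dict(reactions or {})
        # Free-form attribute that the fixed JSON format does not cover.
        self.notes = {}

    def to_json(self):
        # Only the enumerated fields are written.
        return json.dumps({"id": self.id, "reactions": self.reactions})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        # JSON has no tuples, so bounds come back as lists; convert them.
        bounds = {k: tuple(v) for k, v in data["reactions"].items()}
        return cls(data["id"], bounds)

    def __getstate__(self):
        # pickle stores whatever this returns instead of self.__dict__.
        return self.to_json()

    def __setstate__(self, state):
        # pickle creates an empty instance, then hands the state back here.
        restored = type(self).from_json(state)
        self.__dict__.update(restored.__dict__)
```

With these two hooks, pickle.loads(pickle.dumps(model)) goes through the JSON path. The trade-off is visible in the sketch: anything outside the fixed format (here, the contents of notes) is silently dropped on the round trip.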

aebrahim commented 10 years ago

Hey Niko,

To answer your first question: JSON serialization is faster than pickle mainly because the JSON format is fixed. Only a specific, enumerated set of fields is written, which lets us speed up the process. The flip side is that to_json and from_json can't serialize dynamic objects that carry arbitrary extra attributes.

A lot of parallel libraries rely on pickle specifically, so you are correct that slow pickling will impede parallel cobra calculations.

The only solution I can think of is to "force" the loss of flexibility by using __slots__ on all cobra objects as a static-typing gimmick, but my impression is that this is considered un-Pythonic, and I could imagine it creating issues for subclassing as well. However, __slots__ does lower memory use when creating large numbers of objects (as cobrapy does), which alone might be reason enough to do it.
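To make the trade-off concrete, here is a small sketch of how __slots__ behaves. The class names and fields are hypothetical and are not cobrapy's actual classes:

```python
class SlottedMetabolite:
    """Hypothetical example class with a fixed set of attributes."""

    # Instances get exactly these attributes and no per-instance __dict__.
    __slots__ = ("id", "formula", "charge")

    def __init__(self, id, formula, charge=0):
        self.id = id
        self.formula = formula
        self.charge = charge


class FlexibleMetabolite(SlottedMetabolite):
    """Subclass that does not declare __slots__, so it regains a __dict__."""


def accepts_new_attribute(obj):
    """Return True if an undeclared attribute can be set on obj."""
    try:
        obj.scratch = 1
    except AttributeError:
        return False
    return True


slotted = SlottedMetabolite("glc__D_c", "C6H12O6")
flexible = FlexibleMetabolite("glc__D_c", "C6H12O6")

print(accepts_new_attribute(slotted))   # undeclared attribute is rejected
print(accepts_new_attribute(flexible))  # subclass without __slots__ accepts it
```

This also hints at the subclassing concern: a subclass that forgets to declare __slots__ quietly gets a __dict__ back, so both the fixed attribute set and the memory saving are lost for its instances.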

If you have any other ideas on good ways to do this in Python I'd love to hear them.
