wesleybeckner / gains

project that enables molecular design and computational screening of small molecules
MIT License

Avoid using pickle #2

Closed: dacb closed this issue 6 years ago

dacb commented 6 years ago

Pickling objects in Python is a potentially problematic way to serialize objects. Issues that have been identified with pickling include:

One alternative may be HDFS.

wesleybeckner commented 6 years ago

Great point. I was having some issues with pickle. I'm in the process of converting everything over to dill. My understanding is that dill, unlike pickle, saves class/method information. I'll look into HDFS as well.
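A minimal sketch of one concrete difference between the two: the standard-library pickle stores a function as a reference to its importable module and name, so a lambda (whose name is `<lambda>`) cannot be pickled, whereas dill can serialize such objects by value. The helper `pickle_roundtrip_ok` is hypothetical, and dill is a third-party package assumed to be installed with `pip install dill`; the sketch falls back gracefully if it is missing.

```python
import pickle


def pickle_roundtrip_ok(obj):
    """Hypothetical helper: True if the stdlib pickle can dump and reload obj."""
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# pickle saves functions by reference (module + qualified name); a lambda's
# qualified name is "<lambda>", which cannot be looked up again on load.
square = lambda x: x * x

print(pickle_roundtrip_ok([1, 2, 3]))  # plain data round-trips
print(pickle_roundtrip_ok(square))     # the stdlib pickler rejects the lambda

try:
    import dill  # third-party; assumed installed for this part of the sketch

    restored = dill.loads(dill.dumps(square))  # dill serializes the function body itself
    print(restored(4))
except ImportError:
    print("dill not installed; skipping the dill round trip")
```

Note that dill builds on pickle's loading machinery, so it shares pickle's security caveat about untrusted files.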

dacb commented 6 years ago

I wasn't aware of dill, thanks! It doesn't solve the malicious code problem, but it seems to ease sharing across platforms. Cool!

The real problem with HDFS is that you still need to serialize your objects into the HDFS blobs yourself. I'll ask around for some libraries to do that part.
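A minimal sketch of what serializing objects "yourself" can look like: each class reduces itself to plain data (strings, numbers, lists) and rebuilds itself through its constructor, so loading never executes code from the file. JSON from the standard library stands in for the storage format here; with HDF5 the same fields would be written as datasets and attributes. The class `ScreeningResult` and its fields are hypothetical and not taken from the gains codebase.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ScreeningResult:  # hypothetical example class
    label: str
    score: float
    features: list = field(default_factory=list)

    def to_dict(self):
        # Reduce the object to plain data that any format can store.
        return asdict(self)

    @classmethod
    def from_dict(cls, data):
        # Rebuild via the constructor; only data is read, no code is run.
        return cls(**data)


original = ScreeningResult("sample-1", 1.25, [0.1, 0.2])
text = json.dumps(original.to_dict())
restored = ScreeningResult.from_dict(json.loads(text))
print(restored == original)
```

The cost of this approach is the one raised above: every class needs its own explicit to/from mapping, which is the work a general-purpose pickler otherwise hides.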

wesleybeckner commented 6 years ago

dangerous pickles! https://intoli.com/blog/dangerous-pickles/
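The danger can be shown in a few lines: a pickle stores instructions such as "call this callable with these arguments", so merely loading a file can run code. This sketch uses a harmless `eval("6 * 7")`; a malicious file could name `os.system` instead. The mitigation shown, overriding `Unpickler.find_class` with an allow-list, is adapted from the "Restricting Globals" pattern in the Python `pickle` documentation; `Payload`, `RestrictedUnpickler`, and `restricted_loads` are illustrative names.

```python
import builtins
import io
import pickle


class Payload:
    """Its pickled form tells the unpickler to call eval() on load."""

    def __reduce__(self):
        # pickle records (callable, args); pickle.loads will call eval("6 * 7").
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval runs during loading
print(result)  # the return value of eval, not a Payload instance


class RestrictedUnpickler(pickle.Unpickler):
    """Only resolve globals from a small allow-list of harmless builtins."""

    SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

    def find_class(self, module, name):
        if module == "builtins" and name in self.SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


try:
    restricted_loads(blob)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("blocked:", exc)
```

An allow-list narrows the attack surface but is easy to get wrong, which is one reason to prefer a data-only format for files that may come from untrusted sources.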

Keras has its own serialization to HDF5: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model

and h5py could be used to serialize other objects into HDF5 as well: http://docs.h5py.org/en/latest/index.html

Since these are not very large data files, HDF5 may be sufficient.

wesleybeckner commented 6 years ago

After some fiddling with Travis CI, the switch to Keras serialization is successful.

dacb commented 6 years ago

AWESOME!