maxhodak / keras-molecules

Autoencoder network for learning a continuous representation of molecular structures.
MIT License
520 stars 146 forks

Chunk adding tensor to h5 file to prevent out_of_memory errors #19

Closed pechersky closed 7 years ago

pechersky commented 8 years ago

Mentioned in #18
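
For readers without the diff at hand, the idea in the title can be sketched roughly as follows. This is not the code from the pull request: the charset, `MAX_LEN`, `CHUNK_SIZE`, the dataset name `data_train`, and both helper functions are hypothetical choices made for illustration. The point is only that the one-hot tensor is built and written a slice at a time, so the full array never has to sit in memory.

```python
import os
import tempfile

import h5py
import numpy as np

# All of these values are illustrative assumptions, not taken from the repository.
CHARSET = [" ", "C", "N", "O", "(", ")", "=", "1"]
MAX_LEN = 16      # padded string length
CHUNK_SIZE = 2    # rows encoded per write; kept tiny for the demo


def one_hot_chunk(smiles_chunk):
    """Encode a list of strings as an (n, MAX_LEN, len(CHARSET)) one-hot array."""
    index = {ch: i for i, ch in enumerate(CHARSET)}
    out = np.zeros((len(smiles_chunk), MAX_LEN, len(CHARSET)), dtype=np.float32)
    for row, s in enumerate(smiles_chunk):
        for col, ch in enumerate(s.ljust(MAX_LEN)):
            out[row, col, index[ch]] = 1.0
    return out


def write_in_chunks(path, smiles, chunk_size=CHUNK_SIZE):
    """Preallocate the dataset on disk, then fill it one slice at a time."""
    shape = (len(smiles), MAX_LEN, len(CHARSET))
    with h5py.File(path, "w") as h5f:
        dset = h5f.create_dataset("data_train", shape=shape, dtype=np.float32)
        for start in range(0, len(smiles), chunk_size):
            stop = min(start + chunk_size, len(smiles))
            # Only chunk_size rows of the tensor exist in memory at once.
            dset[start:stop] = one_hot_chunk(smiles[start:stop])


smiles = ["CCO", "C=O", "C1CC1", "CC(N)C", "NCCO"]
path = os.path.join(tempfile.mkdtemp(), "processed.h5")
write_in_chunks(path, smiles)

with h5py.File(path, "r") as h5f:
    stored = h5f["data_train"][:]
print(stored.shape, stored.dtype)
```

Peak memory in this sketch scales with `chunk_size` rather than with the number of molecules, which is the property that matters for avoiding out-of-memory errors during preprocessing.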

pechersky commented 7 years ago

@maxhodak One thing I forgot to mention: this change also makes the preprocessed h5 file ~2x smaller, since it stores the tensor as a "float4" type instead of an "integer8" type, at least on my machine. I don't know whether this is an undesired side effect.
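
The ~2x figure is consistent with element width alone, assuming "float4" and "integer8" mean a 4-byte float (`float32`) and an 8-byte integer (`int64`). That reading is an assumption, as is the tensor shape below. A small NumPy check of the raw array sizes, before any HDF5 overhead or compression:

```python
import numpy as np

# Hypothetical shape: 1000 molecules, 120 characters, 35-symbol charset.
n_molecules, max_len, charset_size = 1000, 120, 35

rng = np.random.default_rng(0)
indices = rng.integers(0, charset_size, size=(n_molecules, max_len))

# Same one-hot tensor in both element types.
one_hot_int64 = np.eye(charset_size, dtype=np.int64)[indices]
one_hot_float32 = one_hot_int64.astype(np.float32)

size_ratio = one_hot_int64.nbytes / one_hot_float32.nbytes
values_match = np.array_equal(one_hot_int64, one_hot_float32)
print(size_ratio, values_match)
```

Since 0 and 1 are exactly representable in `float32`, the stored values are unchanged; only the bytes per element differ.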

maxhodak commented 7 years ago

The preprocessed file is a huge sparse binary matrix, so it seems like this shouldn't make a difference. There are definitely optimizations to do here, but it hasn't been broken enough for me to bother with yet.

maxhodak commented 7 years ago

I don't think the float vs int encoding should make a difference here but I should probably investigate a little bit.