nsaef / text_exploration

Tool for analyzing large unstructured collections of digital text documents. Master's thesis in Digital Humanities.

Files too big for pickle #76

Closed nsaef closed 6 years ago

nsaef commented 6 years ago

`pickle.dump()` sometimes fails due to memory errors. (Temporary?) solution: use joblib instead:

```python
# joblib handles large objects more robustly than plain pickle
# (note: in newer scikit-learn versions, use `import joblib` directly)
from sklearn.externals import joblib

joblib.dump(corpus, corpus_path)
```
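
For completeness, files written with `joblib.dump()` are generally read back with `joblib.load()` rather than `pickle.load()`. A minimal sketch, reusing `corpus_path` from the dump call above:

```python
from sklearn.externals import joblib

corpus = joblib.load(corpus_path)
```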

See: https://stackoverflow.com/questions/17513036/pickle-dump-huge-file-without-memory-error

1. Test whether the existing dumps can still be loaded as-is, or whether `joblib.load()` needs to be used from now on.
2. Prepare for the possibility that files are also too big for joblib (-> OSO) and develop a strategy for handling that. Likely: create save and load functions that dump to/read from several files, and implement them everywhere `pickle.dump()`/`pickle.load()` is currently used (see the sketch below).
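
A minimal sketch of what such split save/load helpers might look like, assuming the corpus is list-like; the names `save_chunked`/`load_chunked`, the `.partNNN` suffix, and the default chunk count are all made up for illustration:

```python
import math
import pickle
from pathlib import Path

def save_chunked(items, base_path, n_chunks=10):
    # Split a list-like corpus into n_chunks and pickle each part to its own
    # file, so no single dump has to serialize the whole corpus at once.
    chunk_size = math.ceil(len(items) / n_chunks)
    for i in range(n_chunks):
        chunk = items[i * chunk_size:(i + 1) * chunk_size]
        with open(f"{base_path}.part{i:03d}", "wb") as f:
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_chunked(base_path):
    # Read the part files back in order and concatenate their contents.
    base = Path(base_path)
    result = []
    for part in sorted(base.parent.glob(base.name + ".part*")):
        with open(part, "rb") as f:
            result.extend(pickle.load(f))
    return result
```

Zero-padding the part index (`part000`, `part001`, ...) keeps the lexicographic sort in `load_chunked()` consistent with the write order even beyond ten chunks.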