1) Test whether the joblib-dumped files can still be opened with pickle.load(), or whether joblib.load() has to be used from now on (see the quick check below).
2) Prepare for the possibility that the files are also too big for joblib (-> OSO) and develop a strategy for handling that. Likely: write save/load helper functions that dump to / read from several part files, and use them everywhere pickle.dump()/pickle.load() is currently called (see the split-file sketch at the end of this note).
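For 1), a quick check could look like this (just a sketch; it reuses corpus_path from the snippet below and simply falls back to joblib when plain unpickling fails):

import pickle
import joblib

# try plain pickle first; depending on what joblib wrote (e.g. numpy arrays,
# compression), this may fail, in which case we fall back to joblib.load()
try:
    with open(corpus_path, "rb") as fh:
        corpus = pickle.load(fh)
except Exception:
    corpus = joblib.load(corpus_path)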
pickle.dump() sometimes fails with memory errors on large objects. (Temporary?) solution: use joblib instead:
import joblib  # sklearn.externals.joblib is deprecated (removed in scikit-learn 0.23); use the standalone joblib package
joblib.dump(corpus, corpus_path)
corpus = joblib.load(corpus_path)  # reading it back
See: https://stackoverflow.com/questions/17513036/pickle-dump-huge-file-without-memory-error
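One possible shape for such split save/load helpers, as a minimal sketch (the SplitFile class, the .part-N naming scheme, and the 1 GiB part size are illustrative assumptions, nothing here is decided): a write-only file-like object that pickle.dump() streams into, so no single output file has to hold the whole pickle:

import pickle
from pathlib import Path

PART_BYTES = 2**30  # 1 GiB per part file; arbitrary, tune as needed

class SplitFile:
    """Write-only file-like object that spreads the incoming byte stream
    over <path>.part-0, <path>.part-1, ..."""

    def __init__(self, path, part_bytes=PART_BYTES):
        self.path = str(path)
        self.part_bytes = part_bytes
        self.part = -1   # index of the current part file
        self.room = 0    # bytes still free in the current part file
        self.fh = None

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            if self.room == 0:  # current part full (or none open yet): start the next one
                if self.fh:
                    self.fh.close()
                self.part += 1
                self.fh = open(f"{self.path}.part-{self.part}", "wb")
                self.room = self.part_bytes
            n = min(self.room, len(view))
            self.fh.write(view[:n])
            self.room -= n
            view = view[n:]

    def close(self):
        if self.fh:
            self.fh.close()

def save_split(obj, path):
    f = SplitFile(path)
    pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)  # Pickler only needs write()
    f.close()

def load_split(path):
    # re-read the parts in numeric order and unpickle the joined stream
    p = Path(path)
    parts = sorted(p.parent.glob(p.name + ".part-*"),
                   key=lambda q: int(q.name.rsplit("-", 1)[1]))
    return pickle.loads(b"".join(q.read_bytes() for q in parts))

Usage would mirror the joblib lines above: save_split(corpus, corpus_path) and corpus = load_split(corpus_path). Caveat: load_split() still joins all parts in memory before unpickling; if loading is the step that blows up, the next step would be a read-side file-like object (pickle.load() needs read() and readline()).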