Closed: sflinter closed this issue 6 years ago
Running it with the memory_profiler extension, it looks like texts_to_matrix
uses ~2.5 GB (there's a lot of text in the dataset):
%load_ext memory_profiler
%memit description_bow_train = tokenize.texts_to_matrix(description_train)
Are you able to run it in colab successfully?
@saraob, I rebooted everything (machine and docker instance) and it worked after that. Memory usage is still closer to 8GB, but at least it's not failing.
Thanks for taking a look.
@saraob, thanks for providing this example model - very interesting.
I'm trying to recreate it locally, and am getting a MemoryError in the line:
description_bow_train = tokenize.texts_to_matrix(description_train)
I've also tried creating description_bow_train iteratively, in chunks; a rough sketch of that kind of approach is below. With this approach, memory (8 GB allocated in a Docker container) is exhausted after about 2,500 rows.
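A minimal sketch of what a chunked build might look like (the chunk size, vocabulary size, and placeholder data here are assumptions, not taken from the original notebook):

# Sketch: build the bag-of-words matrix in chunks so each texts_to_matrix
# call only materialises a small slice at a time.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer

description_train = ["some text", "more text"]  # placeholder for the real training texts
tokenize = Tokenizer(num_words=10000)           # assumed vocabulary cap
tokenize.fit_on_texts(description_train)

chunk_size = 500
chunks = []
for start in range(0, len(description_train), chunk_size):
    batch = description_train[start:start + chunk_size]
    # Cast each chunk to float32 to halve its footprint versus the default float64.
    chunks.append(tokenize.texts_to_matrix(batch, mode="binary").astype(np.float32))
# Note: the final stack still needs room for the full dense matrix.
description_bow_train = np.vstack(chunks)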
Do you have any sense of how much memory this dataset should consume to load? 8 GB for 2,500 rows seems excessive to me...
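For a rough sense of scale: texts_to_matrix returns a dense float matrix of shape (num_texts, num_words), so the expected footprint can be estimated directly (the 50,000-word vocabulary below is an assumption; the real value isn't shown in this thread):

rows = 2500              # rows processed before memory was exhausted
vocab_size = 50_000      # assumed vocabulary size
bytes_per_value = 8      # texts_to_matrix typically returns a float64 array
print(rows * vocab_size * bytes_per_value / 1e9, "GB")  # ~1 GB for 2,500 rows

Anything well beyond that would point to a much larger vocabulary or to extra intermediate copies being held during construction.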