Memory usage in texts_to_matrix

sflinter commented 6 years ago

@sararob, thanks for providing this example model; it's very interesting.

I'm trying to recreate it locally and am getting a MemoryError on this line:

description_bow_train = tokenize.texts_to_matrix(description_train)

I've tried creating description_bow_train iteratively as follows:

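# Convert to a bag of words vector, one description at a time
description_bow_train = []
for i, row in enumerate(description_train):
  # row is a single description string, so texts_to_matrix iterates over
  # its characters and returns one matrix row per character
  m = tokenize.texts_to_matrix(row)
  description_bow_train.append(m)
  if i % 100 == 0:
    print("Row %i" % i)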

With this approach, memory (8GB allocated to a Docker container) is exhausted after about 2,500 rows.
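One thing worth checking in that loop: Tokenizer.texts_to_matrix expects a list of texts, and when it is given a single string it iterates over the string's characters, producing one matrix row per character instead of one row per description. A minimal sketch of a batched alternative, assuming description_train is any iterable of strings (a list or a pandas Series) and tokenize is an already-fitted Keras Tokenizer; the helpers batched and matrices_in_batches are hypothetical names, not part of Keras:

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(texts: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list; the yielded one is left untouched
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def matrices_in_batches(
    texts: Iterable[str],
    to_matrix: Callable[[List[str]], T],
    batch_size: int = 1000,
) -> Iterator[T]:
    """Apply to_matrix to one list of whole texts at a time.

    Each call receives a list, so every element it iterates over is a
    complete description rather than a single character.
    """
    for batch in batched(texts, batch_size):
        yield to_matrix(batch)
```

For example, `for m in matrices_in_batches(description_train, tokenize.texts_to_matrix): ...` yields arrays of shape (batch size, num_words). Memory stays bounded only if each batch is consumed (trained on, or written to disk) before the next one is built; appending every batch to a list ends up holding as much data as the full matrix. On Python 3.12+, itertools.batched performs the same chunking, yielding tuples instead of lists.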

Do you have a sense of how much memory this dataset should need to load? 8GB for 2,500 rows seems excessive to me.
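As a rough way to answer the sizing question: texts_to_matrix returns a dense NumPy array with one row per text and one column per vocabulary index (num_words if the Tokenizer was created with it, otherwise len(word_index) + 1), filled with float64 values by default. A back-of-the-envelope sketch; the function names and the example sizes are hypothetical and not taken from the wine dataset:

```python
def estimate_bow_bytes(num_texts: int, num_words: int, itemsize: int = 8) -> int:
    """Approximate bytes for a dense (num_texts, num_words) matrix.

    itemsize=8 corresponds to float64, NumPy's default dtype for np.zeros.
    """
    if num_texts < 0 or num_words < 0 or itemsize < 1:
        raise ValueError("sizes must be non-negative and itemsize positive")
    return num_texts * num_words * itemsize


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


# Hypothetical sizes: 100,000 descriptions and a 10,000-word vocabulary.
full = estimate_bow_bytes(100_000, 10_000)
print(f"{to_gib(full):.2f} GiB")  # 7.45 GiB
```

The same formula applies to the per-row loop: passing one description string of, say, 300 characters allocates estimate_bow_bytes(300, num_words) bytes for that single description. Casting the result to float32 afterwards halves the stored size, but the float64 array still has to be allocated first.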

sararob commented 6 years ago

Running it with the memory_profiler IPython extension, it looks like texts_to_matrix uses about 2.5 GB (there's a lot of text in the dataset):

%load_ext memory_profiler
%memit description_bow_train = tokenize.texts_to_matrix(description_train)

Are you able to run it in Colab successfully?
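Outside IPython, a rough equivalent of the %memit measurement above can be sketched with the standard-library tracemalloc module. This is a different measurement: memory_profiler reports the process's memory usage, whereas tracemalloc reports the peak of allocations traced through Python's allocator (recent NumPy versions report their array buffers to it). The helper name measure_peak is hypothetical:

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call fn(*args, **kwargs) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()  # also discards the collected traces
    return result, peak


# Self-contained demo: a list holding one million references.
values, peak = measure_peak(lambda n: [0] * n, 1_000_000)
print(f"peak traced memory: {peak / 2**20:.1f} MiB")
```

With the notebook's objects, `matrix, peak = measure_peak(tokenize.texts_to_matrix, description_train)` would measure just that call, and `matrix.nbytes` gives the size of the returned array's data buffer for comparison.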

sflinter commented 6 years ago

@sararob, I rebooted everything (the machine and the Docker container), and it worked after that. Memory usage is still close to 8GB, but at least it's no longer failing.

Thanks for taking a look.
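If the dense matrix stays close to the memory limit, one alternative (a swapped-in technique, not what the notebook does) is a sparse representation built from tokenize.texts_to_sequences, which stores only the word indices that actually occur in each description. A minimal sketch that produces compressed sparse row (CSR) arrays; sequences_to_csr is a hypothetical helper and covers only the "binary" and "count" modes of texts_to_matrix:

```python
from collections import Counter
from typing import Iterable, List, Optional, Sequence, Tuple


def sequences_to_csr(
    sequences: Iterable[Sequence[int]],
    num_words: Optional[int] = None,
    mode: str = "binary",
) -> Tuple[List[int], List[int], List[float]]:
    """Build CSR (indptr, indices, data) arrays from word-index sequences.

    mode="binary" stores 1.0 for each word present; mode="count" stores the
    number of occurrences. Indices >= num_words are skipped.
    """
    if mode not in ("binary", "count"):
        raise ValueError("mode must be 'binary' or 'count'")
    indptr = [0]
    indices: List[int] = []
    data: List[float] = []
    for seq in sequences:
        counts = Counter(seq)
        for idx in sorted(counts):  # keep column indices sorted within a row
            if num_words is not None and idx >= num_words:
                continue
            indices.append(idx)
            data.append(1.0 if mode == "binary" else float(counts[idx]))
        indptr.append(len(indices))  # end offset of this row
    return indptr, indices, data
```

With SciPy installed, `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(seqs), num_words))` assembles the matrix, where seqs = tokenize.texts_to_sequences(description_train). Memory then scales with the number of distinct words per description rather than with the vocabulary size, and a row slice such as `csr[start:stop].toarray()` can densify one batch at a time for training.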

sararob/keras-wine-model, issue #1