ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text
http://bark.phon.ioc.ee/punctuator
MIT License

MemoryError in data.py #36

Closed anavc94 closed 5 years ago

anavc94 commented 5 years ago

Hello,

first of all, congrats on the project! It has been really useful.

My problem is that when I try to run data.py on long training text files (about 720.000 KB, for example), a MemoryError appears. However, I was able to train the model with training text of about 200.000 KB. Is there some kind of limit? Should I change some parameters? I've monitored my RAM usage and it's only about 600 MB while the script is running, so I am not sure what the real problem is. I have 16 GB of RAM.

My output is:

```
Traceback (most recent call last):
  File "data.py", line 279, in <module>
    create_dev_test_train_split_and_vocabulary(path, True, TRAIN_FILE, DEV_FILE, TEST_FILE, PRETRAINED_EMBEDDINGS_PATH)
  File "data.py", line 228, in create_dev_test_train_split_and_vocabulary
    for line in text:
  File "E:\Python27\lib\codecs.py", line 699, in next
    return self.reader.next()
  File "E:\Python27\lib\codecs.py", line 630, in next
    line = self.readline()
  File "E:\Python27\lib\codecs.py", line 553, in readline
    line += data
MemoryError
```

Thanks in advance!

ottokart commented 5 years ago

Things have changed in the code since you posted this error. Do you still run into this problem? One reason I can imagine is that your file does not have line breaks (so it consists of a single very long line) and the codecs library reads everything into memory when it tries to read the first line.
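If that hypothesis is right, it's easy to check without loading the file at all: read it in fixed-size binary chunks and count newline bytes. A minimal sketch (the function name and chunk size are illustrative, not part of the project):

```python
def count_newlines(path, chunk_size=1 << 20):
    """Count '\n' bytes by streaming the file in 1 MB chunks,
    so no full line ever needs to fit in memory."""
    newlines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
    return newlines
```

If this returns 0 (or a very small number) for a 700 MB file, the input really is one giant line, and `codecs`' `readline` will try to buffer the whole thing, which would explain the MemoryError.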