Closed by theis188 7 years ago
I discovered that the error came from the changes I had made to data.py. The model could not load any examples and therefore could not train.
I reverted to the original version and converted the input files, and the model is now training as expected.
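For anyone hitting the same symptom, one way to catch an unreadable input file before starting a long run is to count its records first. The sketch below assumes each record is stored as an 8-byte native integer length followed by that many bytes of payload; that layout is an assumption here, and `write_records` and `count_records` are hypothetical helper names, not functions from data.py.

```python
import struct

# Assumed layout: each record is a native 8-byte signed length ("q")
# followed by that many bytes of payload.
HEADER_FORMAT = "q"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def write_records(path, payloads):
    """Write each bytes payload as a length-prefixed record (hypothetical helper)."""
    with open(path, "wb") as f:
        for payload in payloads:
            f.write(struct.pack(HEADER_FORMAT, len(payload)))
            f.write(payload)


def count_records(path):
    """Count the length-prefixed records in a file, failing loudly on truncation."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(HEADER_SIZE)
            if not header:
                break  # clean end of file
            if len(header) < HEADER_SIZE:
                raise ValueError("truncated length header")
            (length,) = struct.unpack(HEADER_FORMAT, header)
            if length < 0:
                raise ValueError("negative record length")
            payload = f.read(length)
            if len(payload) < length:
                raise ValueError("truncated record payload")
            count += 1
    return count
```

A count of zero, or a `ValueError`, would suggest the reader has nothing it can parse, which matches the "could not load any examples" symptom.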
This is about the textsum model.
I ran the model in train mode on about 80k articles (vocabulary ~40k), but after about a week no training loss had been reported and no files had been written to the train directory. It appears that no training took place. A couple of notes:
Is my system just too slow? Too little memory to train this much? Thoughts?
I am running in decode mode right now to see if anything pops out somehow.
PS: I have run training + decoding 'successfully' with about 1k articles in the training set using a similar setup (though on a different machine).
PS Edit: I've been having some permission issues on the machine. Is it possible a permission issue prevented the train folder from being written to?
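A quick way to rule the permission question in or out is to try creating a file in the directory directly. This is only a minimal sketch: `can_write_to` is a hypothetical helper, and the directory you pass is whatever path you gave the model as its train directory.

```python
import tempfile


def can_write_to(directory):
    """Return True if a file can actually be created in `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        # Covers a missing directory as well as a permission error.
        return False
```

Creating a real file is used here rather than only inspecting permission bits, since it exercises the same operation the training job would need to perform.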
PS PS Edit: In a possibly related issue, I am seeing this error: pthread_cond_wait: Resource busy