Can you try passing in -max_batch_l when you run train.lua?
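For example, something like this (a sketch; the data/model file names here are placeholders):

```
th train.lua -data_file demo-train.hdf5 -val_data_file demo-val.hdf5 -savefile demo-model -max_batch_l 32
```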
The same error persists, with only the first line of the stack traceback different, when I set -max_batch_l to 32:
```
stack traceback:
    [C]: at 0x7faa6b4f2530
```
I should also mention that I added custom word2vec embeddings for both the encoder and the decoder side, and the row indices of those arrays start from 0 in the hdf5 files. But when I remove these files from the -pre_word_vecs_enc and -pre_word_vecs_dec options, I get the same error with only this line different:
```
stack traceback:
    [C]: at 0x7f0395f28530
```
Should I make changes here? This is how I created the two hdf5 files:
```python
import os

import h5py
import numpy as np

embed_size = 300  # (I reduced it to run a minimal example first)
# modeleng / modeltam are the pretrained word2vec models loaded earlier

vec = []
with open("demo.src.dict", 'r') as f:
    for line in f.readlines():
        # each dict line is "word index"; look up the word's embedding
        t = np.array([modeleng[line.split(' ')[0].strip()]])
        vec.append(t)
english = np.reshape(np.array(vec), (-1, embed_size))
# gives a (50004, 300) numpy array

vec = []
with open("demo.targ.dict", 'r') as f:
    for line in f.readlines():
        t = np.array([modeltam[line.split(' ')[0].strip().decode('utf-8')]])
        vec.append(t)
tamil = np.reshape(np.array(vec), (-1, embed_size))
# gives a (50004, 500) numpy array

# write one 'word_vecs' dataset per side, row order matching the dict files
if not os.path.isfile('src_wv_%d.hdf5' % (embed_size)):
    with h5py.File('src_wv_%d.hdf5' % (embed_size), 'w') as hf:
        hf.create_dataset('word_vecs', data=english)
if not os.path.isfile('targ_wv_%d.hdf5' % (embed_size)):
    with h5py.File('targ_wv_%d.hdf5' % (embed_size), 'w') as hf:
        hf.create_dataset('word_vecs', data=tamil)
```
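As a quick sanity check (a minimal sketch, assuming embed_size = 300 so the files are named src_wv_300.hdf5 and targ_wv_300.hdf5), the datasets can be reopened to confirm their shapes:

```python
import h5py

# reopen the files written above and confirm the 'word_vecs' shapes
for fname in ('src_wv_300.hdf5', 'targ_wv_300.hdf5'):
    with h5py.File(fname, 'r') as hf:
        print(fname, hf['word_vecs'].shape)  # expect (vocab_size, embed_size)
```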
hmm, did you use --batchsize 32 when running preprocess.py?
Okay, I forgot to change it in preprocess.py as well. I changed it there too and it works now. Silly mistake. Thanks so much; I will close the issue.
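For reference, a sketch of the matching invocations, with the other arguments elided:

```
# the batch size must agree between preprocessing and training
python preprocess.py --batchsize 32 ...
th train.lua -max_batch_l 32 ...
```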
Can you explain the reason behind the error, and why setting the batch size fixed it?
I also wanted to ask whether the model can be saved to file at intervals smaller than one epoch (preferably in terms of batches). My training is slow, about an hour per epoch, and I want to save a checkpoint every 50 batches.
@hanskrupakar - we are working on intermediate saving. It will be based on a time period (every N minutes), with a corresponding turnkey "restart" option so that runs can be restarted easily from any checkpoint. We will be pushing the feature soon!
I followed all the steps specified in your README.md to try to implement a baseline RNN attention encoder-decoder model (Luong et al., 2015). I had no problems until I reached the actual training part. When I run the train.lua script, I get this error. How do I fix it so the model runs as it should? I have an Nvidia GeForce 650M GPU with 2 GB of memory and 384 cores, CUDA 7.5, and cuDNN 4. Please help.