harvardnlp / sa-vae


About implementation #4

Open dongqian0206 opened 5 years ago

dongqian0206 commented 5 years ago

Hi. I am trying to run the source code based on Pytorch 0.4.1. I encountered a problem:

cudnn RNN backward can only be called in training mode

Did you encounter this problem? Pytorch version issue?

yoonkim commented 5 years ago

Hi, sorry I just saw this. I don't remember getting this issue in Pytorch 0.2. Which line in the code is triggering this?

tombosc commented 5 years ago

@yoonkim I get that error as well under pytorch 1.0, here's the full trace:

Traceback (most recent call last):
  File "train_text_cyc_ptb.py", line 400, in <module>
    main(args)
  File "train_text_cyc_ptb.py", line 277, in main
    val_nll = eval(val_data, model, meta_optimizer)
  File "train_text_cyc_ptb.py", line 355, in eval
    var_params_svi = meta_optimizer.forward([mean_svi, logvar_svi], sents)
  File "/network/tmp1/bosctom/cyclical_annealing/language_model/optim_n2n.py", line 43, in forward
    return self.forward_mom(input, y, verbose)
  File "/network/tmp1/bosctom/cyclical_annealing/language_model/optim_n2n.py", line 90, in forward_mom
    all_grads_k = torch.autograd.grad(loss, all_input_params, retain_graph = True)
  File "/network/home/bosctom/anaconda3/envs/hrnncu9.0/lib/python3.6/site-packages/torch/autograd/__init__.py", line 145, in grad
    inputs, allow_unused)
RuntimeError: cudnn RNN backward can only be called in training mode

It would be great if you could help port the code to a more recent pytorch! I tried to install older pytorch versions, but it is complicated because of dependencies.

(As you see, I'm running the code from another repo but I think it's the same code)

Thanks :-)

leehaoyuan commented 4 years ago

torch.autograd.grad cannot work if there is an RNN module in eval mode. Since nn.Dropout is the only module here whose behavior actually differs in eval mode, I manually set the dropout modules to eval mode in the eval() function while leaving the rest of the model in training mode. I guess this gives the same result: replace model.eval() with model.dropout.train(False) and model.dec_linear.train(False).
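A minimal sketch of this workaround, using a toy model rather than the actual sa-vae code (the module names rnn, dropout, and linear here are illustrative, not the repo's real attribute names). The idea is to keep the cudnn RNN in training mode so torch.autograd.grad still works during evaluation, while switching off only the modules whose behavior differs at eval time:

```python
import torch
import torch.nn as nn

# Hypothetical decoder mirroring the setup discussed in this thread:
# an LSTM followed by dropout and a linear output layer.
class ToyDecoder(nn.Module):
    def __init__(self, dim=8):
        super().__init__()
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.linear(self.dropout(out))

model = ToyDecoder()

# Instead of model.eval(), which would put the RNN in eval mode and
# trigger "cudnn RNN backward can only be called in training mode"
# on GPU, only disable the modules that matter for evaluation:
model.train()                # RNN stays in training mode for cudnn backward
model.dropout.train(False)   # dropout off, so evaluation is deterministic
model.linear.train(False)    # no-op for Linear, shown for symmetry

x = torch.randn(2, 5, 8, requires_grad=True)
loss = model(x).sum()
# This is the call that fails under full eval mode on cudnn RNNs.
grads = torch.autograd.grad(loss, [x], retain_graph=True)
```

Note that Linear behaves identically in both modes, so only the dropout switch is strictly needed; the pattern generalizes to any submodule (e.g. BatchNorm) whose eval behavior you do want.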