jprothero / MCTSnet

Implementation of MCTSnet

Memory loading error #2

Open yhyu13 opened 6 years ago

yhyu13 commented 6 years ago

Hi, @jprothero

I was testing your latest changes and ran into a CPU vs. GPU error:

python main.py 

/home/hangyu5/anaconda2/envs/mctsnet/lib/python3.6/site-packages/torch/serialization.py:367: SourceChangeWarning: source code of class 'MCTSnet_model.MCTSnet' has changed. you can retrieve the original source code by accessing the object's source attribute or set `torch.nn.Module.dump_patches = True` and use the patch tool to revert the changes.
  warnings.warn(msg, SourceChangeWarning)
Loaded best model
Loaded best model
Loading memories...
Number of memories: 169
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:23<00:00,  2.37s/it]
Saving memories...
HBox(children=(IntProgress(value=0, description='Epoch', max=1), HTML(value='')))
  0%|                                                                                                                                                                                | 0/10 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 32, in <module>
    trainer.fastai_train(mctsnet.new, memories)
  File "/home/hangyu5/Desktop/Deep_Learning/MCTSnet/MCTSnet_trainer.py", line 80, in fastai_train
    learner.fit(5e-3, epochs, wds=1e-7) #was 7e-2
  File "/home/hangyu5/anaconda2/envs/mctsnet/lib/python3.6/site-packages/fastai/learner.py", line 99, in fit
    self.fit_gen(self.model, self.data, layer_opt, n_cycle, **kwargs)
  File "/home/hangyu5/anaconda2/envs/mctsnet/lib/python3.6/site-packages/fastai/learner.py", line 89, in fit_gen
    metrics=metrics, callbacks=callbacks, reg_fn=self.reg_fn, clip=self.clip, **kwargs)
  File "/home/hangyu5/anaconda2/envs/mctsnet/lib/python3.6/site-packages/fastai/model.py", line 84, in fit
    loss = stepper.step(V(x),V(y))
  File "/home/hangyu5/anaconda2/envs/mctsnet/lib/python3.6/site-packages/fastai/model.py", line 43, in step
    loss = raw_loss = self.crit(output, y)
  File "/home/hangyu5/Desktop/Deep_Learning/MCTSnet/MCTSnet_trainer.py", line 24, in train_wrapper
    return self.train(self.net, self.memories)
  File "/home/hangyu5/Desktop/Deep_Learning/MCTSnet/MCTSnet_trainer.py", line 40, in train
    states = torch.cat(states, dim=0)
RuntimeError: Expected a Tensor of type torch.cuda.FloatTensor but found a type torch.FloatTensor for sequence element 2  in sequence argument at position #1 'tensors'

The main error is

RuntimeError: Expected a Tensor of type torch.cuda.FloatTensor but found a type torch.FloatTensor for sequence element 2  in sequence argument at position #1 'tensors'

I guess the model you committed was trained on the CPU only, right? And when I load it onto the GPU, it expects a different tensor type? Do you know how to fix this kind of issue in PyTorch? (Deleting your model and memory.* files solves the problem, but I am still a bit confused about CPU vs. GPU in PyTorch.)

jprothero commented 6 years ago

@yhyu13

I believe it happens because the saved model or memories were created on one device (CPU or GPU) and you tried to load them on the other. Deleting them should fix it, as you said. That is probably a good reason to add the checkpoints folder to .gitignore; I'll try to do that at some point.
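
For anyone who would rather keep the saved files than delete them, here is a minimal sketch of one way to avoid the mismatch. It assumes PyTorch 0.4 or later (for `torch.device` and `Tensor.to`). It is not the repository's actual loading code. The helper names and the `memories.pt` file are hypothetical stand-ins. The idea is to remap tensors to the current device at load time with `map_location`, and to move every tensor to one device before calling `torch.cat`.

```python
import os
import tempfile

import torch


def pick_device():
    """Use the GPU when one is visible, otherwise fall back to the CPU."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")


def load_on_device(path, device):
    """Load a saved object and remap every stored tensor onto `device`."""
    return torch.load(path, map_location=device)


def cat_on_device(tensors, device, dim=0):
    """Move each tensor to `device` first, so torch.cat sees a single tensor type."""
    return torch.cat([t.to(device) for t in tensors], dim=dim)


device = pick_device()

# Hypothetical stand-in for the saved memories: a list of CPU tensors on disk.
path = os.path.join(tempfile.mkdtemp(), "memories.pt")
torch.save([torch.zeros(1, 3), torch.ones(1, 3)], path)

states = load_on_device(path, device)   # tensors now live on `device`
batch = cat_on_device(states, device)   # safe even if the list was mixed
```

The same `map_location` argument applies when loading a saved model, so a checkpoint written on one kind of machine can still be opened on the other.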