Closed · comja closed this issue 2 years ago
I think SimpleLossCompute should work fine on CPU if you have enough memory. Do you get an error?
There is a variant you could use, where you split into chunks like MultiGPULossCompute but do not use data parallel. Let me know if SimpleLossCompute fails.
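Roughly, a minimal sketch of that variant (untested; the name ChunkedLossCompute and the chunk_size default are placeholders of mine), modeled on the notebook's MultiGPULossCompute with the replicate/scatter/parallel_apply calls stripped out, using the torch 0.3 Variable API the notebook assumes:

import torch
from torch.autograd import Variable

class ChunkedLossCompute:
    """Chunked loss compute for a single CPU: run the generator and
    criterion on slices of the sequence dimension, no data parallelism."""
    def __init__(self, generator, criterion, opt=None, chunk_size=5):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt
        self.chunk_size = chunk_size

    def __call__(self, out, targets, normalize):
        total = 0.0
        out_grad = []
        for i in range(0, out.size(1), self.chunk_size):
            # Detach each chunk so only one small graph is alive at a time.
            out_column = Variable(out[:, i:i + self.chunk_size].data,
                                  requires_grad=self.opt is not None)
            gen = self.generator(out_column)
            y = targets[:, i:i + self.chunk_size].contiguous().view(-1)
            loss = self.criterion(gen.contiguous().view(-1, gen.size(-1)),
                                  y) / normalize
            total += loss.data[0]
            if self.opt is not None:
                loss.backward()
                out_grad.append(out_column.grad.data.clone())
        # Push the saved chunk gradients back through the transformer body.
        if self.opt is not None:
            out.backward(gradient=Variable(torch.cat(out_grad, dim=1)))
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return total * normalize

The point of the chunking is memory: only one chunk's generator graph exists at once, and the accumulated chunk gradients are sent back through the encoder/decoder in a single out.backward() at the end.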
It seems like MultiGPULossCompute does a good job of separating the data into chunks. SimpleLossCompute fails, I think because there is not enough RAM. But the adapted MultiGPULossCompute fails because of something else. Here is the log:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-38-240a851d50f5> in <module>()
9 MultiGPULossCompute(model.generator, criterion,
10 # devices=devices,
---> 11 opt=model_opt))
12 model.eval()
13 loss = run_epoch((rebatch(pad_idx, b) for b in valid_iter),
<ipython-input-26-3c250c9d9ec4> in run_epoch(data_iter, model, loss_compute)
8 out = model.forward(batch.src, batch.trg,
9 batch.src_mask, batch.trg_mask)
---> 10 loss = loss_compute(out, batch.trg_y, batch.ntokens)
11 total_loss += loss
12 total_tokens += batch.ntokens
<ipython-input-36-46d56c947589> in __call__(self, out, targets, normalize)
37 for o in out]
38
---> 39 gen = generator(out_column)
40 # gen = nn.parallel.parallel_apply(generator, out_column)
41
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)
<ipython-input-4-96c707961385> in forward(self, x)
6
7 def forward(self, x):
----> 8 return F.log_softmax(self.proj(x), dim=-1)
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
355 result = self._slow_forward(*input, **kwargs)
356 else:
--> 357 result = self.forward(*input, **kwargs)
358 for hook in self._forward_hooks.values():
359 hook_result = hook(self, input, result)
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
53
54 def forward(self, input):
---> 55 return F.linear(input, self.weight, self.bias)
56
57 def __repr__(self):
~/miniconda2/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
831 - Output: :math:`(N, *, out\_features)`
832 """
--> 833 if input.dim() == 2 and bias is not None:
834 # fused op is marginally faster
835 return torch.addmm(bias, input, weight.t())
AttributeError: 'list' object has no attribute 'dim'
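For reference, this AttributeError is consistent with the commented-out parallel_apply line above it: MultiGPULossCompute builds out_column as a list of single-element input lists (one per device) to feed nn.parallel.parallel_apply, but a plain nn.Module call expects a tensor, so F.linear ends up receiving a list. A hedged guess at the minimal change, assuming a single CPU "device" so the list has exactly one entry (the rest of the loop body then also has to be un-listed to match):

# out_column is a list of per-device input lists built for
# nn.parallel.parallel_apply; with one CPU "device", unwrap it:
gen = generator(out_column[0][0])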
Did you figure this out? I would like to leave it open.
Unfortunately, SimpleLossCompute still does not work (and this is not because of RAM). It fails on the validation step with the following error message:
loss.backward()
File "/home/melpuser/miniconda2/envs/py36/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/melpuser/miniconda2/envs/py36/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: element 0 of variables tuple is volatile
For some reason it doesn't like calculating loss.backward() in SimpleLossCompute while computing the loss on the validation set. I've also tried to adapt MultiGPULossCompute to a simple CPU version, but so far with no success; I cannot deal with x and y: TypeError: forward() missing 1 required positional argument: 'target'
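For what it's worth, both errors look explainable under torch 0.3. The validation batches come from a torchtext iterator built with train=False, which in this version wraps tensors in volatile Variables; volatile Variables carry no autograd graph, so the unconditional loss.backward() inside SimpleLossCompute raises exactly "element 0 of variables tuple is volatile". The TypeError about a missing 'target' suggests the criterion (LabelSmoothing.forward takes (x, target)) was called with only one argument somewhere in the adaptation. A sketch of the usual fix for the first problem, which is just the notebook's SimpleLossCompute with loss.backward() moved inside the optimizer guard:

class SimpleLossCompute:
    "The notebook's simple loss compute, with backward() made eval-safe."
    def __init__(self, generator, criterion, opt=None):
        self.generator = generator
        self.criterion = criterion
        self.opt = opt

    def __call__(self, x, y, norm):
        x = self.generator(x)
        loss = self.criterion(x.contiguous().view(-1, x.size(-1)),
                              y.contiguous().view(-1)) / norm
        # Only backprop when training: torchtext's validation batches are
        # volatile in torch 0.3 and cannot build an autograd graph.
        if self.opt is not None:
            loss.backward()
            self.opt.step()
            self.opt.optimizer.zero_grad()
        return loss.data[0] * norm

With this guard, opt=None doubles as an evaluation-mode flag, matching how run_epoch is called on the validation set.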
@comja @srush
I tackled the same problem and found that the following code worked.
pip install http://download.pytorch.org/whl/cpu/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
After the "Multi-GPU Training" section of the notebook, run:
pad_idx = TGT.vocab.stoi["<blank>"]
model = make_model(len(SRC.vocab), len(TGT.vocab), N=6)
criterion = LabelSmoothing(size=len(TGT.vocab), padding_idx=pad_idx, smoothing=0.1)
BATCH_SIZE = 100
train_iter = MyIterator(train, batch_size=BATCH_SIZE, repeat=False,
                        sort_key=lambda x: (len(x.src), len(x.trg)),
                        batch_size_fn=batch_size_fn, train=True)
valid_iter = MyIterator(val, batch_size=BATCH_SIZE, repeat=False,
                        sort_key=lambda x: (len(x.src), len(x.trg)),
                        batch_size_fn=batch_size_fn, train=False)
(I changed BATCH_SIZE for my environment.)
model_opt = NoamOpt(model.src_embed[0].d_model, 1, 2000,
                    torch.optim.Adam(model.parameters(), lr=0,
                                     betas=(0.9, 0.98), eps=1e-9))
for epoch in range(10):
    model.train()
    run_epoch((rebatch(pad_idx, b) for b in train_iter), model,
              SimpleLossCompute(model.generator, criterion, model_opt))
    model.eval()
    # Note: this re-scores the training set; to score the validation set,
    # use valid_iter instead (which needs an eval-safe SimpleLossCompute,
    # since non-train batches are volatile in torch 0.3).
    print(run_epoch((rebatch(pad_idx, b) for b in train_iter), model,
                    SimpleLossCompute(model.generator, criterion, None)))
NOTE: I only checked that the script doesn't raise errors, so I'm not sure whether training actually goes well (I mean, I didn't check the performance of the trained model).
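If you want a quick qualitative check beyond the printed loss, the notebook's own greedy decoding cell can be reused roughly as follows (a sketch assuming greedy_decode, SRC, TGT, and valid_iter as defined earlier in the notebook):

model.eval()
for batch in valid_iter:
    # Decode the first sentence of the first validation batch.
    src = batch.src.transpose(0, 1)[:1]
    src_mask = (src != SRC.vocab.stoi["<blank>"]).unsqueeze(-2)
    out = greedy_decode(model, src, src_mask,
                        max_len=60, start_symbol=TGT.vocab.stoi["<s>"])
    print("Translation:", end="\t")
    for j in range(1, out.size(1)):
        sym = TGT.vocab.itos[out[0, j]]
        if sym == "</s>":
            break
        print(sym, end=" ")
    print()
    break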
Hello!
Thank you very much for your contribution. I wonder how to adapt the code in order to train a model on IWSLT data on my PC without GPUs. It seems like MultiGPULossCompute should be replaced in run_epoch, but SimpleLossCompute doesn't seem like an appropriate candidate. I would appreciate any hint.