Closed ibivibiv closed 3 years ago
This seems to happen on the unlikelihood, Dodeca, and Wizard models for sure.
I'll check this bug! Thanks.
I wonder if it isn't a ParlAI version change? I also caught bugs where image_features_dim and image_encoder_num_layers weren't being set in the opts file and args. Since your setup.py doesn't pin versions for torch and parlai, maybe a version shifted? Do you know what releases you were on when you originally wrote the code?
Mine is 1.1.0. I'll pin a specific version.
I found it: it's in the config files that are downloaded from ParlAI. You have to override gpu to set it to your GPU number. It defaults to -1, so the model gets assigned to the wrong device. I think if we fish the keyword args through, the constructor can set gpu, image_features_dim, and image_encoder_num_layers. I have it sort of working now, but it's hacked up. I'll drop a pull request in a bit after I get it sorted. Cool project btw, I was literally working on something similar just before I stumbled on yours.
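The "fish the keyword args through" idea above could look something like this — a minimal sketch with a hypothetical helper name, assuming a ParlAI-style opt dict loaded from the downloaded config (not the actual code from the PR):

```python
def apply_device_overrides(opt, gpu=-1, image_features_dim=None,
                           image_encoder_num_layers=None):
    """Push constructor kwargs into a ParlAI-style opt dict.

    In ParlAI, gpu=-1 means CPU; pass a CUDA device index to pin the
    model to a specific GPU. The two image_* keys only matter for
    image-aware models (e.g. Dodeca-style agents).
    """
    opt = dict(opt)  # copy so the downloaded config isn't mutated in place
    opt['gpu'] = gpu
    # ParlAI also reads explicit overrides from opt['override']
    opt['override'] = dict(opt.get('override', {}))
    opt['override']['gpu'] = gpu
    if image_features_dim is not None:
        opt['image_features_dim'] = image_features_dim
    if image_encoder_num_layers is not None:
        opt['image_encoder_num_layers'] = image_encoder_num_layers
    return opt
```

The copy-before-mutate matters because the same downloaded opts may be reused across agents; overriding in place would leak one model's GPU assignment into another's.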
https://github.com/hyunwoongko/openchat/pull/36
This has just the GPU fix in it so you can see the example of how I fish through the gpu id. We might want to do the same for the other 2?
Ok, I'll check the problems and fix them. Thanks a lot.
I think I've cleared all of the GPU assignment issues. Closing.
Whenever I launch a wizard of wikipedia OpenChat and give it a valid topic and user I get the error:
RuntimeError: Input, output and indices must be on the current device
I simply loaded it this way:
```python
from openchat import OpenChat

if __name__ == '__main__':
    OpenChat(model="wizard_of_wikipedia.end2end_generator", device="cpu")
```
It also happens when I use cuda:
```python
from openchat import OpenChat

if __name__ == '__main__':
    OpenChat(model="wizard_of_wikipedia.end2end_generator", device="cuda")
```

Other models like gptneo work fine. It is just the things that back into ParlAI, I think. Stack trace here:
```
Traceback (most recent call last):
  File "wow.py", line 4, in <module>
    OpenChat(model="wizard_of_wikipedia.end2end_generator", device="cpu")
  File "/root/openchat/openchat/openchat.py", line 33, in __init__
    self.environment.start(self.agent, **kwargs)
  File "/root/openchat/openchat/envs/interactive.py", line 100, in start
    bot_message = agent.predict(model_input, **kwargs)["output"]
  File "/root/openchat/openchat/base/agents/wow.py", line 111, in predict
    return super().predict(
  File "/root/openchat/env/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/root/openchat/openchat/base/agents/parlai.py", line 101, in predict
    tokens = self.model._generate(
  File "/root/openchat/env/lib/python3.8/site-packages/parlai/core/torch_generator_agent.py", line 1079, in _generate
    encoder_states = model.encoder(*self._encoder_input(batch))
  File "/root/openchat/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/openchat/env/lib/python3.8/site-packages/projects/wizard_of_wikipedia/generator/modules.py", line 86, in forward
    context_encoded, context_mask = self.transformer(src_tokens)
  File "/root/openchat/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/openchat/env/lib/python3.8/site-packages/parlai/agents/transformer/modules/encoder.py", line 271, in forward
    tensor, mask = self.forward_embedding(input, positions, segments)
  File "/root/openchat/env/lib/python3.8/site-packages/parlai/agents/transformer/modules/encoder.py", line 179, in forward_embedding
    tensor = self.embeddings(input)
  File "/root/openchat/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/openchat/env/lib/python3.8/site-packages/torch/nn/modules/sparse.py", line 156, in forward
    return F.embedding(
  File "/root/openchat/env/lib/python3.8/site-packages/torch/nn/functional.py", line 1916, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Input, output and indices must be on the current device
```
I tried a bit to debug this one, but I don't quite see where the three (input, output, and indices) land on different devices.
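For reference, the mismatch reported here fits the gpu-defaulting-to--1 explanation: OpenChat takes a torch-style device string, while ParlAI encodes the device as an integer index where -1 means CPU. A minimal sketch of that mapping (the helper name is hypothetical, not part of either library):

```python
def device_to_gpu(device):
    """Map a torch-style device string ('cpu', 'cuda', 'cuda:1')
    to ParlAI's integer gpu convention, where -1 means CPU."""
    if device == 'cpu':
        return -1
    if device == 'cuda':
        return 0  # bare 'cuda' means the default (first) CUDA device
    if device.startswith('cuda:'):
        return int(device.split(':', 1)[1])
    raise ValueError(f"unrecognized device string: {device}")
```

If a translation like this never happens, the downloaded config's gpu=-1 wins, the embedding weights stay on one device while the input indices land on another, and torch raises exactly the "Input, output and indices must be on the current device" error above.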