togethercomputer / OpenChatKit

Apache License 2.0

RedPanda training inference error #115

Open qrpike opened 1 year ago

qrpike commented 1 year ago

Describe the bug: After following the RedPanda fine-tuning tutorial, running the bot inference script against the resulting model fails with a RuntimeError as soon as the first prompt is submitted.

$ python ./inference/bot.py --model=model_ckpts/hf/
Loading model_ckpts/hf/ to cuda:0...
Welcome to OpenChatKit shell.   Type /help or /? to list commands.

>>> who is allen turing?
Traceback (most recent call last):
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 285, in <module>
    main()
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 281, in main
    ).cmdloop()
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 138, in cmdloop
    stop = self.onecmd(line)
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/cmd.py", line 217, in onecmd
    return func(arg)
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 150, in do_say
    output = self._model.do_inference(
  File "/home/ubuntu/OpenChatKit/./inference/bot.py", line 92, in do_inference
    outputs = self._model.generate(
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1326, in generate
    return self.sample(
  File "/home/ubuntu/miniconda3/envs/OpenChatKit/lib/python3.10/site-packages/transformers/generation_utils.py", line 1981, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
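
For context, the last frame shows torch.multinomial rejecting the sampling distribution because it contains a non-finite or negative entry. The snippet below is a plain-Python sketch, not the transformers or OpenChatKit code, of one common way this can happen: a single `inf` logit turns the whole softmax output into `nan`. Whether that is the cause in this report is an assumption; the helper names `softmax` and `invalid_entries` are hypothetical and exist only for illustration.

```python
import math


def softmax(logits):
    """Softmax over a list of floats, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def invalid_entries(probs):
    """Return indices of entries that are nan, inf, or negative.

    These are the three conditions named in the error message.
    """
    return [
        i
        for i, p in enumerate(probs)
        if math.isnan(p) or math.isinf(p) or p < 0
    ]


# Finite logits give a valid distribution.
healthy = softmax([2.0, 1.0, 0.1])

# One inf logit: inf - inf is nan, so the sum and every entry become nan.
broken = softmax([2.0, float("inf"), 0.1])

print(invalid_entries(healthy))
print(invalid_entries(broken))
```

Running the same kind of check on the logits just before the sampling step would show whether the fine-tuned checkpoint is producing non-finite values.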

To Reproduce: Follow the fine-tuning tutorial, start the bot with the command shown above, and enter any prompt.

Expected behavior: Inference runs and returns a response without raising an error.

Environment: The code is running on a Lambda Labs 8x A100 40GB SXM4 instance.

ChengYen-Tang commented 1 year ago

https://github.com/togethercomputer/OpenChatKit/issues/86#issuecomment-1667192363