hitomi-team / sukima

A ready-to-deploy container implementing an easy-to-use REST API for accessing Language Models.
GNU General Public License v2.0

CUDA Error during Inference with Typical_P sampling of 1.0 #27

Closed harubaru closed 2 years ago

harubaru commented 2 years ago

A CUDA error occurs during inference, raising this exception:

```
Exception: Could not generate text with Sukima. Error: {'detail': 'Invalid request body!
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "/sukima/./app/api/v1/endpoints/models.py", line 57, in generate
    return m.generate(request.dict(), db_softprompt=db_softprompt)
  File "/usr/local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/sukima/./app/gpt/gpthf.py", line 311, in generate
    input_ids = self.tokenizer.encode(prompt, return_tensors='pt').to(self.device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
'}
```

This occurs when a `typical_p` of 1.0 is used.
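For context, a minimal pure-Python sketch of locally typical sampling with a guard for this edge case (this is an illustration, not the code from the linked commit; the function name `typical_filter` and the exact guard are assumptions). With `typical_p = 1.0` the filter should be a no-op, but applying it anyway can, via floating-point round-off in the cumulative sum, end up masking every token; downstream, an all-`-inf` logit row yields NaN probabilities and a device-side assert when sampling on the GPU. Skipping the filter entirely at 1.0 avoids that path:

```python
import math

def typical_filter(probs, typical_p):
    """Return the sorted indices kept by locally typical sampling.

    Hypothetical guard: treat typical_p >= 1.0 as "keep everything"
    instead of running the filter, since round-off in the cumulative
    sum could otherwise exclude tokens it should keep.
    """
    if typical_p >= 1.0:
        return list(range(len(probs)))

    # Conditional entropy H of the next-token distribution.
    h = -sum(p * math.log(p) for p in probs if p > 0)

    # Rank tokens by how close their surprisal (-log p) is to H.
    order = sorted(range(len(probs)),
                   key=lambda i: abs(-math.log(probs[i]) - h))

    # Keep the smallest such set whose cumulative mass reaches typical_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= typical_p:
            break
    return sorted(kept)

print(typical_filter([0.5, 0.3, 0.2], 1.0))  # guard path: all tokens kept
print(typical_filter([0.5, 0.3, 0.2], 0.5))  # filtered subset
```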

harubaru commented 2 years ago

This also appears to be an issue in the original implementation of typical sampling: https://github.com/huggingface/transformers/issues/16080

harubaru commented 2 years ago

Fixed in https://github.com/hitomi-team/sukima/commit/16d3bb06ecc58c901c97a978d5db13eaaff35431