hitomi-team / sukima

A ready-to-deploy container implementing an easy-to-use REST API for accessing Language Models.
GNU General Public License v2.0

CUDA Error during Inference with Typical_P sampling of 1.0 #27

Closed harubaru closed 2 years ago

harubaru commented 2 years ago

A CUDA error occurs during inference, raising this exception:

```
Exception: Could not generate text with Sukima. Error: {'detail': 'Invalid request body!
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "/sukima/./app/api/v1/endpoints/models.py", line 57, in generate
    return m.generate(request.dict(), db_softprompt=db_softprompt)
  File "/usr/local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/sukima/./app/gpt/gpthf.py", line 311, in generate
    input_ids = self.tokenizer.encode(prompt, return_tensors='pt').to(self.device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
'}
```

This occurs when a `typical_p` of 1.0 is used.
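For context, a minimal pure-Python sketch of locally typical sampling with a guard for this edge case (this is an illustration, not the code from the linked commit; the function name `typical_filter` and the exact guard are assumptions). With `typical_p = 1.0` the filter should be a no-op, but applying it anyway can, via floating-point round-off in the cumulative sum, end up masking every token; downstream, an all-`-inf` logit row yields NaN probabilities and a device-side assert when sampling on the GPU. Skipping the filter entirely at 1.0 avoids that path:

```python
import math

def typical_filter(probs, typical_p):
    """Return the sorted indices kept by locally typical sampling.

    Hypothetical guard: treat typical_p >= 1.0 as "keep everything"
    instead of running the filter, since round-off in the cumulative
    sum could otherwise exclude tokens it should keep.
    """
    if typical_p >= 1.0:
        return list(range(len(probs)))

    # Conditional entropy H of the next-token distribution.
    h = -sum(p * math.log(p) for p in probs if p > 0)

    # Rank tokens by how close their surprisal (-log p) is to H.
    order = sorted(range(len(probs)),
                   key=lambda i: abs(-math.log(probs[i]) - h))

    # Keep the smallest such set whose cumulative mass reaches typical_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= typical_p:
            break
    return sorted(kept)

print(typical_filter([0.5, 0.3, 0.2], 1.0))  # guard path: all tokens kept
print(typical_filter([0.5, 0.3, 0.2], 0.5))  # filtered subset
```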

harubaru commented 2 years ago

This also appears to be an issue in the original implementation of typical sampling: https://github.com/huggingface/transformers/issues/16080

harubaru commented 2 years ago

Fixed in https://github.com/hitomi-team/sukima/commit/16d3bb06ecc58c901c97a978d5db13eaaff35431