A CUDA error occurs during inference, with the following exception:
Exception: Could not generate text with Sukima. Error: {'detail': 'Invalid request body!\nCUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nTraceback (most recent call last):\n File "/sukima/./app/api/v1/endpoints/models.py", line 57, in generate\n return m.generate(request.dict(), db_softprompt=db_softprompt)\n File "/usr/local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context\n return func(*args, **kwargs)\n File "/sukima/./app/gpt/gpthf.py", line 311, in generate\n input_ids = self.tokenizer.encode(prompt, return_tensors=\'pt\').to(self.device)\nRuntimeError: CUDA error: device-side assert triggered\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\n'}
The error occurs when Typical_P is set to 1.0.
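As the error message itself suggests, rerunning with CUDA_LAUNCH_BLOCKING=1 should make the traceback point at the kernel that actually fails. The `tokenizer.encode(...).to(self.device)` frame in the traceback is probably just where the asynchronous error surfaced, not where it originated. A minimal sketch, assuming the variable can be set at the very top of the server's entry point before any CUDA work happens (where that entry point lives in Sukima is not confirmed here, and the helper name is hypothetical):

```python
import os

# Assumption: this runs before the first CUDA call in the process
# (safest is before `import torch`); setting it later has no effect
# on an already-initialized CUDA context.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def cuda_launch_blocking_enabled() -> bool:
    """Hypothetical helper: report whether synchronous kernel launches were requested."""
    return os.environ.get("CUDA_LAUNCH_BLOCKING") == "1"
```

Exporting the variable in the shell or container environment before starting the server should have the same effect. Synchronous launches slow inference down, so this is only meant for reproducing the failure.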