captify-sivakhno opened this issue 3 months ago
Can you share the exact prompts you are sending? This issue occurs sporadically, so detailed reproduction instructions would be very helpful for us.
@robertgshaw2-neuralmagic thanks for the fast reply, here's the link to a file with 5000 prompts, generated as:
```
with open('/Volumes/qa/tv_segmentation_bronze/misc/formatted_prompts.txt', 'w') as f:
    for item in formatted_prompts:
        f.write("%s\n" % item)
```
This is what went into the input:
```
output = llm.generate(formatted_prompts, sampling_params)
```
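For completeness, a minimal self-contained version of the repro might look roughly like the sketch below. The model name and sampling settings are stand-ins (they aren't shown in this thread); only the prompt file and the generate call come from my actual run.
```
from vllm import LLM, SamplingParams

# Read back the prompts that were dumped one-per-line above.
with open('/Volumes/qa/tv_segmentation_bronze/misc/formatted_prompts.txt') as f:
    formatted_prompts = [line.rstrip('\n') for line in f if line.strip()]

# Stand-in model and sampling settings; the real ones are not shown in this thread.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

output = llm.generate(formatted_prompts, sampling_params)
```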
BTW @robertgshaw2-neuralmagic, if you have access to Databricks, one option to easily and fully reproduce the environment is running in a notebook on the 15.4 LTS ML Beta (15.4.x-gpu-ml-scala2.12) runtime, as that's where I ran it.
@robertgshaw2-neuralmagic - regarding your comment about the prompt contents above, do you have any suggestions as to which properties of the prompts might be causing the error? I have rerun by re-using only the first prompt as an example:
```
# other code as before
output = llm.generate([formatted_prompts[0]] * len(formatted_prompts), sampling_params)
```
and it completed fine. This is encouraging, but the range of possible error causes is quite wide (length of prompt, token composition, pattern of cache reuse, etc.).
Marking; met the same problem.
Marking; met the same problem in v0.5.0.post1.
Same here.
Also seeing the same problem. I found the issue arises when a cached prefill request is scheduled together with a non-cached request. The problem is gone if I force it to only schedule one prefill request at a time. Still debugging.
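If it helps others reproduce, a rough sketch of that hypothesis is below: warm the prefix cache with one prompt, then submit a batch that mixes the cached prompt with a completely unrelated one, so both prefills can end up in the same scheduler step. The prompts and model name are made up for illustration, and whether the two prefills actually get batched together depends on the scheduler.
```
from vllm import LLM, SamplingParams

# Illustrative prompts: the first call populates the prefix cache,
# the second call mixes a cache hit with a prompt that shares no prefix.
shared_prefix = "You are a helpful assistant. " * 50
cached_prompt = shared_prefix + "Summarise document A."
fresh_prompt = "Completely unrelated question with no shared prefix."

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)  # stand-in model
params = SamplingParams(temperature=0.0, max_tokens=32)

# Step 1: warm the prefix cache with the shared prefix.
llm.generate([cached_prompt], params)

# Step 2: submit a cached-prefix prompt together with a non-cached one,
# so their prefills may be scheduled in the same batch.
llm.generate([cached_prompt, fresh_prompt], params)
```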
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
Your current environment
🐛 Describe the bug
After running the code
I get an error
The error seems to happen randomly; sometimes I don't get an error running the same command in the same environment with the same versions.
I have done the following investigations and can confirm: setting enable_prefix_caching=False removes the error ("The Python process exited with exit code 139 (SIGSEGV: Segmentation fault)").
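For reference, the toggle in question is the engine flag passed to the LLM constructor; a minimal sketch (the model name here is just a stand-in):
```
from vllm import LLM

# With enable_prefix_caching=True the run sporadically dies with exit code 139 (SIGSEGV);
# setting it to False makes the error go away. Model name is a stand-in.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=False)
```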
I have seen quite a few different issues with enable_prefix_caching; could anyone comment on whether the feature has actually worked for them? We have a lot of 80-90% repetitive prompts in our use cases, so prefix caching provides a dramatic speed-up. Would be grateful for any suggestions!

Full error detail:
``` Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions. File