Closed liechtym closed 8 months ago
Sorry, I don't think this is related to transformers, as there is a wrapper around it. sdpa is natively supported in transformers.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I was trying to use the generate API for Llama 2 with the same code as this example: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#features
My code:
Error:
Is there a workaround for this? Or is supporting this attention implementation the only way? I simply want to use the generate API with a neuron-compiled model.