mikethebos opened this issue 1 month ago
This looks like it's happening in `generate()`, so cc @gante! Let me know if you think it's a pipeline issue instead and I'll handle it.
@Rocketknight1 I guess `layer_device_map` is missing in the offloaded static cache.
I have a WIP PR: https://github.com/huggingface/transformers/pull/34330/files
Can you review and comment?
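Not the actual PR diff, but a minimal sketch of the kind of fix involved: filter the kwargs that `generate()` passes down so only those the cache class's `__init__` actually accepts reach the constructor. The class and helper names here are illustrative stand-ins, not the real transformers code:

```python
import inspect

class OffloadedStaticCacheStub:
    """Toy stand-in for a cache class whose __init__ lacks layer_device_map."""
    def __init__(self, batch_size, max_cache_len):
        self.batch_size = batch_size
        self.max_cache_len = max_cache_len

def build_cache(cache_cls, **kwargs):
    # Drop kwargs the target constructor does not accept, instead of
    # letting the call fail with "unexpected keyword argument".
    accepted = inspect.signature(cache_cls.__init__).parameters
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    return cache_cls(**supported)

cache = build_cache(
    OffloadedStaticCacheStub,
    batch_size=1,
    max_cache_len=128,
    layer_device_map={0: "cpu"},  # not accepted by the stub, so it is dropped
)
print(cache.batch_size)  # 1
```

The alternative fix, of course, is to add the missing parameter to the cache class itself, which is what the linked PR appears to do.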
FYI: with @chengchengpei's patch, I'm now getting a similar error for `batch_size`: `TypeError: OffloadedStaticCache.__init__() got an unexpected keyword argument 'batch_size'`
@mikethebos `max_batch_size` was renamed to `batch_size` in https://github.com/huggingface/transformers/blob/main/src/transformers/generation/utils.py#L1636 but not in https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L1914
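A common way to handle a rename like this without breaking existing callers is to accept the old name as a deprecated alias. This is a generic pattern sketch, not the actual `cache_utils.py` code:

```python
import warnings

class CacheStub:
    """Toy cache accepting batch_size, with max_batch_size as a deprecated alias."""
    def __init__(self, batch_size=None, max_batch_size=None):
        if max_batch_size is not None:
            warnings.warn(
                "max_batch_size is deprecated, use batch_size instead",
                DeprecationWarning,
            )
            batch_size = max_batch_size
        if batch_size is None:
            raise ValueError("batch_size is required")
        self.batch_size = batch_size

old = CacheStub(max_batch_size=4)  # still works, emits a DeprecationWarning
new = CacheStub(batch_size=4)
print(old.batch_size == new.batch_size)  # True
```

Either way, the caller in `generation/utils.py` and the constructor in `cache_utils.py` need to agree on the parameter name.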
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I believe @chengchengpei's fix is still relevant here.
@mikethebos yep, we'll merge the PR as soon as it is ready
System Info
- Transformers: v4.45.2 (patch release)
- PyTorch: 1.10.1
- Python: 3.8.0
- CUDA: 11.1
- GPU: NVIDIA V100
Who can help?
@gante @zucchini-nlp @Rocketknight1
Information

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Stack trace:
Code:
Expected behavior
`assistant_response` should be a generated response from the Llama model.