Open mikethebos opened 1 day ago
This looks like it's happening in `generate()`, so cc @gante! Let me know if you think it's a pipeline issue instead and I'll handle it.
@Rocketknight1 I guess `layer_device_map` is missing in the `offloaded_static` cache.
I have a WIP PR: https://github.com/huggingface/transformers/pull/34330/files
Can you review and comment?
System Info
Transformers patch release v4.45.2, PyTorch 1.10.1, Python 3.8.0, CUDA 11.1, NVIDIA V100
Who can help?
@gante @zucchini-nlp @Rocketknight1
Information
Tasks
`examples` folder (such as GLUE/SQuAD, ...)
Reproduction
Stack trace:
Code:
Expected behavior
assistant_response should be a generated response from the LLaMa model.