Open Source61 opened 3 months ago
Workaround: change these two functions of the GenerationMixin class in /home/john/.local/lib/python3.11/site-packages/transformers/generation/utils.py to:
def _supports_default_dynamic_cache(self) -> bool:
    # replaced self._supports_cache_class with a hard-coded False
    return False and "jamba" not in self.__class__.__name__.lower()

def _validate_model_kwargs(self, model_kwargs: Dict[str, Any]):
    # replaced self._supports_cache_class with a hard-coded False
    if isinstance(model_kwargs.get("past_key_values", None), Cache) and not False:
        raise ValueError(
            f"{self.__class__.__name__} does not support an instance of `Cache` as `past_key_values`. Please "
            "check the model documentation for supported cache formats."
        )
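Note that this patches the installed transformers source, so the fix is lost on upgrade. A lighter workaround (a sketch, not tested against every airllm version) is to set the missing attribute directly on the model instance, since the failing check below only reads self._supports_cache_class:

from airllm import AirLLMLlama2  # class name taken from the traceback below

model = AirLLMLlama2("WizardLMTeam/WizardCoder-33B-V1.1")

# AirLLM's model class predates transformers' Cache refactor and never defines
# _supports_cache_class; setting it on the instance makes
# _supports_default_dynamic_cache() return False instead of raising AttributeError.
model._supports_cache_class = False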
Same problem with model: v2ray/Llama-3-70B
Model: WizardLMTeam/WizardCoder-33B-V1.1
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:32014 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.

Traceback (most recent call last):
  File "/home/john/dev/AI/./test.py", line 24, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/home/john/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/john/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 1777, in generate
    elif generation_config.cache_implementation is None and self._supports_default_dynamic_cache():
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/john/.local/lib/python3.11/site-packages/transformers/generation/utils.py", line 1454, in _supports_default_dynamic_cache
    return self._supports_cache_class and "jamba" not in self.__class__.__name__.lower()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'AirLLMLlama2' object has no attribute '_supports_cache_class'
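The attention-mask warnings above are unrelated to the crash, but they can be silenced by passing the tokenizer's mask and an explicit pad_token_id through generate(). A minimal sketch assuming model.generate forwards these kwargs to transformers' GenerationMixin.generate (which the traceback shows it calls); the prompt and max_new_tokens are placeholders:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("WizardLMTeam/WizardCoder-33B-V1.1")
inputs = tokenizer("def fizzbuzz(n):", return_tensors="pt")

generation_output = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # silences "attention mask is not set"
    pad_token_id=tokenizer.eos_token_id,      # silences the pad_token_id warning
    max_new_tokens=128,
)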