Closed BigDataMLexplorer closed 1 month ago
Hey! Looks like you have trust_remote_code=True when loading the model and are hitting the error here. Can you try loading with

model = AutoModelForSequenceClassification.from_pretrained("path", num_labels=num_labels, attn_implementation="eager", trust_remote_code=False)
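For reference, a fuller sketch of that call (the checkpoint path and num_labels are placeholders for your setup):

```python
from transformers import AutoModelForSequenceClassification

# Placeholders: point these at your local checkpoint and your task.
model_path = "path"
num_labels = 11

# attn_implementation="eager" avoids flash-attention/SDPA kernels, and
# trust_remote_code=False restricts loading to code shipped with transformers.
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    num_labels=num_labels,
    attn_implementation="eager",
    trust_remote_code=False,
)
```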
Thanks for the reply, but that doesn't help. I have the code stored in my repository. I work on a server without internet access and have to load the model locally. I keep running into the same error:

model = AutoModelForSequenceClassification.from_pretrained("path", num_labels=num_labels, attn_implementation="eager", trust_remote_code=True)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[4], line 2
1 num_labels = 11
----> 2 model = AutoModelForSequenceClassification.from_pretrained("Phi3small8k",num_labels=num_labels,attn_implementation= "eager", trust_remote_code=True)#,device_map="auto")
File ~/env/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:559, in _BaseAutoModelClass.from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
557 else:
558 cls.register(config.__class__, model_class, exist_ok=True)
--> 559 return model_class.from_pretrained(
560 pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
561 )
562 elif type(config) in cls._model_mapping.keys():
563 model_class = _get_model_class(config, cls._model_mapping)
File ~/env/lib/python3.9/site-packages/transformers/modeling_utils.py:3788, in PreTrainedModel.from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
3782 config = cls._autoset_attn_implementation(
3783 config, use_flash_attention_2=use_flash_attention_2, torch_dtype=torch_dtype, device_map=device_map
3784 )
3786 with ContextManagers(init_contexts):
3787 # Let's make sure we don't run the init function of buffer modules
-> 3788 model = cls(config, *model_args, **model_kwargs)
3790 # make sure we use the model's config since the __init__ call might have copied it
3791 config = model.config
File ~/.cache/huggingface/modules/transformers_modules/Phi3small8k/modeling_phi3_small.py:1045, in Phi3SmallForSequenceClassification.__init__(self, config)
1043 super().__init__(config)
1044 self.num_labels = config.num_labels
-> 1045 self.model = Phi3SmallModel(config)
1046 self.score = nn.Linear(config.hidden_size, self.num_labels, bias=False)
1048 # Initialize weights and apply final processing
File ~/.cache/huggingface/modules/transformers_modules/Phi3small8k/modeling_phi3_small.py:745, in Phi3SmallModel.__init__(self, config)
742 # MuP Embedding scaling
743 self.mup_embedding_multiplier = config.mup_embedding_multiplier
--> 745 self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
747 self.final_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)
749 self.gradient_checkpointing = False
File ~/.cache/huggingface/modules/transformers_modules/Phi3small8k/modeling_phi3_small.py:745, in <listcomp>(.0)
742 # MuP Embedding scaling
743 self.mup_embedding_multiplier = config.mup_embedding_multiplier
--> 745 self.layers = nn.ModuleList([Phi3SmallDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)])
747 self.final_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)
749 self.gradient_checkpointing = False
File ~/.cache/huggingface/modules/transformers_modules/Phi3small8k/modeling_phi3_small.py:651, in Phi3SmallDecoderLayer.__init__(self, config, layer_idx)
649 super().__init__()
650 self.hidden_size = config.hidden_size
--> 651 self.self_attn = Phi3SmallSelfAttention(config, layer_idx)
652 self.mlp = Phi3SmallMLP(config)
654 self.input_layernorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_epsilon)
File ~/.cache/huggingface/modules/transformers_modules/Phi3small8k/modeling_phi3_small.py:218, in Phi3SmallSelfAttention.__init__(self, config, layer_idx)
213 if self.config.dense_attention_every_n_layers and ((self.layer_idx + 1) % self.config.dense_attention_every_n_layers == 0):
214 logger.info(
215 f"Layer {layer_idx + 1} is using dense attention since it is divisible by "
216 f"{self.config.dense_attention_every_n_layers}"
217 )
--> 218 assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
219 else:
220 # BlockSparse related Parameters
221 self.blocksparse_params = BlockSparseParams.from_config(config)
AssertionError: Flash Attention is not available, but is needed for dense attention
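If I'm reading the traceback right, the assertion comes from the Hub-hosted modeling_phi3_small.py itself, so attn_implementation="eager" never gets a chance to bypass it. Roughly, the remote file seems to do something like this (my paraphrase from the traceback, not the actual source; the function name is made up just to illustrate):

```python
# Paraphrase of the guard in the Hub-hosted modeling_phi3_small.py (not the actual source).

# A module-level probe checks whether the flash_attn package can be imported at all:
try:
    import flash_attn  # noqa: F401
    is_flash_attention_available = True
except ImportError:
    is_flash_attention_available = False

# Every layer that falls on a "dense attention" index then hard-requires that probe to be
# True, independent of the attn_implementation argument passed to from_pretrained():
def build_dense_attention_layer():  # hypothetical name, just to illustrate the guard
    assert is_flash_attention_available, "Flash Attention is not available, but is needed for dense attention"
```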
cc @Rocketknight1 As you were looking into the issue of conditional imports and the Hub - looks like it's starting to block quite a few users
Yeah, understood - I'll try to get to it as soon as I can finally finish the tool use project!
@amyeroberts @Rocketknight1 Does this mean that there is currently some bug on the transformers side and this code should work under normal conditions? If so, when can I expect it to be fixed? Thank you
@Rocketknight1 Could you please describe this problem in more detail? Is the problem on the transformers side, and should the model load without flash attention under normal conditions? If so, when can I expect it to be fixed? Thank you
@BigDataMLexplorer I did some digging into why we can't load this with the transformers-local code and have to use trust_remote_code=True. See https://github.com/huggingface/transformers/issues/32243#issuecomment-2268285247.
TL;DR: Phi-3 small is a bit different from the mini/medium models and thus is not yet ported to transformers. That's why the code above loads the modeling code from the Hub, which is not maintained by us. We'll consider shipping Phi-3 small in transformers.
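If it helps, a quick way to check whether a local checkpoint maps to code inside transformers or to Hub-hosted modeling code (sketch; the path is a placeholder):

```python
from transformers import AutoConfig

# Placeholder path to the locally downloaded checkpoint.
config = AutoConfig.from_pretrained("path/to/local/Phi3small8k", trust_remote_code=True)

# If the printed model_type has a native implementation in the installed transformers
# version (as "phi3" does in 4.43, but "phi3small" does not), the checkpoint can be
# loaded with trust_remote_code=False; otherwise loading falls back to Hub-hosted code.
print(config.model_type)
```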
@zucchini-nlp Does that mean that if I download the Phi-3 medium (14B) model locally, I shouldn't have the same problem as with the small model?
@BigDataMLexplorer For Phi-3 medium you should be able to switch attention to "sdpa" or "eager", as it is supported natively by transformers. Just make sure to set trust_remote_code=False.
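Something like this should work (untested sketch; the local path and num_labels are placeholders):

```python
from transformers import AutoModelForSequenceClassification

# Placeholder path to a locally downloaded copy of the Phi-3 medium checkpoint.
model_path = "path/to/local/Phi-3-medium"
num_labels = 11

# "eager" (or "sdpa") avoids flash-attention kernels, so this should also run on V100s.
# trust_remote_code=False makes transformers use its own Phi-3 implementation
# instead of the modeling code shipped on the Hub.
model = AutoModelForSequenceClassification.from_pretrained(
    model_path,
    num_labels=num_labels,
    attn_implementation="eager",
    trust_remote_code=False,
    local_files_only=True,  # load from disk only, no Hub access needed
)
```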
Thank you, the medium version works fine :)
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
System Info
Phi-3 small, flash attention, GPU
Who can help?
@ArthurZucker @muellerzr @stevhliu
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Hi, I'm trying to load the Phi-3 small 8k Instruct model. Link: https://huggingface.co/microsoft/Phi-3-small-8k-instruct
I want to use it for further fine-tuning, but I can't load it with flash attention because I have V100 graphics cards, which are incompatible with it. So I am trying to load it without flash attention using attn_implementation="eager":
model = AutoModelForSequenceClassification.from_pretrained("path", num_labels=num_labels, attn_implementation="eager")
But I still get this error:
AssertionError: Flash Attention is not available, but is needed for dense attention
Is there any way to load the model without flash attention?
I am using the latest version of transformers (4.43.3).