njucckevin / SeeClick

The model, data and code for the visual GUI Agent SeeClick
Apache License 2.0

processing the finetuning(qlora) #32

Closed gkckdtn126 closed 1 month ago

gkckdtn126 commented 1 month ago

While running the QLoRA fine-tuning, I hit the error below. How do I fix it? I already pass trust_remote_code=True in finetune.py, but it still fails in the same way.

[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 598, in resolve_trust_remote_code
[rank0]:     answer = input(
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 580, in _raise_timeout_error
[rank0]:     raise ValueError(
[rank0]: ValueError: Loading this model requires you to execute custom code contained in the model repository on your local machine. Please set the option trust_remote_code=True to permit loading of this model.

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 369, in quantize_model
[rank0]:     tokenizer = AutoTokenizer.from_pretrained(tokenizer)
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 752, in from_pretrained
[rank0]:     config = AutoConfig.from_pretrained(
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1085, in from_pretrained
[rank0]:     trust_remote_code = resolve_trust_remote_code(
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/dynamic_module_utils.py", line 611, in resolve_trust_remote_code
[rank0]:     raise ValueError(
[rank0]: ValueError: The repository for cckevinn/SeeClick contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/cckevinn/SeeClick.
[rank0]: Please pass the argument trust_remote_code=True to allow custom code to be run.

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/jovyan/work/SeeClick/finetune/finetune.py", line 408, in <module>
[rank0]:     train()
[rank0]:   File "/home/jovyan/work/SeeClick/finetune/finetune.py", line 312, in train
[rank0]:     model = transformers.AutoModelForCausalLM.from_pretrained(
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3780, in from_pretrained
[rank0]:     quantizer.quantize_model(model, quantization_config.tokenizer)
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 371, in quantize_model
[rank0]:     raise ValueError(
[rank0]: ValueError: We were not able to get the tokenizer using AutoTokenizer.from_pretrained
[rank0]:     with the string that you have passed cckevinn/SeeClick. If you have a custom tokenizer, you can pass it as input.
[rank0]:     For now, we only support quantization for text model. Support for vision, speech and multimodel will come later.
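The last ValueError hints at a workaround: optimum only calls AutoTokenizer.from_pretrained itself when it receives a string, so loading the custom tokenizer up front (with trust_remote_code=True) and passing the object into the quantization config avoids the second trust_remote_code prompt, which cannot be answered inside the quantizer. A minimal sketch of that idea, untested against this repo (the dataset value is a placeholder; use whatever finetune.py actually configures):

from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_path = "cckevinn/SeeClick"

# Load the custom Qwen tokenizer ourselves, with trust_remote_code=True,
# instead of letting optimum re-resolve it from a string.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# GPTQConfig accepts a tokenizer object as well as a repo id; passing the
# object skips the failing AutoTokenizer.from_pretrained call in optimum.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    quantization_config=quant_config,
    trust_remote_code=True,
)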

gkckdtn126 commented 1 month ago

Also, when I downloaded SeeClick directly and ran the fine-tuning from the local checkpoint, the error below appeared.

Do you wish to run the custom code? [y/N] y
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 369, in quantize_model
[rank0]:     tokenizer = AutoTokenizer.from_pretrained(tokenizer)
[rank0]:                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 815, in from_pretrained
[rank0]:     raise ValueError(
[rank0]: ValueError: Unrecognized configuration class <class 'transformers_modules.SeeClick-pretrain.configuration_qwen.QWenConfig'> to build an AutoTokenizer.
[rank0]: Model type should be one of AlbertConfig, AlignConfig, BarkConfig, BartConfig, BertConfig, BertGenerationConfig, BigBirdConfig, BigBirdPegasusConfig, BioGptConfig, BlenderbotConfig, BlenderbotSmallConfig, BlipConfig, Blip2Config, BloomConfig, BridgeTowerConfig, BrosConfig, CamembertConfig, CanineConfig, ChineseCLIPConfig, ClapConfig, CLIPConfig, CLIPSegConfig, ClvpConfig, LlamaConfig, CodeGenConfig, ConvBertConfig, CpmAntConfig, CTRLConfig, Data2VecAudioConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DistilBertConfig, DPRConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FlaubertConfig, FNetConfig, FSMTConfig, FunnelConfig, GitConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, GPTNeoXJapaneseConfig, GPTJConfig, GPTSanJapaneseConfig, GroupViTConfig, HubertConfig, IBertConfig, IdeficsConfig, InstructBlipConfig, JukeboxConfig, Kosmos2Config, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LEDConfig, LiltConfig, LlamaConfig, LlavaConfig, LongformerConfig, LongT5Config, LukeConfig, LxmertConfig, M2M100Config, MarianConfig, MBartConfig, MegaConfig, MegatronBertConfig, MgpstrConfig, MistralConfig, MixtralConfig, MobileBertConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, MusicgenConfig, MvpConfig, NezhaConfig, NllbMoeConfig, NystromformerConfig, OneFormerConfig, OpenAIGPTConfig, OPTConfig, Owlv2Config, OwlViTConfig, PegasusConfig, PegasusXConfig, PerceiverConfig, PersimmonConfig, PhiConfig, Pix2StructConfig, PLBartConfig, ProphetNetConfig, QDQBertConfig, RagConfig, RealmConfig, ReformerConfig, RemBertConfig, RetriBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, RwkvConfig, SeamlessM4TConfig, SeamlessM4Tv2Config, Speech2TextConfig, Speech2Text2Config, SpeechT5Config, SplinterConfig, SqueezeBertConfig, SwitchTransformersConfig, T5Config, TapasConfig, TransfoXLConfig, TvpConfig, UMT5Config, ViltConfig, VisualBertConfig, VitsConfig, Wav2Vec2Config, Wav2Vec2ConformerConfig, WhisperConfig, XCLIPConfig, XGLMConfig, XLMConfig, XLMProphetNetConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.

[rank0]: During handling of the above exception, another exception occurred:
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/jovyan/work/SeeClick/finetune/finetune.py", line 405, in <module>
[rank0]:     train()
[rank0]:   File "/home/jovyan/work/SeeClick/finetune/finetune.py", line 312, in train
[rank0]:     model = transformers.AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3780, in from_pretrained
[rank0]:     quantizer.quantize_model(model, quantization_config.tokenizer)
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/jovyan/.local/lib/python3.11/site-packages/optimum/gptq/quantizer.py", line 371, in quantize_model
[rank0]:     raise ValueError(
[rank0]: ValueError: We were not able to get the tokenizer using `AutoTokenizer.from_pretrained`
[rank0]:                         with the string that you have passed /home/jovyan/work/SeeClick/finetune/SeeClick-pretrain. If you have a custom tokenizer, you can pass it as input.
[rank0]:                         For now, we only support quantization for text model. Support for vision, speech and multimodel will come later.
W0514 08:24:29.947000 139669093369664 torch/distributed/elastic/multiprocessing/api.py:851] Sending process 125591 closing signal SIGTERM
E0514 08:24:30.212000 139669093369664 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 125590) of binary: /opt/conda/bin/python3.11
Traceback (most recent call last):
  File "/home/jovyan/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/jovyan/.local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 879, in main
    run(args)
  File "/home/jovyan/.local/lib/python3.11/site-packages/torch/distributed/run.py", line 870, in run
    elastic_launch(
  File "/home/jovyan/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jovyan/.local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-05-14_08:24:29
  host      : jupyter-changsu12-2dha---49nternlm
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 125590)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
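If the goal is standard QLoRA (4-bit NF4 quantization via bitsandbytes) rather than on-the-fly GPTQ quantization, another option is to load the full-precision checkpoint with a BitsAndBytesConfig, which bypasses optimum's GPTQ quantizer and its tokenizer requirement entirely. This is a sketch under that assumption, not the repo's documented path (the local path is illustrative):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization at load time, as in the original QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "/home/jovyan/work/SeeClick/finetune/SeeClick-pretrain",  # local checkpoint
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map={"": 0},
)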
njucckevin commented 1 month ago

Hi,

I have not tried training the model with QLoRA. The error ValueError: Unrecognized configuration class <class 'transformers_modules.SeeClick-pretrain.configuration_qwen.QWenConfig'> to build an AutoTokenizer looks strange, because in this codebase from_pretrained should load the checkpoint from local files, not from transformers_modules. Have you tried loading the Qwen-VL checkpoint locally, and does that work?
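For example, a quick check along these lines (the path is the one from your traceback) should succeed if the checkpoint files and their custom code resolve correctly, which would isolate the problem to the GPTQ quantization path:

from transformers import AutoConfig, AutoTokenizer

ckpt = "/home/jovyan/work/SeeClick/finetune/SeeClick-pretrain"

# Both calls should return the custom QWen classes from the checkpoint's
# remote-code files if the local copy is intact.
config = AutoConfig.from_pretrained(ckpt, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
print(type(config), type(tokenizer))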

gkckdtn126 commented 1 month ago

How much GPU memory did you use when fine-tuning with LoRA?

njucckevin commented 1 month ago

We used 8*A100 80G GPUs to fine-tune with LoRA, and I remember the training used about 70G of memory per GPU. The training resources required for SeeClick should be roughly the same as for Qwen-VL, except that we apply LoRA to more parameters, as in finetune/finetune.py lines 315-327.
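For reference, here is a sketch of a Qwen-VL-style LoRA setup; the hyperparameters and target module names below follow the upstream Qwen-VL fine-tuning script and are assumptions here, so check finetune/finetune.py lines 315-327 for the exact values SeeClick uses:

from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                 # LoRA rank (assumed Qwen-VL default)
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Assumed Qwen-VL attention/MLP projection names; SeeClick may target more.
    target_modules=["c_attn", "attn.c_proj", "w1", "w2"],
)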