xlang-ai / instructor-embedding

[ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Apache License 2.0

Why can't I save the model? #97

Open txye opened 7 months ago

txye commented 7 months ago

    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}]. A potential way to correctly save your model is to use save_model. More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
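For context, INSTRUCTOR is T5-based, and T5 ties its input embedding to a shared weight matrix, so two state-dict keys reference the same underlying tensor; safetensors' save_file refuses to serialize such aliases. A minimal sketch of the conflict, independent of INSTRUCTOR (the module names here are illustrative):

```python
import torch
import safetensors.torch

class TiedEmbeddings(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = torch.nn.Embedding(10, 4)  # analogous to auto_model.shared
        self.embed_tokens = self.shared          # same storage, like T5's encoder.embed_tokens

model = TiedEmbeddings()

# Raises the RuntimeError above: 'shared.weight' and 'embed_tokens.weight'
# are two state-dict keys backed by one tensor.
# safetensors.torch.save_file(model.state_dict(), "model.safetensors")

# save_model detects the aliasing and writes the shared tensor only once,
# which is why the error message recommends it:
safetensors.torch.save_model(model, "model.safetensors")
```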

nprasanthi7 commented 7 months ago

RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.encoder.embed_tokens.weight', '0.auto_model.shared.weight'}]. A potential way to correctly save your model is to use save_model. More information at https://huggingface.co/docs/safetensors/torch_shared_tensors

Could you please help me resolve this?
hongjin-su commented 6 months ago

Hi, thanks a lot for your interest in the INSTRUCTOR model!

Could you provide a short script for me to reproduce the error?
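A sketch that may reproduce the error, assuming the InstructorEmbedding package and a recent sentence-transformers release (newer versions default to safetensors serialization; the model name and output path are examples):

```python
from InstructorEmbedding import INSTRUCTOR

# Any T5-based INSTRUCTOR checkpoint carries the tied embedding weights.
model = INSTRUCTOR('hkunlp/instructor-large')

# '0.auto_model.shared.weight' and '0.auto_model.encoder.embed_tokens.weight'
# share storage, so a safetensors-backed save may raise the RuntimeError above.
model.save('instructor-large-saved')
```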

tush05tgsingh commented 3 months ago

I am getting the same error and don't know how to solve it. @hongjin-su, I hope you can help me with this:

Traceback (most recent call last):
  File "/ClusterLLM/perspective/2_finetune/finetune.py", line 617, in <module>
    main()
  File "ClusterLLM/perspective/2_finetune/finetune.py", line 598, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 1624, in train
    return inner_training_loop(
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2029, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2423, in _maybe_log_save_evaluate
    self._save_checkpoint(model, trial, metrics=metrics)
  File "conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 2499, in _save_checkpoint
    self.save_model(staging_output_dir, _internal_call=True)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3016, in save_model
    self._save(output_dir)
  File ".conda/envs/696ds/lib/python3.9/site-packages/transformers/trainer.py", line 3083, in _save
    safetensors.torch.save_file(
  File ".conda/envs/696ds/lib/python3.9/site-packages/safetensors/torch.py", line 281, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
  File ".conda/envs/696ds/lib/python3.9/site-packages/safetensors/torch.py", line 477, in _flatten
    raise RuntimeError(
RuntimeError: Some tensors share memory, this will lead to duplicate memory on disk and potential differences when loading them again: [{'0.auto_model.shared.weight', '0.auto_model.encoder.embed_tokens.weight'}]. A potential way to correctly save your model is to use save_model. More information at https://huggingface.co/docs/safetensors/torch_shared_tensors
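One possible workaround, assuming the script drives a standard transformers Trainer as the traceback indicates: turn off safetensors checkpointing so the Trainer falls back to torch.save, which tolerates shared tensors. save_safetensors is a standard TrainingArguments flag; output_dir is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="checkpoints",  # placeholder path
    save_safetensors=False,    # checkpoint with torch.save instead of safetensors,
                               # sidestepping the shared-tensor check
)
```

Alternatively, to keep safetensors, saving through safetensors.torch.save_model (or save_pretrained(..., safe_serialization=False) for Hugging Face models) avoids the aliasing error, since save_model writes each shared tensor only once.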