Closed: eeyrw closed this issue 1 year ago.
Hey @eeyrw ,
I think this issue should maybe be posted in PyTorch? https://github.com/huggingface/safetensors
Either way, cc @Narsil here.
Great, I have a good reproducible example.
Both DeepSpeed and now this seem to be using this weird technique of messing up storage (sharing storage without sharing tensors). At least with a good example, I can figure out what they are doing and either fix it in safetensors or inform users about what's going on.
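To make that failure mode concrete, here is a minimal sketch (not from the thread) of the kind of state dict this describes: two tensors backed by the same underlying storage without being plain duplicates of one another, which safetensors rejects at save time.

```python
import torch
from safetensors.torch import save_file

base = torch.zeros(10)
a = base[:6]   # view over the first 6 elements
b = base[4:]   # overlapping view over the last 6 elements

# a and b share underlying storage, so this save is expected to fail with an
# error about shared tensors (exact message depends on the safetensors version).
try:
    save_file({"a": a, "b": b}, "out.safetensors")
except RuntimeError as err:
    print(f"safetensors rejected the state dict: {err}")
```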
https://github.com/huggingface/safetensors/pull/309 should fix it.
I'll make a release once this and another PR are merged.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
It is about https://github.com/huggingface/safetensors/issues/202 and https://github.com/huggingface/transformers/pull/22437. For a certain use case, the issue finally lands in diffusers. From my observation, the shared tensors are introduced by `ZeroRedundancyOptimizer`, then reach the `save_pretrained` method of `ModelMixin`, and finally make `safetensors` angry 😤. Because `save_pretrained` executes the save procedure itself, I cannot use the `save_model` trick (https://huggingface.co/docs/safetensors/torch_shared_tensors) provided by safetensors.
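For reference, the `save_model` trick from that page looks roughly like this (a sketch with a toy model; `save_model` de-duplicates shared tensors before writing, but there is no hook to substitute it inside `save_pretrained`):

```python
import torch
from safetensors.torch import save_model

# Toy stand-in for a real diffusers model; save_model resolves shared
# tensors before writing, unlike the plain save_file path.
model = torch.nn.Linear(4, 4)
save_model(model, "model.safetensors")
```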
Reproduction
Run
CUDA_VISIBLE_DEVICES=2,3,4,5,6,7 torchrun --nproc_per_node=6 SDSaveIssue.py
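`SDSaveIssue.py` itself is not included in the thread; a hypothetical sketch of what such a reproduction presumably does (model id, hyperparameters, and output path are illustrative, not from the issue) might look like:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from diffusers import UNet2DConditionModel

# Launched via torchrun, which sets LOCAL_RANK and the rendezvous env vars.
dist.init_process_group("nccl")
rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(rank)

model = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).to(rank)

# Per the issue, wrapping the optimizer in ZeroRedundancyOptimizer is what
# introduces the shared tensors that later trip up safetensors.
optimizer = ZeroRedundancyOptimizer(
    model.parameters(), optimizer_class=torch.optim.AdamW, lr=1e-5
)

# ... one or more training steps would go here ...

if rank == 0:
    # With safetensors serialization, this is where the error reportedly appears.
    model.save_pretrained("./unet-out", safe_serialization=True)

dist.destroy_process_group()
```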
Logs
System Info
diffusers version: 0.19.2
Who can help?
No response