open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License

[BUG]: RuntimeError: Error while trying to find names to remove to save state dict #340

Open hoore opened 2 weeks ago

hoore commented 2 weeks ago

File "/Users/me/Documents/Amphion/models/tts/maskgct/gradio_demo.py", line 298, in load_models safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt) File "/Users/me/mambaforge/envs/mg/lib/python3.10/site-packages/safetensors/torch.py", line 204, in load_model to_removes = _remove_duplicate_names(model_state_dict, preferred_names=state_dict.keys()) File "/Users/me/mambaforge/envs/mg/lib/python3.10/site-packages/safetensors/torch.py", line 102, in _remove_duplicate_names raise RuntimeError( RuntimeError: Error while trying to find names to remove to save state dict, but found no suitable name to keep for saving amongst: {'model.head.istft.window'}. None is covering the entire storage.Refusing to save/load the model since you could be storing much more memory than needed. Please refer to https://huggingface.co/docs/safetensors/torch_shared_tensors for more information. Or open an issue.

If I comment out the call safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt), there is no error. I have repeatedly confirmed that the downloaded model file is correct and complete. The same error occurs on both Windows and macOS.
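
For reference, a minimal sketch for narrowing this down: it first checks that the .safetensors file itself opens, then lists which entries of the model's state dict alias the same storage (which is what the safetensors error is about). The checkpoint path is a placeholder, and the helper names are mine, not from gradio_demo.py:

from collections import defaultdict

import torch
from safetensors import safe_open


def inspect_checkpoint(path: str) -> None:
    # Confirm the .safetensors file opens and report how many tensors it holds.
    with safe_open(path, framework="pt", device="cpu") as f:
        print(f"{len(list(f.keys()))} tensors in {path}")


def find_shared_storages(model: torch.nn.Module) -> None:
    # Group state-dict entries by underlying storage; any group with more than
    # one name is a set of aliased tensors (e.g. the iSTFT window buffer).
    groups = defaultdict(list)
    for name, tensor in model.state_dict().items():
        groups[tensor.untyped_storage().data_ptr()].append(name)  # torch >= 2.0
    for names in groups.values():
        if len(names) > 1:
            print("shared storage:", names)


# Usage inside gradio_demo.py, after codec_decoder is constructed:
# inspect_checkpoint(codec_decoder_ckpt)
# find_shared_storages(codec_decoder)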

hoore commented 2 weeks ago

Could you please check whether there is any problem with the model file? Thanks.

TKsavy commented 2 weeks ago

@hoore, can you try downloading the model files from Hugging Face directly instead of using the hf_hub_download function? I think the file was not downloaded properly. Try this approach and see whether it can solve your issue.
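
If you do stick with hf_hub_download, it may also be worth forcing a fresh download so a partially cached file cannot be reused. A small sketch; the repo id and filename are placeholders and should be taken from gradio_demo.py:

from huggingface_hub import hf_hub_download

# Placeholder repo id / filename: use the same values gradio_demo.py passes.
ckpt_path = hf_hub_download(
    repo_id="amphion/MaskGCT",
    filename="acoustic_codec/model_1.safetensors",
    force_download=True,  # ignore any partially downloaded cache entry
)
print("checkpoint re-downloaded to", ckpt_path)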

keepingitneil commented 2 weeks ago

I'm getting the same issue with a direct download. The encoder loads fine; only the decoder has this issue.

hoore commented 2 weeks ago

@hoore, can you try downloading the model files from Hugging Face directly instead of using the hf_hub_download function? I think the file was not downloaded properly. Try this approach and see whether it can solve your issue.

Yes, I also tried downloading and overwriting the files manually, and the SHA-256 checksum is also correct.
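
For anyone else who wants to double-check their copy, this is roughly how the checksum can be computed locally (the path is a placeholder; compare the result with the value shown on the Hugging Face file page):

import hashlib


def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large checkpoints fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(sha256sum("acoustic_codec/model_1.safetensors"))  # placeholder path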

yuantuo666 commented 2 weeks ago

Hi, MaskGCT is built in a Linux environment. For a smoother experience, it is recommended to use Linux to reproduce the results.

Also, according to the page linked in the error message, maybe you can try:

from safetensors.torch import load_model

load_model(model, "model.safetensors")
# Instead of model.load_state_dict(load_file("model.safetensors"))
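
If load_model still raises the shared-tensor error, another possible (untested) fallback is to read the raw tensors with load_file and copy them into the model non-strictly, so the aliased iSTFT window buffer that the model already initialises is simply skipped. codec_decoder and codec_decoder_ckpt below are the variables from gradio_demo.py:

from safetensors.torch import load_file

# Untested sketch: bypass load_model's duplicate-name handling entirely.
state_dict = load_file(codec_decoder_ckpt, device="cpu")
missing, unexpected = codec_decoder.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)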

GalenMarek14 commented 2 weeks ago

@TKsavy and @yuantuo666, I can run the project on Windows 11, but I can't reproduce the demo-page examples. What could be the problem? For example, with the whisper voice example from the demo page: I downloaded the sample from there and generated the same text, but it always outputs something between a whisper and a low voice, whereas the demo-page examples are successful clones. My generations are generally of lower quality, regardless of the number of steps; I've tried up to 100 iterations.

I've also tried every version, including this one, the Windows fork, and Google Colab (to try it on a Linux environment), but all of them produce inferior results compared to your examples. Are the shared models from a previous training point, by any chance? Are you able to reproduce those results with the current shared models?

This was my issue for this matter with detailed logs and outputs: https://github.com/open-mmlab/Amphion/issues/334

hoore commented 2 weeks ago

Hi, MaskGCT is built in a Linux environment. For a smoother experience, it is recommended to use Linux to reproduce the results.

Also, according to the page linked in the error message, maybe you can try:

from safetensors.torch import load_model

load_model(model, "model.safetensors")
# Instead of model.load_state_dict(load_file("model.safetensors"))

I have the same problem on both macOS and Ubuntu, but my chip is ARM. Could that be related?

hoore commented 2 weeks ago

I modified the model-loading code, and now it runs on a MacBook Pro M3.

Replace the safetensors.torch.load_model calls with:

from accelerate import load_checkpoint_and_dispatch

# device_map={"": "cpu"} places every module of each model on the CPU.
load_checkpoint_and_dispatch(semantic_codec, semantic_code_ckpt, device_map={"": "cpu"})
load_checkpoint_and_dispatch(codec_encoder, codec_encoder_ckpt, device_map={"": "cpu"})
load_checkpoint_and_dispatch(codec_decoder, codec_decoder_ckpt, device_map={"": "cpu"})

load_checkpoint_and_dispatch(t2s_model, t2s_model_ckpt, device_map={"": "cpu"})
load_checkpoint_and_dispatch(s2a_model_1layer, s2a_1layer_ckpt, device_map={"": "cpu"})
load_checkpoint_and_dispatch(s2a_model_full, s2a_full_ckpt, device_map={"": "cpu"})
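
If you then want the models off the CPU, a small follow-up sketch (untested) that moves them to MPS when it is available and otherwise leaves them on the CPU:

import torch

# Untested sketch: after loading on CPU, move the models to Apple's MPS
# backend when it is available; some ops may still fall back to the CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
for m in (semantic_codec, codec_encoder, codec_decoder,
          t2s_model, s2a_model_1layer, s2a_model_full):
    m.to(device).eval()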

iqrairfan100 commented 1 week ago

@hoore Thanks, modifying the load_models function in gradio_demo.py with that code works on my M2 MacBook Air.