zziC7 opened 3 weeks ago
Problem Overview
When I run maskgct_inference.py, it takes a very long time to load the model before inference.

Steps Taken
- I recorded how long each stage takes:
a. build stage
```python
start_time = time.time()

# 1. build semantic model (w2v-bert-2.0)
semantic_model, semantic_mean, semantic_std = build_semantic_model(device)
# 2. build semantic codec
semantic_codec = build_semantic_codec(cfg.model.semantic_codec, device)
# 3. build acoustic codec
codec_encoder, codec_decoder = build_acoustic_codec(
    cfg.model.acoustic_codec, device
)
# 4. build t2s model
t2s_model = build_t2s_model(cfg.model.t2s_model, device)
# 5. build s2a model
s2a_model_1layer = build_s2a_model(cfg.model.s2a_model.s2a_1layer, device)
s2a_model_full = build_s2a_model(cfg.model.s2a_model.s2a_full, device)

end_time = time.time()
build_time = end_time - start_time
print(f"build_time: {build_time} seconds")
```
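A large share of that build_time is probably the first-time download of the semantic model weights rather than model construction itself. A minimal warm-up sketch, assuming build_semantic_model fetches facebook/w2v-bert-2.0 from the Hugging Face Hub (that repo id is an assumption, not confirmed here):

```python
# Assumption: build_semantic_model pulls facebook/w2v-bert-2.0 from the
# Hugging Face Hub. Warming the local cache once means later runs only
# pay the much smaller cost of reading the weights from disk.
from huggingface_hub import snapshot_download

snapshot_download("facebook/w2v-bert-2.0")
```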
b. download stage
```python
start_time = time.time()

# download checkpoint
# download semantic codec ckpt
semantic_code_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="semantic_codec/model.safetensors"
)
# download acoustic codec ckpt
codec_encoder_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="acoustic_codec/model.safetensors"
)
codec_decoder_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="acoustic_codec/model_1.safetensors"
)
# download t2s model ckpt
t2s_model_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="t2s_model/model.safetensors"
)
# download s2a model ckpt
s2a_1layer_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="s2a_model/s2a_model_1layer/model.safetensors"
)
s2a_full_ckpt = hf_hub_download(
    "amphion/MaskGCT", filename="s2a_model/s2a_model_full/model.safetensors"
)

end_time = time.time()
download_time = end_time - start_time
print(f"download_time: {download_time} seconds")
```
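Note that hf_hub_download caches each file locally, so this ~60 s should only be paid on the first run. A hedged sketch of pre-fetching all MaskGCT checkpoints in one call (the allow_patterns list just mirrors the filenames above):

```python
# Pre-fetch all MaskGCT checkpoints into the local Hugging Face cache.
# Subsequent hf_hub_download calls then resolve from disk almost instantly.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="amphion/MaskGCT",
    allow_patterns=[
        "semantic_codec/*",
        "acoustic_codec/*",
        "t2s_model/*",
        "s2a_model/*",
    ],
)
```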
c. load stage
```python
start_time = time.time()

# load semantic codec
safetensors.torch.load_model(semantic_codec, semantic_code_ckpt)
# load acoustic codec
safetensors.torch.load_model(codec_encoder, codec_encoder_ckpt)
safetensors.torch.load_model(codec_decoder, codec_decoder_ckpt)
# load t2s model
safetensors.torch.load_model(t2s_model, t2s_model_ckpt)
# load s2a model
safetensors.torch.load_model(s2a_model_1layer, s2a_1layer_ckpt)
safetensors.torch.load_model(s2a_model_full, s2a_full_ckpt)

end_time = time.time()
load_time = end_time - start_time
print(f"load_time: {load_time} seconds")
```
d. inference stage
```python
start_time = time.time()

recovered_audio = maskgct_inference_pipeline.maskgct_inference(
    prompt_wav_path, prompt_text, target_text, "zh", "zh", target_len=10
)

end_time = time.time()
infer_time = end_time - start_time
print(f"Inference time for line {line_num}: {infer_time} seconds")
```
Then I found:
```
build_time: 202.2886848449707 seconds
download_time: 60.34766387939453 seconds
load_time: 32.2975959777832 seconds
Inference time for line 2: 14.074496269226074 seconds
```
Expected Outcome
Does this mean that every time I want to run inference on a piece of audio, I have to wait a long time for the model to load? Or is there something wrong with my setup?
Hi, what is the inference speed of this model? It is said to be a NAR model structure, but there are two big models in this architecture, so I guess it will be no quicker than earlier AR-based pipelines.
Hi, you only need to load the models once. You can use the Gradio demo or a Jupyter notebook to keep the models in memory. Besides, since not all required dependencies are pre-downloaded, it still takes time to fetch them from the web; this only happens the first time you generate a sentence in a given language.
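To make this concrete, here is a minimal sketch of the load-once, infer-many pattern, reusing the names from the timing snippets above (the list of target texts is illustrative):

```python
# A minimal sketch of the load-once, infer-many pattern. The names
# maskgct_inference_pipeline, prompt_wav_path, and prompt_text follow
# the timing snippets above and come from the surrounding script.
import time

targets = [
    ("First target sentence.", 10),
    ("Second target sentence.", 8),
]

# ... run the build / download / load stages from above exactly once ...

for i, (target_text, target_len) in enumerate(targets):
    start_time = time.time()
    recovered_audio = maskgct_inference_pipeline.maskgct_inference(
        prompt_wav_path, prompt_text, target_text, "zh", "zh",
        target_len=target_len,
    )
    print(f"Inference {i}: {time.time() - start_time:.1f} seconds")
```

With this pattern, the one-time build/download/load cost (~5 minutes above) is amortized, and each additional sentence only pays the ~14 s inference cost.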