ictnlp / StreamSpeech

StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
https://ictnlp.github.io/StreamSpeech-site/
MIT License

Error when loading speech_to_speech_ctc task #2

Closed tiannanzhang closed 3 months ago

tiannanzhang commented 3 months ago

Description

When running the simuleval command with the speech_to_speech.streamspeech agent, I encountered the following error:

Traceback (most recent call last):
  File "/Users/arararz/anaconda3/envs/streamspeech/bin/simuleval", line 33, in <module>
    sys.exit(load_entry_point('simuleval', 'console_scripts', 'simuleval')())
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/cli.py", line 47, in main
    system, args = build_system_args()
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/utils/agent.py", line 131, in build_system_args
    system = system_class.from_args(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/agents/agent.py", line 161, in from_args
    return cls(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 117, in __init__
    self.load_model_vocab(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 382, in load_model_vocab
    task = tasks.setup_task(task_args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/fairseq/fairseq/tasks/__init__.py", line 31, in setup_task
    task = TASK_REGISTRY[task_name]
KeyError: 'speech_to_speech_ctc'

The KeyError suggests that the speech_to_speech_ctc task was never registered in fairseq's TASK_REGISTRY.
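A minimal sketch of how an unregistered task surfaces as a KeyError, assuming a decorator-based registry similar in spirit to the TASK_REGISTRY named in the traceback. The functions register_task, setup_task, and load_user_dir below are simplified stand-ins written for illustration, not fairseq's actual code:

```python
# Toy model of a decorator-based task registry (illustrative, not fairseq's code).
TASK_REGISTRY = {}


def register_task(name):
    """Return a decorator that stores the class in TASK_REGISTRY under `name`."""
    def decorator(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return decorator


def setup_task(task_name):
    """Look up a task by name; raises KeyError if it was never registered."""
    return TASK_REGISTRY[task_name]()


def load_user_dir():
    """Hypothetical stand-in for importing a plugin directory: executing the
    decorated class definition is what performs the registration."""
    @register_task("speech_to_speech_ctc")
    class SpeechToSpeechCTCTask:
        pass


# Before the plugin code runs, the lookup fails the same way as in the traceback.
try:
    setup_task("speech_to_speech_ctc")
    failed_before_load = False
except KeyError:
    failed_before_load = True

# After the plugin code runs, the same lookup succeeds.
load_user_dir()
task = setup_task("speech_to_speech_ctc")
```

In this toy model the task class only becomes visible once the code that defines it has been executed, so a lookup made before that point fails even though the class exists on disk.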

Steps to Reproduce

  1. Set up the StreamSpeech environment
  2. Run the simultaneous S2ST script provided in the README

Environment

zhangshaolei1998 commented 3 months ago

Hi, thanks for your issue.

Yesterday's first commit had a bug in the --user-dir argument (see issue #1). I fixed it in the version I pushed this afternoon.

Your issue is probably caused by a mismatch between the repo version and the command you ran. I recommend updating to the latest version and then running this script:

export CUDA_VISIBLE_DEVICES=0

ROOT=/data/zhangshaolei/StreamSpeech # path to StreamSpeech repo
PRETRAIN_ROOT=/data/zhangshaolei/pretrain_models 
VOCODER_CKPT=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/g_00500000 # path to downloaded Unit-based HiFi-GAN Vocoder checkpoint
VOCODER_CFG=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/config.json # path to the vocoder's config file

LANG=fr
file=streamspeech.simultaneous.${LANG}-en.pt # path to downloaded StreamSpeech model
output_dir=$ROOT/res/streamspeech.simultaneous.${LANG}-en/simul-s2st

chunk_size=320 #ms
PYTHONPATH=$ROOT/fairseq simuleval --data-bin ${ROOT}/configs/${LANG}-en \
    --user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent \
    --source example/wav_list.txt --target example/target.txt \
    --model-path $file \
    --config-yaml config_gcmvn.yaml --multitask-config-yaml config_mtl_asr_st_ctcst.yaml \
    --agent $ROOT/agent/speech_to_speech.streamspeech.agent.py \
    --vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG --dur-prediction \
    --output $output_dir/chunk_size=$chunk_size \
    --source-segment-size $chunk_size \
    --quality-metrics ASR_BLEU  --target-speech-lang en --latency-metrics AL AP DAL StartOffset EndOffset LAAL ATD NumChunks DiscontinuitySum DiscontinuityAve DiscontinuityNum RTF \
    --device gpu --computation-aware \
    --output-asr-translation True

Note that the arguments --user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent are what changed compared to the previous version.
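As a rough pre-flight check, the sketch below scans an already-expanded argument list for the two options mentioned above and reports any that are absent or do not point to an existing directory. The helper check_plugin_dirs is hypothetical and is not part of SimulEval or StreamSpeech; it only inspects the arguments and does not launch anything:

```python
import os


def check_plugin_dirs(argv, flags=("--user-dir", "--agent-dir")):
    """Return a list of problems found for the given flags in an argument list.

    A flag is reported if it is absent, has no value, or its value is not
    an existing directory. `argv` must contain shell-expanded values.
    """
    problems = []
    for flag in flags:
        if flag not in argv:
            problems.append(f"{flag}: missing from command")
            continue
        idx = argv.index(flag) + 1
        if idx >= len(argv) or argv[idx].startswith("--"):
            problems.append(f"{flag}: no value given")
        elif not os.path.isdir(argv[idx]):
            problems.append(f"{flag}: not a directory: {argv[idx]}")
    return problems
```

For example, check_plugin_dirs(["--agent", "agent.py"]) reports both flags as missing, which matches the situation of a command copied before the two options were added.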

I hope this solves your problem.

tiannanzhang commented 3 months ago

Thanks a lot! I did not realize that the README had been modified, so I used the code I had copied earlier.