Error when loading speech_to_speech_ctc task

tiannanzhang commented 3 months ago

Description

When running the simuleval command with the speech_to_speech.streamspeech agent, I encountered the following error:

Traceback (most recent call last):
  File "/Users/arararz/anaconda3/envs/streamspeech/bin/simuleval", line 33, in <module>
    sys.exit(load_entry_point('simuleval', 'console_scripts', 'simuleval')())
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/cli.py", line 47, in main
    system, args = build_system_args()
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/utils/agent.py", line 131, in build_system_args
    system = system_class.from_args(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/agents/agent.py", line 161, in from_args
    return cls(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 117, in __init__
    self.load_model_vocab(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 382, in load_model_vocab
    task = tasks.setup_task(task_args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/fairseq/fairseq/tasks/__init__.py", line 31, in setup_task
    task = TASK_REGISTRY[task_name]
KeyError: 'speech_to_speech_ctc'

The error seems to be related to the speech_to_speech_ctc task not being found in the task registry.
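To illustrate why a missing registration surfaces as a KeyError, here is a minimal sketch of a decorator-based task registry. It is a simplified stand-in written for this thread, not fairseq's actual code: only the TASK_REGISTRY[task_name] lookup is taken from the traceback, while import_user_module and the SpeechToSpeechCTCTask class are hypothetical names used to mimic what loading a --user-dir module would do.

```python
# Simplified stand-in for a decorator-based task registry.
# This is NOT fairseq's real implementation, only an illustration of the pattern.
TASK_REGISTRY = {}


def register_task(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return decorator


def setup_task(task_name):
    # Same kind of lookup as in the traceback: a plain dict access,
    # so an unregistered name raises KeyError.
    return TASK_REGISTRY[task_name]()


def import_user_module():
    # Hypothetical stand-in for importing the module that defines the task.
    # Executing the class definition runs the decorator and fills the registry.
    @register_task("speech_to_speech_ctc")
    class SpeechToSpeechCTCTask:
        pass


try:
    setup_task("speech_to_speech_ctc")
except KeyError as err:
    print(f"KeyError before the task module is imported: {err}")

import_user_module()
task = setup_task("speech_to_speech_ctc")
print(type(task).__name__)
```

If the real setup follows the same pattern, the lookup can only succeed once the module that registers speech_to_speech_ctc has actually been imported.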

Steps to Reproduce

1. Set up the StreamSpeech environment.
2. Run the simultaneous S2ST script provided in the README.

Environment

Operating System: macOS (M2 Max)
Python Version: 3.10.14

zhangshaolei1998 commented 3 months ago

Hi, thanks for your issue.

Yesterday's first commit had a bug in --user-dir (see issue #1). I fixed it in the version I pushed this afternoon.

I suspect your issue comes from a mismatch between the repo version and the command you ran. I recommend updating to the latest version and then running this script:

export CUDA_VISIBLE_DEVICES=0

ROOT=/data/zhangshaolei/StreamSpeech # path to StreamSpeech repo
PRETRAIN_ROOT=/data/zhangshaolei/pretrain_models
VOCODER_CKPT=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/g_00500000 # path to downloaded Unit-based HiFi-GAN Vocoder checkpoint
VOCODER_CFG=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/config.json # path to downloaded Unit-based HiFi-GAN Vocoder config

LANG=fr
file=streamspeech.simultaneous.${LANG}-en.pt # path to downloaded StreamSpeech model
output_dir=$ROOT/res/streamspeech.simultaneous.${LANG}-en/simul-s2st

chunk_size=320 #ms
PYTHONPATH=$ROOT/fairseq simuleval --data-bin ${ROOT}/configs/${LANG}-en \
    --user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent \
    --source example/wav_list.txt --target example/target.txt \
    --model-path $file \
    --config-yaml config_gcmvn.yaml --multitask-config-yaml config_mtl_asr_st_ctcst.yaml \
    --agent $ROOT/agent/speech_to_speech.streamspeech.agent.py \
    --vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG --dur-prediction \
    --output $output_dir/chunk_size=$chunk_size \
    --source-segment-size $chunk_size \
    --quality-metrics ASR_BLEU  --target-speech-lang en --latency-metrics AL AP DAL StartOffset EndOffset LAAL ATD NumChunks DiscontinuitySum DiscontinuityAve DiscontinuityNum RTF \
    --device gpu --computation-aware \
    --output-asr-translation True

Note that the arguments --user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent are the part that changed compared to the previous version of the command.
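As a quick sanity check before launching simuleval, a small pre-flight script along these lines could confirm that both directories exist in your checkout. This is only a sketch: the missing_dirs helper is a hypothetical name, and reading ROOT from the environment is an assumption that mirrors the variable used in the script above.

```python
import os
from pathlib import Path


def missing_dirs(root):
    """Return {flag: path} for each expected directory that does not exist."""
    root = Path(root)
    required = {
        "--user-dir": root / "researches" / "ctc_unity",
        "--agent-dir": root / "agent",
    }
    return {flag: str(path) for flag, path in required.items() if not path.is_dir()}


if __name__ == "__main__":
    # Assumption: ROOT points at the StreamSpeech checkout, as in the script above.
    root = os.environ.get("ROOT", ".")
    problems = missing_dirs(root)
    if problems:
        for flag, path in problems.items():
            print(f"{flag}: directory not found: {path}")
    else:
        print("Both --user-dir and --agent-dir directories exist.")
```

A missing researches/ctc_unity directory would suggest the checkout predates the fix, in which case pulling the latest version is the first thing to try.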

Hope this can solve your problem.

tiannanzhang commented 3 months ago

Thanks a lot! I did not realize that the readme had been modified, so I used the code I had copied earlier.
