Closed by tiannanzhang 3 months ago
Hi, thanks for your issue.
Yesterday's first commit had a bug in --user-dir (see issue #1). I fixed it in the latest version, submitted this afternoon.
I suspect your issue comes from a mismatch between the repo version and the command you ran. I recommend updating to the latest version and then running this script:
export CUDA_VISIBLE_DEVICES=0
ROOT=/data/zhangshaolei/StreamSpeech # path to StreamSpeech repo
PRETRAIN_ROOT=/data/zhangshaolei/pretrain_models
VOCODER_CKPT=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/g_00500000 # path to downloaded Unit-based HiFi-GAN Vocoder
VOCODER_CFG=$PRETRAIN_ROOT/unit-based_HiFi-GAN_vocoder/mHuBERT.layer11.km1000.en/config.json # path to the config of the downloaded Unit-based HiFi-GAN Vocoder
LANG=fr
file=streamspeech.simultaneous.${LANG}-en.pt # path to downloaded StreamSpeech model
output_dir=$ROOT/res/streamspeech.simultaneous.${LANG}-en/simul-s2st
chunk_size=320 #ms
PYTHONPATH=$ROOT/fairseq simuleval --data-bin ${ROOT}/configs/${LANG}-en \
--user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent \
--source example/wav_list.txt --target example/target.txt \
--model-path $file \
--config-yaml config_gcmvn.yaml --multitask-config-yaml config_mtl_asr_st_ctcst.yaml \
--agent $ROOT/agent/speech_to_speech.streamspeech.agent.py \
--vocoder $VOCODER_CKPT --vocoder-cfg $VOCODER_CFG --dur-prediction \
--output $output_dir/chunk_size=$chunk_size \
--source-segment-size $chunk_size \
--quality-metrics ASR_BLEU --target-speech-lang en --latency-metrics AL AP DAL StartOffset EndOffset LAAL ATD NumChunks DiscontinuitySum DiscontinuityAve DiscontinuityNum RTF \
--device gpu --computation-aware \
--output-asr-translation True
Note that --user-dir ${ROOT}/researches/ctc_unity --agent-dir ${ROOT}/agent is the part that changed compared to the previous version.
Hope this can solve your problem.
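For anyone else running a command copied before the fix, the check described above can be sketched in a few lines of Python. This is only an illustration: check_flags and EXPECTED_SUFFIXES are hypothetical names, not part of StreamSpeech or SimulEval, and the expected path suffixes are taken from the note above.

```python
import shlex

# Flags named in the note above, with the path suffix each value should end in.
# (Hypothetical helper data; not part of StreamSpeech or SimulEval.)
EXPECTED_SUFFIXES = {
    "--user-dir": "researches/ctc_unity",
    "--agent-dir": "agent",
}


def check_flags(command, expected=EXPECTED_SUFFIXES):
    """Return a list of problems found in a simuleval command string."""
    # Join shell line continuations so shlex sees one logical line.
    tokens = shlex.split(command.replace("\\\n", " "))
    problems = []
    for flag, suffix in expected.items():
        if flag not in tokens:
            problems.append(f"{flag} is missing")
            continue
        idx = tokens.index(flag)
        value = tokens[idx + 1] if idx + 1 < len(tokens) else ""
        # Shell variables such as ${ROOT} are left unexpanded; only the suffix is compared.
        if not value.rstrip("/").endswith(suffix):
            problems.append(f"{flag} is {value!r}, expected a path ending in {suffix!r}")
    return problems
```

Passing the command as a string to check_flags returns an empty list when both flags are present with the expected suffixes. The suffix comparison is deliberately loose, so it will not catch every wrong path.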
Thanks a lot! I did not realize that the README had been modified, so I used the command I had copied earlier.
Description
When running the simuleval command with the speech_to_speech.streamspeech agent, I encountered the following error:
Traceback (most recent call last):
  File "/Users/arararz/anaconda3/envs/streamspeech/bin/simuleval", line 33, in <module>
    sys.exit(load_entry_point('simuleval', 'console_scripts', 'simuleval')())
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/cli.py", line 47, in main
    system, args = build_system_args()
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/utils/agent.py", line 131, in build_system_args
    system = system_class.from_args(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/SimulEval/simuleval/agents/agent.py", line 161, in from_args
    return cls(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 117, in __init__
    self.load_model_vocab(args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/agent/speech_to_speech.streamspeech.agent.py", line 382, in load_model_vocab
    task = tasks.setup_task(task_args)
  File "/Users/arararz/Documents/GitHub/StreamSpeech/fairseq/fairseq/tasks/__init__.py", line 31, in setup_task
    task = TASK_REGISTRY[task_name]
KeyError: 'speech_to_speech_ctc'
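The last frame of the traceback is a plain dictionary lookup, so the KeyError means nothing had registered that task name when setup_task ran. Below is a toy imitation of the registry pattern, not fairseq's actual code; load_user_tasks is a hypothetical stand-in for the import that --user-dir triggers, and the assumption that the task is registered by code under the user directory is inferred from the fix, not confirmed here.

```python
# Toy task registry: imitates the pattern only, not fairseq's implementation.
TASK_REGISTRY = {}


def register_task(name):
    """Decorator that stores a task class in the registry under `name`."""
    def wrapper(cls):
        TASK_REGISTRY[name] = cls
        return cls
    return wrapper


def setup_task(task_name):
    # Plain dict lookup: raises KeyError if no module registered this name.
    return TASK_REGISTRY[task_name]()


def load_user_tasks():
    """Hypothetical stand-in for importing the user directory.

    The decorator only runs when the code defining the class is executed,
    so the task is unknown to the registry until this function is called.
    """
    @register_task("speech_to_speech_ctc")
    class SpeechToSpeechCTCTask:
        pass

    return SpeechToSpeechCTCTask
```

Calling setup_task("speech_to_speech_ctc") before load_user_tasks() raises the same kind of KeyError as above; calling it afterwards returns a task instance. In this sketch, a wrong or missing user directory corresponds to never calling load_user_tasks().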
The error seems to be related to the speech_to_speech_ctc task not being found in the task registry.
Steps to Reproduce
Environment