hirofumi0810 / neural_sp

End-to-end ASR/LM implementation with PyTorch
Apache License 2.0

examples/ami/s5b recipe failing #263

Open agarwalchaitanya opened 3 years ago

agarwalchaitanya commented 3 years ago

Hi, I'm trying to run the ami recipe but it's failing with the following trace. Are there any leads on this?

============================================================================
                                  AMI                                     
============================================================================
============================================================================
                       Data Preparation (stage:0)                          
============================================================================
+ dir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ echo 'Downloading annotations...'
Downloading annotations...
+ amiurl=http://groups.inf.ed.ac.uk/ami
+ annotver=ami_public_manual_1.6.1
+ annot=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1
+ logdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/log
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip ']'
+ wget -nv -O /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_public_manual_1.6.1.zip
+ '[' '!' -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations ']'
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations
+ unzip -o -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations/AMI-metadata.xml ']'
+ local/ami_xml2text.sh /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
local/ami_xml2text.sh: line 19: [: openjdk version "11.0.9.1" 2020-11-04: integer expression expected
local/ami_xml2text.sh. Java not found. Will download exported version of transcripts.
--2021-02-03 17:12:13--  http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_manual_annotations_v1.6.1_export.gzip
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3725858 (3.6M) [application/x-troff-man]
Saving to: '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip'

/home/asr/neural_sp_asset 100%[==================================>]   3.55M  1.37MB/s    in 2.6s    

2021-02-03 17:12:16 (1.37 MB/s) - '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip' saved [3725858/3725858]

+ wdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 ']'
+ echo 'Preprocessing transcripts...'
Preprocessing transcripts...
+ local/ami_split_segments.pl /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_train.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_eval.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_dev.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
Getting CMU dictionary
cat: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/cmudict/cmudict.0.7a.symbols: No such file or directory
grep: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/cmudict/cmudict.0.7a: No such file or directory
2021-02-03 17:12:21 URL:http://www.openslr.org/resources/9/wordlist.50k.gz [139334/139334] -> "/home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/wordlist.50k.gz" [1]
cat: /home/asr/neural_sp_assets/preprocessed_data/ami/ihm/train/text: No such file or directory
*Highest-count OOVs are:
Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt ...
--> reading /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/silence_phones.txt is OK

Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt ...
--> reading /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt
--> text seems to be UTF-8 or ASCII, checking whitespaces
--> text contains only allowed whitespaces
--> /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/optional_silence.txt is OK

Checking /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/nonsilence_phones.txt ...
--> ERROR: /home/asr/neural_sp_assets/preprocessed_data/ami/local/dict/nonsilence_phones.txt is empty or not exists
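
The `integer expression expected` message in the trace above suggests that the Java check in `local/ami_xml2text.sh` passes the whole `java -version` banner to a numeric test, so the script concludes "Java not found" even though OpenJDK 11 is installed. A minimal sketch of a more tolerant check follows, assuming the goal is only to obtain the major version as an integer. The function name `java_major_version` is hypothetical and is not part of the recipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the recipe): print the major Java
# version as a plain integer, given one line of `java -version` output.
java_major_version() {
    local banner="$1" version major
    # Extract the quoted version string, e.g. 11.0.9.1 or 1.8.0_252.
    version=$(printf '%s\n' "$banner" \
        | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    [ -n "$version" ] || return 1
    major=${version%%.*}
    # Legacy numbering: 1.8.0_252 means Java 8.
    if [ "$major" = "1" ]; then
        version=${version#1.}
        major=${version%%.*}
    fi
    # Drop any non-numeric suffix such as "-ea".
    major=${major%%[!0-9]*}
    [ -n "$major" ] || return 1
    printf '%s\n' "$major"
}

# Usage: `java -version` writes to stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
    major=$(java_major_version "$(java -version 2>&1 | head -n 1)") || major=0
    echo "Detected Java major version: $major"
fi
```

The resulting value is a bare integer, so it can be used safely in a comparison such as `[ "$major" -ge 8 ]` (the threshold here is only an illustration).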

hirofumi0810 commented 3 years ago

@agarwalchaitanya Could you try to comment out local/ami_prepare_dict.sh (line: 120) in run.sh?

agarwalchaitanya commented 3 years ago

> @agarwalchaitanya Could you try to comment out local/ami_prepare_dict.sh (line: 120) in run.sh?

That removes the dictionary error, but the recipe still fails somewhere within stage 0:

============================================================================
                                  AMI
============================================================================
============================================================================
                       Data Preparation (stage:0)
============================================================================
+ dir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ echo 'Downloading annotations...'
Downloading annotations...
+ amiurl=http://groups.inf.ed.ac.uk/ami
+ annotver=ami_public_manual_1.6.1
+ annot=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1
+ logdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
+ mkdir -p /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/log
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/ami_public_manual_1.6.1.zip ']'
+ '[' '!' -d /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations ']'
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads/annotations/AMI-metadata.xml ']'
+ local/ami_xml2text.sh /home/asr/neural_sp_assets/preprocessed_data/ami/local/downloads
local/ami_xml2text.sh: line 19: [: openjdk version "11.0.10" 2021-01-19: integer expression expected
local/ami_xml2text.sh. Java not found. Will download exported version of transcripts.
--2021-02-11 17:16:05--  http://groups.inf.ed.ac.uk/ami/AMICorpusAnnotations/ami_manual_annotations_v1.6.1_export.gzip
Resolving groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)... 129.215.202.26
Connecting to groups.inf.ed.ac.uk (groups.inf.ed.ac.uk)|129.215.202.26|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3725858 (3.6M) [application/x-troff-man]
Saving to: '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip'

/home/asr/neural_sp_assets/pr 100%[=================================================>]   3.55M  2.49MB/s    in 1.4s

2021-02-11 17:16:07 (2.49 MB/s) - '/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/ami_manual_annotations_v1.6.1_export.gzip' saved [3725858/3725858]

+ wdir=/home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations
+ '[' '!' -f /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 ']'
+ echo 'Preprocessing transcripts...'
Preprocessing transcripts...
+ local/ami_split_segments.pl /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts1 /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_train.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_eval.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
+ for dset in train eval dev
+ grep -f local/split_dev.orig /home/asr/neural_sp_assets/preprocessed_data/ami/local/annotations/transcripts2
sdm
In total, 0 files were found.
Warning: expected 169 data data files, found 0
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 data preparation succeeded.
In total, 0 files were found.
local/ami_sdm_scoring_data_prep.sh. Applying following fixes to segments
s/^AMI_IB4004_SDM_MIO039_0036179_0036400 AMI_IB4004_SDM 361.79 364$/AMI_IB4004_SDM_MIO039_0036179_0036400 AMI_IB4004_SDM 362.28 364/;
convert2stm: Recording-id AMI_ES2011a_SDM not defined in reco2file_and_channel file /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/dev_orig/reco2file_and_channel at local/convert2stm.pl line 70.
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 scenario and dev set data preparation succeeded.
In total, 0 files were found.
convert2stm: Recording-id AMI_EN2002a_SDM not defined in reco2file_and_channel file /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/eval_orig/reco2file_and_channel at local/convert2stm.pl line 70.
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
AMI sdm1 scenario and eval set data preparation succeeded.
utils/data/get_utt2dur.sh: segments file does not exist so getting durations from wave files
utils/data/get_utt2dur.sh: successfully obtained utterance lengths from sphere-file headers
utils/data/get_utt2dur.sh: computed /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/train_orig/utt2dur
utils/data/modify_speaker_info.sh: copied data from /home/asr/neural_sp_assets/preprocessed_data/ami/sdm1/train_orig to /home/asr/neural_sp_assets/preprocessed_data/ami/train_sdm1, number of speakers changed from 0 to 0
Usage: utils/validate_data_dir.sh [--no-feats] [--no-text] [--non-print] [--no-wav] [--no-spk-sort] <data-dir>
The --no-xxx options mean that the script does not require
xxx.scp to be present, but it will check it if it is present.
--no-spk-sort means that the script does not require the utt2spk to be
sorted by the speaker-id in addition to being sorted by utterance-id.
--non-print ignore the presence of non-printable characters.
By default, utt2spk is expected to be sorted by both, which can be
achieved by making the speaker-id prefixes of the utterance-ids
e.g.: utils/validate_data_dir.sh data/train
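
The repeated `In total, 0 files were found.` lines and the warning about 169 expected files suggest that the data preparation did not find any SDM audio. That would also explain why the recording IDs are missing from `reco2file_and_channel`. A quick way to check is sketched below. It assumes the corpus audio lives under a directory you supply (`AMI_DIR` is a placeholder) and that the single-distant-microphone files are named like `*.Array1-01.wav`. Both are assumptions about the setup rather than facts taken from the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count files below a corpus directory whose names
# match a glob pattern. Prints 0 and returns 1 if the directory is missing.
count_audio_files() {
    local corpus_dir="$1" pattern="$2"
    if [ ! -d "$corpus_dir" ]; then
        echo 0
        return 1
    fi
    # -iname matches case-insensitively; tr strips padding some wc
    # implementations add around the count.
    find "$corpus_dir" -iname "$pattern" | wc -l | tr -d ' '
}

# Usage: AMI_DIR is a placeholder, set it to wherever the audio was downloaded.
AMI_DIR="${AMI_DIR:-/path/to/amicorpus}"
n=$(count_audio_files "$AMI_DIR" '*.Array1-01.wav') || true
echo "Found $n matching audio files under $AMI_DIR"
```

If the count is 0, the audio has probably not been downloaded, or the recipe is pointed at a different directory than the one that holds it.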