Did you make any changes to local/compute_fbank_aishell.py?
The error log
ValueError: Cannot split iterable into more chunks (15) than its number of items 0
says that your cut_set is empty. Can you post the output of print(len(cut_set))? You can put it just before the following line:
https://github.com/k2-fsa/icefall/blob/87cf9231ea73631f1e4453400b3be06d45bcebf5/egs/aishell/ASR/local/compute_fbank_aishell.py#L78
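As a rough illustration of the check suggested above, the sketch below fails early with a clearer message when the collection is empty. The helper name check_can_split is hypothetical and not part of icefall or lhotse; it only assumes the object supports len(), as cut_set does in the print(len(cut_set)) call.

```python
from typing import Sequence


def check_can_split(items: Sequence, num_jobs: int) -> None:
    """Raise a descriptive error if `items` cannot be split into `num_jobs` chunks.

    Hypothetical helper for illustration; not part of icefall or lhotse.
    """
    if len(items) == 0:
        raise ValueError(
            "The cut set is empty; check that the manifests in data/manifests/ "
            "are not empty before computing features."
        )
    if len(items) < num_jobs:
        raise ValueError(
            f"Only {len(items)} items for {num_jobs} jobs; lower the number of jobs."
        )
```

Calling check_can_split(cut_set, 15) just before the feature computation would point at the manifests instead of at the split step.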
2022-04-04 13:40:17 (prepare.sh:89:main) Stage 1: Prepare aishell manifest
2022-04-04 13:40:19 (prepare.sh:100:main) Stage 2: Prepare musan manifest
Are you using the latest master? Also, the log shows that your Stage 1 took only 2 seconds, which is unexpected.
Can you show the output of
ls -lh data/manifests/
It should print something as follows:
-rw-r--r-- 1 kuangfangjun root 5.2M Mar 8 20:20 recordings_dev.json
-rw-r--r-- 1 kuangfangjun root 234K Mar 8 20:20 recordings_music.json
-rw-r--r-- 1 kuangfangjun root 341K Mar 8 20:20 recordings_noise.json
-rw-r--r-- 1 kuangfangjun root 155K Mar 8 20:20 recordings_speech.json
-rw-r--r-- 1 kuangfangjun root 2.6M Mar 8 20:20 recordings_test.json
-rw-r--r-- 1 kuangfangjun root 44M Mar 8 20:19 recordings_train.json
-rw-r--r-- 1 kuangfangjun root 4.2M Mar 8 20:20 supervisions_dev.json
-rw-r--r-- 1 kuangfangjun root 170K Mar 8 20:20 supervisions_music.json
-rw-r--r-- 1 kuangfangjun root 2.1M Mar 8 20:20 supervisions_test.json
-rw-r--r-- 1 kuangfangjun root 35M Mar 8 20:19 supervisions_train.json
Thank you very much for the quick reply. My data directory only has the download folder in it, which contains the downloaded musan and aishell folders, together with the lm.
drwxr-xr-x. 3 root root 22 Apr 4 13:40 ./
drwxr-xr-x. 11 root root 330 Apr 4 13:40 ../
drwxr-xr-x. 3 root root 44 Apr 4 13:40 download/
How did you run the script ./prepare.sh?
The log
2022-04-04 13:40:17 (prepare.sh:89:main) Stage 1: Prepare aishell manifest
2022-04-04 13:40:19 (prepare.sh:100:main) Stage 2: Prepare musan manifest
shows that you have run Stage 1 and Stage 2, which should have produced some manifest files in the folder data/.
These are the commands in my run.sh:
cd egs/aishell/ASR
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
./prepare.sh
./conformer_ctc/train.py --num-epochs 10
./conformer_ctc/decode.py --method 1best --max-duration 100
My data directory only has the download folder in it
There should be no download directory inside data. download should be in the same folder as ./prepare.sh.
Could you post the output of ls -lh ./download/*? I suspect that you have not downloaded the data yet.
I put the data in dl_dir: /workspace/icefall/data/download
Output of "ll data/download/":
drwxr-xr-x. 3 root root 44 Apr 4 13:40 ./
drwxr-xr-x. 3 root root 22 Apr 4 13:40 ../
lrwxrwxrwx. 1 root root 74 Apr 4 13:40 aishell
drwxr-xr-x. 3 root root 68 Apr 4 13:40 lm/
lrwxrwxrwx. 1 root root 72 Apr 4 13:40 musan
I put the data in dl_dir: /workspace/icefall/data/download
Could you show the changes you made to prepare.sh?
line 31: dl_dir=/workspace/icefall/data/download
Can you check that your stage 1 is finished successfully? https://github.com/k2-fsa/icefall/blob/87cf9231ea73631f1e4453400b3be06d45bcebf5/egs/aishell/ASR/prepare.sh#L87-L96
I have the following files in /workspace/icefall/egs/aishell/ASR/data/manifests. Maybe the cd egs/aishell/ASR at the start of run.sh is messing things up?
drwxr-xr-x. 2 root root 4096 Apr 4 14:59 ./
drwxr-xr-x. 4 root root 36 Apr 4 14:59 ../
-rw-r--r--. 1 root root 0 Apr 4 14:55 .aishell_manifests.done
-rw-r--r--. 1 root root 0 Apr 4 14:59 .musan_manifests.done
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_dev.json
-rw-r--r--. 1 root root 209677 Apr 4 14:59 recordings_music.json
-rw-r--r--. 1 root root 306733 Apr 4 14:59 recordings_noise.json
-rw-r--r--. 1 root root 138869 Apr 4 14:59 recordings_speech.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_train.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_dev.json
-rw-r--r--. 1 root root 173904 Apr 4 14:59 supervisions_music.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_train.json
By default, the script is run using the following commands
cd egs/aishell/ASR
./prepare.sh
and it generates files in ./data and ./download. You can select a different directory for download by changing dl_dir in prepare.sh, but the folder for data is fixed.
From the output of /workspace/icefall/egs/aishell/ASR/data/manifests, it looks like everything goes as expected.
Do you still have the above errors?
Yes, same errors. By the way, print(len(cut_set)) prints 0.
Oh, wait.
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_train.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_train.json
This does not look right.
Can you check that /workspace/icefall/data/download contains the folders described below?
https://github.com/k2-fsa/icefall/blob/87cf9231ea73631f1e4453400b3be06d45bcebf5/egs/aishell/ASR/prepare.sh#L13-L16
This is inside aishell:
drwxr-xr-x. 4 1682 500 62 Mar 30 17:03 ./
drwxr-xr-x. 4 1682 500 130 Mar 31 18:50 ../
drwxr-xr-x. 4 1682 500 47 Jun 16 2017 data_aishell/
drwxr-xr-x. 2 1682 500 57 Jun 21 2017 resource_aishell/
and this is inside musan:
drwxr-xr-x. 5 1682 500 80 Nov 16 2015 ./
drwxr-xr-x. 4 1682 500 130 Mar 31 18:50 ../
-rwxr-xr-x. 1 1682 500 1765 Oct 30 2015 README*
drwxr-xr-x. 7 1682 500 128 Oct 30 2015 music/
drwxr-xr-x. 4 1682 500 73 Oct 30 2015 noise/
drwxr-xr-x. 4 1682 500 66 Oct 30 2015 speech/
The reason for the error is that you have empty manifest files for aishell. The following files are actually empty:
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 recordings_train.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_test.json
-rw-r--r--. 1 root root 2 Apr 4 14:55 supervisions_train.json
You can step into the following code https://github.com/k2-fsa/icefall/blob/87cf9231ea73631f1e4453400b3be06d45bcebf5/egs/aishell/ASR/prepare.sh#L87-L96 to see what went wrong.
You have to delete data/manifests/.aishell_manifests.done before going forward.
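A minimal sketch of the check described above: it lists the JSON manifests that contain no entries and removes the stale .done marker so that the stage runs again. The function names are hypothetical, and it assumes the 2-byte files hold an empty JSON container such as [], which the thread does not confirm.

```python
import json
from pathlib import Path
from typing import List


def find_empty_manifests(manifest_dir: Path) -> List[Path]:
    """Return the JSON manifests in `manifest_dir` that contain no entries."""
    empty = []
    for path in sorted(manifest_dir.glob("*.json")):
        text = path.read_text().strip()
        # A zero-length file or an empty JSON container both count as empty.
        if not text or len(json.loads(text)) == 0:
            empty.append(path)
    return empty


def reset_stage_marker(
    manifest_dir: Path, marker: str = ".aishell_manifests.done"
) -> bool:
    """Delete the marker file if any manifest is empty; return True if deleted."""
    marker_path = manifest_dir / marker
    if find_empty_manifests(manifest_dir) and marker_path.exists():
        marker_path.unlink()
        return True
    return False
```

For example, reset_stage_marker(Path("data/manifests")) could be run from egs/aishell/ASR before calling ./prepare.sh again. Note that this sketch looks at every *.json file in the folder, including the musan manifests, and that it reads each manifest fully into memory.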
I even ran lhotse prepare aishell $dl_dir/aishell data/manifests directly but the above files are still empty.
You can set breakpoints in https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/aishell.py#L72 to debug it.
For instance, change
corpus_dir = Path(corpus_dir)
assert corpus_dir.is_dir(), f"No such directory: {corpus_dir}"
to
import pdb
pdb.set_trace()
corpus_dir = Path(corpus_dir)
assert corpus_dir.is_dir(), f"No such directory: {corpus_dir}"
When you run lhotse prepare aishell $dl_dir/aishell data/manifests, it will enter pdb, where you can try to find what is wrong.
The issue was that I had forgotten to unzip the tar files inside aishell/data_aishell/wav/. After that, the manifests got generated (with some warnings about some missing transcripts) and now it's computing fbank. Thanks a lot for your help! :)
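For anyone hitting the same problem, the extraction step can be sketched as follows. This is only an illustration: the function name is hypothetical, and the .tar.gz suffix is an assumption about how the archives in the wav folder are named.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_archives(wav_dir: Path, suffix: str = ".tar.gz") -> List[Path]:
    """Extract every archive directly inside `wav_dir` in place.

    Returns the list of archives that were extracted. Only use this on
    archives from a trusted source, since extractall writes whatever
    paths the archive contains.
    """
    archives = sorted(p for p in wav_dir.iterdir() if p.name.endswith(suffix))
    for archive in archives:
        # Mode "r" (the default) detects the compression automatically.
        with tarfile.open(archive) as tar:
            tar.extractall(path=wav_dir)
    return archives
```

A call such as extract_archives(Path(dl_dir) / "aishell" / "data_aishell" / "wav"), with dl_dir set to your download folder, would unpack the archives before Stage 1 is re-run.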
Sorry, my bad. I encountered such an issue before and fixed it here (https://github.com/lhotse-speech/lhotse/pull/388), but I forgot to change the if condition in prepare.sh. See https://github.com/k2-fsa/icefall/pull/291.
Hi, while reproducing the Aishell egs, I get the following error. Any ideas what I am doing wrong? Thanks
2022-04-04 13:40:17 (prepare.sh:58:main) stage 0: Download data
2022-04-04 13:40:17 (prepare.sh:89:main) Stage 1: Prepare aishell manifest
2022-04-04 13:40:19 (prepare.sh:100:main) Stage 2: Prepare musan manifest
2022-04-04 13:40:19 (prepare.sh:104:main) It may take 6 minutes
2022-04-04 13:41:34,837 WARNING [qa.py:116] There are 15 recordings that do not have any corresponding supervisions in the SupervisionSet.
2022-04-04 13:43:15 (prepare.sh:112:main) Stage 3: Compute fbank for aishell
2022-04-04 13:43:17,119 INFO [compute_fbank_aishell.py:67] Processing train
Traceback (most recent call last):
  File "./local/compute_fbank_aishell.py", line 111, in <module>
    compute_fbank_aishell(num_mel_bins=args.num_mel_bins)
  File "./local/compute_fbank_aishell.py", line 86, in compute_fbank_aishell
    storage_type=LilcomHdf5Writer,
  File "/usr/local/lib/python3.6/dist-packages/lhotse/cut.py", line 4483, in compute_and_store_features
    cut_sets = self.split(num_jobs, shuffle=True)
  File "/usr/local/lib/python3.6/dist-packages/lhotse/cut.py", line 3635, in split
    self, num_splits=num_splits, shuffle=shuffle, drop_last=drop_last
  File "/usr/local/lib/python3.6/dist-packages/lhotse/utils.py", line 334, in split_sequence
    f"Cannot split iterable into more chunks ({num_splits}) than its number of items {num_items}"
ValueError: Cannot split iterable into more chunks (15) than its number of items 0