lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0
1.99k stars, 320 forks

tokenizer.py error #145

Closed: tom1997 closed this issue 1 year ago

tom1997 commented 1 year ago

I tried to train on aishell4. After preparing the dataset manifests, I ran

```
python3 bin/tokenizer.py --dataset-parts "train_L train_S train_M test" --text-extractor "pypinyin_initials_finals" --audio-extractor Encodec --batch-duration 400 --prefix aishell4 --src-dir ../../../dataset/aishell4/manifects/ --output-dir data/tokenized
```

and got the following log:

```
Computing features in batches: 0%| | 0/30 [00:00<?, ?it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "bin/tokenizer.py", line 263, in <module>
    main()
  File "bin/tokenizer.py", line 227, in main
    for c in tqdm(cut_set):
  File "/zhangyingxian/miniconda3/envs/valle/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
TypeError: 'NoneType' object is not iterable
```

I have already reinstalled the latest lhotse version. How can I solve this?
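The traceback shows that `cut_set` was `None` when the `for c in tqdm(cut_set)` loop started, so the script stops with a generic `TypeError` instead of a message about the data. A minimal sketch of a fail-fast check that could sit before such a loop follows; `require_cuts` is a hypothetical helper (not part of vall-e or lhotse), and the hint in its message assumes the usual cause is a mismatch between `--src-dir`, `--prefix`, `--dataset-parts` and the files on disk.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def require_cuts(cut_set: Optional[Iterable[T]], partition: str) -> Iterable[T]:
    """Return cut_set unchanged, or raise a descriptive error if it is None.

    Hypothetical helper: iterating over None (as tqdm does internally) only
    yields "'NoneType' object is not iterable", which hides the real problem.
    """
    if cut_set is None:
        raise ValueError(
            f"No cuts were loaded for partition {partition!r}. "
            "Check that --src-dir, --prefix and --dataset-parts match the "
            "manifest files on disk."
        )
    return cut_set


if __name__ == "__main__":
    # Reproduce the original error with a plain loop over None.
    try:
        for _ in None:
            pass
    except TypeError as err:
        print("original error:", err)

    # The guard names the partition and says what to check instead.
    try:
        require_cuts(None, "train_L")
    except ValueError as err:
        print("guarded error:", err)
```

In a script like `tokenizer.py`, the loop would then read `for c in tqdm(require_cuts(cut_set, partition)):`, where `partition` stands in for whatever variable the script uses for the current dataset part.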

sherryxie1 commented 1 year ago
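```python
from lhotse.recipes.utils import read_manifests_if_cached

# `dataset_parts` and `args` are defined earlier in the script from the command-line arguments.
manifests = read_manifests_if_cached(
    dataset_parts=dataset_parts,
    output_dir=args.src_dir,
    prefix=args.prefix,
    suffix=args.suffix,
    types=["recordings", "supervisions", "cuts"],
)
```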

Maybe check your manifests first.
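Checking the manifests mostly means checking that the files the arguments point at actually exist. The sketch below assumes the `{prefix}_{type}_{part}.jsonl.gz` naming visible in the directory listing later in this thread (e.g. `aishell_recordings_dev.jsonl.gz`) and a space-separated `--dataset-parts` string as in the original command; `expected_manifest_paths` and `missing_manifests` are hypothetical helpers, not lhotse functions.

```python
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def expected_manifest_paths(
    src_dir: PathLike,
    prefix: str,
    dataset_parts: Union[str, Iterable[str]],
    types: Iterable[str] = ("recordings", "supervisions"),
    suffix: str = "jsonl.gz",
) -> List[Path]:
    """Build the manifest paths implied by --src-dir, --prefix and --dataset-parts.

    Assumes files are named f"{prefix}_{type}_{part}.{suffix}". The default
    types match the files in the listing; add "cuts" if those are expected too.
    """
    if isinstance(dataset_parts, str):
        # Accept the space-separated form used on the command line.
        dataset_parts = dataset_parts.split()
    types = list(types)  # materialize so it can be reused for every part
    src = Path(src_dir)
    return [
        src / f"{prefix}_{kind}_{part}.{suffix}"
        for part in dataset_parts
        for kind in types
    ]


def missing_manifests(paths: Iterable[Path]) -> List[Path]:
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not p.is_file()]


if __name__ == "__main__":
    paths = expected_manifest_paths(
        "../../../dataset/aishell4/manifects/",
        "aishell4",
        "train_L train_S train_M test",
    )
    for p in missing_manifests(paths):
        print("missing:", p)
```

If any path is reported as missing, the directory or prefix given to the tokenizer does not match what data preparation produced.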

tom1997 commented 1 year ago

OK, I just printed it:

```
{'train': {'recordings': CutSet(len=0) [underlying data type: <class 'dict'>],
           'supervisions': CutSet(len=0) [underlying data type: <class 'dict'>]},
 'dev': {'recordings': CutSet(len=0) [underlying data type: <class 'dict'>],
         'supervisions': CutSet(len=0) [underlying data type: <class 'dict'>]},
 'test': {'recordings': CutSet(len=0) [underlying data type: <class 'dict'>],
          'supervisions': CutSet(len=0) [underlying data type: <class 'dict'>]}}
```

I also tested aishell1; here are its manifests:

```
-rw-r--r-- 1 root root  54 Jul 13 03:29 aishell_recordings_dev.jsonl.gz
-rw-r--r-- 1 root root  55 Jul 13 03:29 aishell_recordings_test.jsonl.gz
-rw-r--r-- 1 root root  56 Jul 13 03:29 aishell_recordings_train.jsonl.gz
-rw-r--r-- 1 root root  56 Jul 13 03:29 aishell_supervisions_dev.jsonl.gz
-rw-r--r-- 1 root root  57 Jul 13 03:29 aishell_supervisions_test.jsonl.gz
-rw-r--r-- 1 root root  58 Jul 13 03:29 aishell_supervisions_train.jsonl.gz
```

The CutSets are empty. I tried regenerating the manifests with prepare.sh, but the result is the same.
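The listed files are only 54 to 58 bytes, which hints that they hold few or no entries. One way to confirm this independently of lhotse is to count the JSON lines in each gzipped manifest. This is a sketch that assumes one JSON object per line in the `*.jsonl.gz` files; `count_records`, `count_manifest_records` and `find_empty_manifests` are hypothetical names.

```python
import gzip
from pathlib import Path
from typing import Iterable, List, Union


def count_records(lines: Iterable[str]) -> int:
    """Count non-blank lines; in a JSONL manifest each one is a record."""
    return sum(1 for line in lines if line.strip())


def count_manifest_records(path: Union[str, Path]) -> int:
    """Stream a gzipped JSONL manifest and count its records."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_records(f)


def find_empty_manifests(src_dir: Union[str, Path]) -> List[str]:
    """Return the names of *.jsonl.gz files in src_dir that contain no records."""
    src = Path(src_dir)
    if not src.is_dir():
        # Fail loudly: a wrong directory would otherwise look like "nothing empty".
        raise FileNotFoundError(f"manifest directory not found: {src}")
    return sorted(
        p.name for p in src.glob("*.jsonl.gz") if count_manifest_records(p) == 0
    )
```

Running `print(find_empty_manifests("path/to/manifests"))` on the directory above would show whether prepare.sh wrote any records at all; if every file is listed, the problem lies in data preparation (for example, the raw dataset path it was given) rather than in `tokenizer.py`.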

tom1997 commented 1 year ago

I solved it by setting the correct dataset path (refer to lhotse).

liuchaofei commented 1 year ago


Can you explain it in more detail?

tom1997 commented 1 year ago

See the lhotse README, and make sure the paths in prepare.sh match the location of the manifest files.
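To make the `tokenizer.py` arguments agree with what prepare.sh actually wrote, it can help to read the prefix and dataset parts back from the file names. The sketch below assumes the same `{prefix}_{type}_{part}.jsonl.gz` naming as in the listing above; the regular expression and the helper names are hypothetical, not part of vall-e or lhotse.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Optional, Set, Tuple, Union

# The prefix is matched non-greedily so parts such as "train_L" keep their underscore.
MANIFEST_RE = re.compile(
    r"^(?P<prefix>.+?)_(?P<kind>recordings|supervisions|cuts)_(?P<part>.+)\.jsonl\.gz$"
)


def parse_manifest_name(name: str) -> Optional[Tuple[str, str, str]]:
    """Split a manifest file name into (prefix, kind, part), or return None."""
    m = MANIFEST_RE.match(name)
    return (m["prefix"], m["kind"], m["part"]) if m else None


def group_parts_by_prefix(names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each prefix to the set of dataset parts that have manifests."""
    layout: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        parsed = parse_manifest_name(name)
        if parsed is not None:
            prefix, _kind, part = parsed
            layout[prefix].add(part)
    return dict(layout)


def describe_manifest_dir(src_dir: Union[str, Path]) -> Dict[str, Set[str]]:
    """Scan src_dir and report which --prefix / --dataset-parts values it supports."""
    return group_parts_by_prefix(p.name for p in Path(src_dir).iterdir() if p.is_file())
```

Comparing the output of `describe_manifest_dir(src_dir)` with the values passed to `--prefix` and `--dataset-parts` makes mismatches visible, for example a prefix of `aishell4` against files named `aishell_...`, or a part `train_L` against files that only contain `train`.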