lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0
2.03k stars 319 forks

Tokenizer Errors On MLS Spanish Dataset #164

Closed hulsmeier closed 1 year ago

hulsmeier commented 1 year ago

I ran lhotse prepare on the MLS Spanish dataset and created the manifest file. I also added a way to select MLS dataset partitions in bin/tokenizer.py, and that part works fine. However, when I run the tokenizer on CUDA I get the following error: [image: cuda error]

If I run the tokenizer without CUDA, I get the following assertion errors instead, and the script stops after processing only one partition: [image: cpu error]

Can you tell me what output I am supposed to see after running the tokenizer? I get a single folder containing a number of h5 files that are each only 800 bytes.
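For anyone trying to reproduce this, here is a minimal sketch of how the suspiciously small files could be listed. It uses only the Python standard library and does not touch lhotse or the tokenizer. The function name find_small_h5_files and the 1024-byte threshold are my own assumptions for illustration, not something from this repo. A file of about 800 bytes probably holds little or no feature data, though that is a guess until the files are actually opened.

```python
from pathlib import Path


def find_small_h5_files(root, max_bytes=1024):
    """Return (path, size) pairs for .h5 files no larger than max_bytes.

    Hypothetical helper: the name and the default threshold are
    illustrative assumptions, not part of vall-e or lhotse.
    """
    small = []
    # Walk the output folder recursively, in a stable (sorted) order.
    for path in sorted(Path(root).rglob("*.h5")):
        size = path.stat().st_size
        if size <= max_bytes:
            small.append((path, size))
    return small
```

Calling find_small_h5_files on the tokenizer output folder and printing each pair shows whether every file is stuck at the same tiny size or only some of them are.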