lifeiteng / vall-e

PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech), reproduced demo: https://lifeiteng.github.io/valle/index.html
Apache License 2.0

librilight: When running the tokenizer on the data, I get OOM on CUDA and even with 256GB of server RAM... #134

Closed. RuntimeRacer closed this issue 1 year ago.

RuntimeRacer commented 1 year ago

So I considered training VALL-E using the librilight dataset. After downloading, I processed it similarly to LJSpeech:

lhotse prepare librilight -j 16 download data/manifests

However, when I try to run tokenization, I instantly hit OOM, no matter which batch duration I use. This is how I call the tokenizer:

python3 bin/tokenizer.py --dataset-parts "small medium large" --audio-extractor Encodec --batch-duration 400 --src-dir "data/manifests" --output-dir "data/tokenized" --prefix "librilight"

I suspect there is either some issue with the data format, or the tokenizer tries to store the whole dataset in memory for some reason.

Did anyone else encounter this and know of a potential fix?
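
One quick check worth doing (a diagnostic sketch only, not part of this repo): look at how long the prepared recordings actually are. Raw LibriLight files are unsegmented audiobook recordings that can run for hours, so Encodec ends up encoding an entire recording in one go and --batch-duration cannot help; that alone would explain the OOM. The manifest filename below is an assumption based on lhotse's usual naming.

# Sketch only: check recording durations from a lazily-streamed manifest,
# so nothing large is held in RAM during the check itself.
from lhotse import load_manifest_lazy

# Assumed filename, following lhotse's "<prefix>_recordings_<part>" convention.
recordings = load_manifest_lazy("data/manifests/librilight_recordings_small.jsonl.gz")

longest = 0.0
for rec in recordings:            # streamed one record at a time
    longest = max(longest, rec.duration)
print(f"longest recording: {longest / 3600:.2f} hours")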

RuntimeRacer commented 1 year ago

Never mind, I figured out that I need to do segmentation first: https://github.com/facebookresearch/libri-light/blob/main/data_preparation/README.md#1b-segmenting
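
For anyone landing here later: the linked README points at the official cut_by_vad.py script in the libri-light repo. As a rough sketch of what that segmentation step does (the "voice_activity" field name and the example paths here are assumptions about the per-file JSON metadata, so rely on the official script for real runs):

# Rough illustration only: cut one LibriLight flac into short utterances using
# its sibling JSON metadata. Field name and paths are assumptions; the repo's
# cut_by_vad.py is the real tool.
import json
import soundfile as sf

flac_path = "example.flac"   # hypothetical input file
meta_path = "example.json"   # hypothetical sibling metadata file

wav, sr = sf.read(flac_path)
with open(meta_path) as f:
    vad = json.load(f)["voice_activity"]   # assumed: list of [start_sec, end_sec] pairs

for i, (start, end) in enumerate(vad):
    # Each voiced span becomes its own short file, which the tokenizer
    # can then encode without blowing up memory.
    sf.write(f"example_seg{i:04d}.flac", wav[int(start * sr):int(end * sr)], sr)

After segmenting, re-running lhotse prepare librilight on the segmented output should leave the tokenizer with short utterances only.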

caffeinetoomuch commented 1 year ago

How are you getting the transcriptions for recordings?

JunjieLl commented 9 months ago

How are you getting the transcriptions for recordings?

Maybe Libriheavy is a good choice, since it comes with transcriptions.

JunjieLl commented 9 months ago

Sorry to disturb. I'm actually reproducing VALL-E with Libriheavy, an alternative to LibriLight that comes with text labels. This dataset provides lhotse manifests for segmented audio, but I'm still encountering an OOM (RAM) error. Is there any detail I didn't notice?
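
In case the problem is eager manifest loading: reading the full Libriheavy CutSet into memory at once can be enormous for the larger subsets. lhotse can stream a .jsonl.gz manifest lazily instead, which keeps only one cut in RAM at a time. A minimal sketch, assuming that is the culprit (the manifest filename is a guess):

# Sketch: stream the CutSet lazily instead of materializing it in RAM.
from lhotse import load_manifest_lazy

cuts = load_manifest_lazy("data/manifests/libriheavy_cuts_small.jsonl.gz")  # assumed filename

total_hours = 0.0
for cut in cuts:                         # streamed one cut at a time
    total_hours += cut.duration / 3600   # replace with per-cut Encodec extraction
print(f"streamed {total_hours:.1f} hours of cuts without loading the manifest eagerly")

Splitting the work by subset and writing the extracted codes out incrementally also keeps the peak memory footprint bounded.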

mubtasimahasan commented 1 month ago

@JunjieLl Hi, I'm facing the same issue. Were you able to find a solution for the OOM (RAM) error? Any tips would be appreciated. Thanks!