SaltedSlark opened this issue 1 year ago
I'm guessing this is related to IPC of the data loading workers used for batch feature computation, and it could be caused by too many workers or too-large batches; but judging by the warning about max_duration, did you trim your cut set to supervisions? Can you show the output of `lhotse cut describe cuts.jsonl.gz`? I think you might be computing features for very long cuts (and you probably don't need that).
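For reference, inspecting the cut set statistics from Python looks roughly like this (the manifest path is a placeholder, not taken from this thread):

```python
from lhotse import CutSet

# Load the manifest and print summary statistics: total duration, mean/max
# cut duration, speech vs. silence ratio, supervision counts, etc.
cuts = CutSet.from_file("cuts.jsonl.gz")  # placeholder path
cuts.describe()
```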
Thanks for your reply! I changed num_workers to 0, and this happened:
/bin/bash: /home/zj/anaconda3/envs/vall-e/lib/libtinfo.so.6: no version information available (required by /bin/bash)
2023-08-30 10:26:50 (prepare.sh:59:main) Stage 1: Prepare wenetspeech manifest
2023-08-30 10:26:50 (prepare.sh:71:main) Stage 2: Tokenize/Fbank wenetspeech
2023-08-30 10:27:06,501 INFO [tokenizer.py:160] dataset_parts: ['S'] manifests {'S': {'recordings': RecordingSet(len=43664), 'supervisions': SupervisionSet(len=151600)}}
2023-08-30 10:27:06,507 INFO [tokenizer.py:167] Processing partition: S CUDA: True
Computing features in batches: 0%| | 0/43664 [00:00<?, ?it/s]/home/zj/workspace/TTS/lhotse/lhotse/dataset/sampling/simple.py:216: UserWarning: The first cut drawn in batch collection violates the max_frames, max_cuts, or max_duration constraints - we'll return it anyway. Consider increasing max_frames/max_cuts/max_duration.
warnings.warn(
Computing features in batches: 0%| | 0/43664 [00:14<?, ?it/s]
Traceback (most recent call last):
File "/home/zj/workspace/TTS/vall-e/egs/wenetspeech/bin/tokenizer.py", line 268, in <module>
main()
File "/home/zj/workspace/TTS/vall-e/egs/wenetspeech/bin/tokenizer.py", line 204, in main
cut_set = cut_set.compute_and_store_features_batch(
File "/home/zj/workspace/TTS/lhotse/lhotse/cut/set.py", line 2308, in compute_and_store_features_batch
features = extractor.extract_batch(
File "/home/zj/workspace/TTS/vall-e/valle/data/tokenizer.py", line 348, in extract_batch
encoded_frames = self.tokenizer.encode(samples.detach().to(device))
File "/home/zj/workspace/TTS/vall-e/valle/data/tokenizer.py", line 239, in encode
return self.codec.encode(wav.to(self.device))
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/model.py", line 144, in encode
encoded_frames.append(self._encode_frame(frame))
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/model.py", line 161, in _encode_frame
emb = self.encoder(x)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/modules/seanet.py", line 144, in forward
return self.model(x)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/modules/seanet.py", line 63, in forward
return self.shortcut(x) + self.block(x)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/modules/conv.py", line 204, in forward
x = pad1d(x, (padding_total, extra_padding), mode=self.pad_mode)
File "/home/zj/anaconda3/envs/vall-e/lib/python3.10/site-packages/encodec/modules/conv.py", line 92, in pad1d
padded = F.pad(x, paddings, mode, value)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 7.14 GiB (GPU 0; 23.65 GiB total capacity; 21.73 GiB already allocated; 104.06 MiB free; 21.73 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Does this mean the RecordingSet or SupervisionSet contains audio that is too long for my GPU (RTX 4090, 24 GB)? What should I do to avoid this?
Try `cuts = cuts.trim_to_supervisions()` before feature extraction, and then you can also use multiple workers again.
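Roughly, the fix looks like the sketch below. The extractor, paths, and batch settings are illustrative rather than taken from this thread (the VALL-E recipe plugs in its own EnCodec-based extractor where Fbank appears here):

```python
from lhotse import CutSet, Fbank, LilcomChunkyWriter

cuts = CutSet.from_file("cuts.jsonl.gz")  # placeholder: cut set built from the WenetSpeech manifests

# Cut the long recordings down to their supervision segments so that no
# multi-minute waveform is ever sent to the GPU in one piece.
cuts = cuts.trim_to_supervisions()

cuts = cuts.compute_and_store_features_batch(
    extractor=Fbank(),              # stand-in for the recipe's tokenizer/extractor
    storage_path="data/feats",      # placeholder output directory
    batch_duration=600.0,           # seconds of audio per batch; lower it if you still hit OOM
    num_workers=4,                  # safe to raise again once the cuts are short
    storage_type=LilcomChunkyWriter,
)
```

With short cuts, batch_duration becomes the main knob controlling GPU memory use.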
Thanks! Like this? (before and after screenshots attached)
Yeah
Thanks! I ran into another problem when trying to train my VALL-E model on the S subset. I have no idea what is wrong; looking forward to your reply, much love!
Looks like not every training example has features extracted. Make sure you passed the path to the right cut set (with features). You can also check `lhotse cut describe <path>`; it will show you some stats about the data.
Okay, here is the status of my cut_train.jsonl.gz. It looks like the number of features is much smaller than the number of cuts. Is something wrong, and why did it happen?
I combined two sets to get the cut_train set, and I found that one of them has 0 features...
Silence is over 90%??
That looks so weird... I don't know what's wrong.
Look at the jsonl file
Perhaps one of the cut sets you combined did not have features computed. Also, judging by the mean duration of 1600 s, you did not call .trim_to_supervisions() on this cut set.
Thank you so much, @pzelasko @danpovey! I'll try.
@pzelasko As for the M subset, I am sure that I called .trim_to_supervisions(), as I showed. I found that the "Supervisions available" count does not match the "Features available" count... and it seems to cause a validation error after calling validate().
The detailed description of this function mentions that keep_overlapping is what keeps the numbers matched. Result on the S subset:
You either need to use keep_overlapping=False or filter out the cuts that have overlapping speech (whichever makes sense for your use case).
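Both options are one-liners; here is a hedged sketch of each, with `cuts` standing for whatever cut set you built from the manifests:

```python
# Option 1: drop overlapping speech while cutting to supervision boundaries.
cuts_no_overlap = cuts.trim_to_supervisions(keep_overlapping=False)

# Option 2: keep overlaps during trimming, then filter out every cut that
# ended up carrying more than one supervision (i.e. overlapped speech).
cuts_filtered = cuts.trim_to_supervisions(keep_overlapping=True).filter(
    lambda cut: len(cut.supervisions) == 1
)
```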
@SaltedSlark Hi, how long did preprocessing the WenetSpeech M set take for you? It takes me 50 minutes to extract the features, but saving to wenetspeech_cuts_M.jsonl.gz has taken over 11 hours and is still not finished.
@pzelasko Is there any parallelization optimization for this function? I tried to preprocess the WenetSpeech M set last night; it spent over 11 hours in this function and still had not finished (the progress bar showed a 50-minute runtime before I hit keyboard interrupt). I have successfully preprocessed the WenetSpeech S set twice with the same num_workers, and the saving time was negligible, so I don't think this is a lock issue. Running htop, I can see that only one CPU is used for saving.
For me, it took about 80 hours to process the M subset... and I also want to know how to speed it up!
I'll try again.
I noticed that only one thread is set to save the data here. I tried to use 32 threads, but it still could not finish saving. @pzelasko
By splitting the recordings and annotations in the manifest into smaller sets, I managed to generate wenetspeech_cuts_M_{i}.jsonl.gz (i=0~9) within an hour. Since recordings and supervisions are saved sequentially, it does not take too long to match them. @SaltedSlark
Thanks! But I don't know how to separate the recordings and supervisions in the manifest; I need your help, bro.
Here is the code I used to split the manifest:
```python
from tqdm import tqdm

from lhotse.audio import RecordingSet
from lhotse.recipes.utils import read_manifests_if_cached
from lhotse.supervision import SupervisionSet

# `args` and `dataset_parts` come from the surrounding tokenizer.py script.
manifests = read_manifests_if_cached(
    dataset_parts=dataset_parts,
    output_dir=args.src_dir,
    prefix=args.prefix,
    suffix=args.suffix,
    types=["recordings", "supervisions", "cuts"],
)

if args.prefix == "wenetspeech" and ("M" in manifests.keys() or "L" in manifests.keys()):
    # Split the big subset into 10 (M) or 100 (L) smaller parts.
    separate_num = 10 if "M" in manifests.keys() else 100
    name = "M" if "M" in manifests.keys() else "L"
    origin_manifest = manifests.pop(name)
    recordings = [r for r in origin_manifest["recordings"]]
    supervisions = [s for s in origin_manifest["supervisions"]]
    start_idx = 0
    for i in tqdm(range(separate_num)):
        subset_name = name + str(i)
        end_idx = len(recordings) * (i + 1) // separate_num
        cur_recordings = recordings[start_idx:end_idx]
        # Supervisions are stored in the same order as recordings, so we can
        # consume them sequentially while the recording ids keep matching.
        cur_supervisions = []
        for r in cur_recordings:
            while supervisions and supervisions[0].recording_id == r.id:
                cur_supervisions.append(supervisions.pop(0))
        manifests[subset_name] = {
            "recordings": RecordingSet.from_recordings(cur_recordings),
            "supervisions": SupervisionSet.from_segments(cur_supervisions),
        }
        start_idx = end_idx
    assert len(supervisions) == 0
```
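As a side note, once every part has its features computed, the per-part cut sets can be merged back into a single manifest; a minimal sketch, assuming the parts were written as wenetspeech_cuts_M{i}.jsonl.gz (the exact filenames depend on how the recipe writes each part):

```python
from lhotse import CutSet, combine

# Open each per-part manifest and write them out as one combined cut set.
parts = [CutSet.from_file(f"wenetspeech_cuts_M{i}.jsonl.gz") for i in range(10)]
combine(*parts).to_file("wenetspeech_cuts_M.jsonl.gz")
```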
Some tips:
- to split the work, you can use `parts = cuts.split(num_parts)`, e.g.:

```
In [4]: cuts
Out[4]: CutSet(len=1519) [underlying data type: <class 'lhotse.lazy.LazyManifestIterator'>]

In [8]: cuts.split(2)
Out[8]: [CutSet(len=760) [underlying data type: <class 'dict'>], CutSet(len=759) [underlying data type: <class 'dict'>]]
```
- `cuts.compute_and_store_features_batch` is bottlenecked by I/O in 99% of the use cases since feature extraction is usually much quicker than dataloading. Try to set the highest possible `batch_duration` first, and then keep increasing `num_workers` until you start seeing crashes, freezes, or slowdowns.
- if you're computing features on CPUs or have multiple GPUs, it's generally a good idea to split a single large cut set into parts as was suggested earlier and run multiple scripts processing these parts in parallel (see the sketch after this list); for CPU-based computation generally prefer `compute_and_store_features` though, as it supports built-in parallelization across CPUs (unlike the batch version)
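Putting the splitting tip together with the batch extraction advice, one worker process per part might look roughly like this; the part count, paths, and extractor are placeholders, and for CPU-only extraction you would switch to `compute_and_store_features(..., num_jobs=N)` as noted above:

```python
import sys

from lhotse import CutSet, Fbank, LilcomChunkyWriter

part_idx = int(sys.argv[1])   # which part this worker handles, e.g. 0..9
num_parts = 10                # illustrative; match the number of workers you launch

cuts = CutSet.from_file("wenetspeech_cuts_M.jsonl.gz")
part = cuts.split(num_parts)[part_idx]

part_with_feats = part.compute_and_store_features_batch(
    extractor=Fbank(),                        # stand-in extractor
    storage_path=f"data/feats_M_{part_idx}",  # per-part feature storage
    batch_duration=600.0,
    num_workers=4,
    storage_type=LilcomChunkyWriter,
)
part_with_feats.to_file(f"data/wenetspeech_cuts_M_{part_idx}.jsonl.gz")
```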
So is it possible to compute the features on the fly instead of using compute_and_store_features_batch?
I didn’t get your question, please elaborate.
Sorry for my incomplete question. What I mean is: can we compute the features on the fly during training instead of storing them? In my case, I don't have a large enough GPU for the training.
Yes, you can compute the features inside the PyTorch dataset class. See OnTheFlyFeatures or K2SpeechRecognitionDataset for some examples. You can also look up k2-fsa/icefall repo for recipes that support this.
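A minimal sketch of the OnTheFlyFeatures route, assuming a generic Fbank extractor, a placeholder manifest path, and illustrative sampler settings (a VALL-E setup would substitute its own dataset class and EnCodec-based extractor):

```python
from torch.utils.data import DataLoader

from lhotse import CutSet, Fbank
from lhotse.dataset import K2SpeechRecognitionDataset, OnTheFlyFeatures, SimpleCutSampler

cuts = CutSet.from_file("cuts_train.jsonl.gz")  # placeholder: cuts WITHOUT precomputed features

# Features are computed inside the dataloader workers, batch by batch,
# so nothing needs to be extracted and written to disk ahead of time.
dataset = K2SpeechRecognitionDataset(input_strategy=OnTheFlyFeatures(Fbank()))
sampler = SimpleCutSampler(cuts, max_duration=200.0, shuffle=True)
dloader = DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=2)

for batch in dloader:
    feats = batch["inputs"]  # (batch, time, num_features) tensor computed on the fly
    break
```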
That's great. I will try to revise it. Thanks a lot.
Insufficient shm? Insufficient disk or memory? Here is my docker info: