Closed by yuto-nozaki 11 months ago
It seems at least one of your audio files is multi-channel (2 channels), which causes this issue. You may have to filter such cuts out of your cut set if you want to use that feature extractor.
@desh2608 Thank you for answering! As you said, some of our audio files have multiple channels. After filtering out those cuts, the problem was solved. Thanks.
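For anyone hitting the same error, here is a minimal sketch of the filtering step described above. It is not the code used in this thread. It assumes that a cut exposes a `channel` attribute that is an int for a single-channel cut and a list of ints for a multi-channel one; `FakeCut` and `is_single_channel` are hypothetical names introduced only so the example runs without lhotse installed.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FakeCut:
    """Hypothetical stand-in for a lhotse cut, used only to make this sketch runnable."""

    id: str
    channel: Union[int, List[int]]


def is_single_channel(cut) -> bool:
    """Return True if the cut refers to exactly one audio channel.

    Assumption: `cut.channel` is an int for single-channel cuts and a
    list of ints for multi-channel cuts.
    """
    channel = cut.channel
    if isinstance(channel, int):
        return True
    return len(channel) == 1


# Two mono cuts and one stereo cut; only the mono ones should survive.
cuts = [FakeCut("a", 0), FakeCut("b", [0, 1]), FakeCut("c", 0)]
mono_cuts = [c for c in cuts if is_single_channel(c)]
print([c.id for c in mono_cuts])  # ['a', 'c']
```

With a real cut set, the same predicate could presumably be passed to `CutSet.filter`, for example `cuts = cuts.filter(is_single_channel)`, before extracting features. Check the attribute names against the lhotse version you have installed.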
I am trying to run the following project (https://github.com/lifeiteng/vall-e/tree/main) using a dataset that we have prepared.
After modifying prepare.sh (https://github.com/lifeiteng/vall-e/blob/main/egs/libritts/prepare.sh) for our dataset and running it, the following error occurred:
Upon inspection, the error seems to occur in the function compute_and_store_features_batch, located at https://github.com/lhotse-speech/lhotse/blob/db40bc4e8595c0c3c1a418da200848e58df5b1c8/lhotse/cut/set.py#L1968. As the root cause, the dimensions of the variable 'waves' created at https://github.com/lhotse-speech/lhotse/blob/db40bc4e8595c0c3c1a418da200848e58df5b1c8/lhotse/cut/set.py#L2127-L2150 appear to be incorrect.
Specifically, when I run:
I notice that there are a few instances with a dimension of 2, as shown below:
Does anyone know the reason for this inconsistency in the dimensions?
lhotse version: v1.17