Closed SarthakYadav closed 10 months ago
Hm, yes, you need to squeeze the librispeech audio before passing it to mfsc. I am wondering whether this is a load_librispeech bug or a documentation bug. Either way I will push a fix shortly, but for now you can do the following:
from mlx.data.datasets import load_librispeech
from mlx.data.features import mfsc

dset = (
    load_librispeech()
    .squeeze("audio")  # <---------- being the only change
    .key_transform("audio", mfsc(80, 16000))
    .to_stream()
    .prefetch(16, 8)
    .batch(1)
    .prefetch(2, 1)
)
Haha, exactly what I figured out. Yeah, the built-in load_audio returns a (t, 1) array, so maybe just updating the docs would be fine. Leaving this open for you to close once you've made the changes.
Precisely, I will leave it open until I update the docs later today.
I just tried out the spectrogram feature extraction pipeline as seen here.
However, when running the following minimal example based on the provided sample code:
I get an output of shape (1, <x>, 0, 80), i.e. an empty array. I'm looking into it.
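
For anyone landing here later, the 0 in that shape is consistent with the (t, 1) audio layout mentioned in the thread. Below is a minimal, shape-only sketch in plain Python. It assumes the feature extractor frames along the last axis of its input, and it uses a hypothetical 400-sample window and 160-sample hop. None of these names or values are taken from mlx.data; they only illustrate how a trailing axis of length 1 can lead to zero frames.

```python
# Hypothetical framing parameters (25 ms window, 10 ms hop at 16 kHz).
# These are assumptions for illustration, not values read from mlx.data.
WIN_LENGTH = 400
HOP_LENGTH = 160
N_FILTERS = 80


def num_frames(n_samples, win_length=WIN_LENGTH, hop_length=HOP_LENGTH):
    """Count the full windows that fit into n_samples (no padding)."""
    if n_samples < win_length:
        return 0
    return 1 + (n_samples - win_length) // hop_length


def feature_shape(audio_shape, n_filters=N_FILTERS):
    """Output shape if framing happens along the last axis of the input."""
    *leading, n_samples = audio_shape
    return (*leading, num_frames(n_samples), n_filters)


def squeeze_shape(shape):
    """Drop all axes of length 1, like a squeeze with no axis argument."""
    return tuple(d for d in shape if d != 1)


t = 16000  # one second of audio at 16 kHz

# Unsqueezed (t, 1) audio: the last axis has length 1, so no window fits.
print(feature_shape((t, 1)))
# Squeezed (t,) audio: framing now runs over the samples.
print(feature_shape(squeeze_shape((t, 1))))
```

Under these assumptions the unsqueezed input yields a zero-length frame axis, and a leading batch dimension of 1 on top of that gives the same pattern as the reported (1, <x>, 0, 80). After squeezing, the frame axis is non-empty.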