ml-explore / mlx-data

Efficient framework-agnostic data loading
MIT License

features.audio.mfsc returns empty arrays #10

Closed by SarthakYadav 10 months ago

SarthakYadav commented 10 months ago

I just tried out the spectrogram feature extraction pipeline as seen here.

However, running the following minimal example, based on the provided sample code, produces an empty result:

from mlx.data.datasets import load_librispeech
from mlx.data.features import mfsc

dset = (
    load_librispeech()
    .key_transform("audio", mfsc(80, 16000))
    .to_stream()
    .prefetch(16, 8)
    .batch(1)
    .prefetch(2, 1)
)

batch = next(dset)
print(batch['audio'].shape)  # prints (1, <x>, 0, 80)
print(batch['audio'])        # prints []

I get a shape of (1, <x>, 0, 80) and an empty array. I'm looking into it.

angeloskath commented 10 months ago

Hm, yes, you need to squeeze the librispeech audio before passing it to mfsc. I am not sure yet whether this is a load_librispeech bug or a documentation bug. Either way, I will push a fix shortly, but for now you can do the following:

from mlx.data.datasets import load_librispeech
from mlx.data.features import mfsc

dset = (
    load_librispeech()
    .squeeze("audio")  # <-- the only change
    .key_transform("audio", mfsc(80, 16000))
    .to_stream()
    .prefetch(16, 8)
    .batch(1)
    .prefetch(2, 1)
)

SarthakYadav commented 10 months ago

Haha, exactly what I figured out. Yeah, the built-in load_audio returns a (t, 1) array, so maybe just updating the docs would be fine. Leaving this open for you to close once you've made the changes.
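
For anyone landing here later, this is a rough sketch of why a (t, 1) input could come out as (t, 0, 80). It is not mlx-data code. It assumes the feature extractor frames the last axis into windows, and the helper names plus the window and hop sizes (400 and 160 samples) are hypothetical values picked for illustration, not mfsc's actual defaults.

```python
def squeeze_shape(shape):
    """Drop every axis of length 1, as squeezing all axes would."""
    return tuple(d for d in shape if d != 1)


def framed_shape(shape, n_filters=80, win=400, hop=160):
    """Shape after splitting the LAST axis into windows of `win` samples.

    `win` and `hop` are hypothetical (25 ms and 10 ms at 16 kHz),
    chosen only to illustrate the effect.
    """
    *lead, n = shape
    # An axis shorter than one window yields zero frames.
    n_frames = 0 if n < win else 1 + (n - win) // hop
    return (*lead, n_frames, n_filters)


t = 16000  # one second of audio at 16 kHz

# Unsqueezed: the last axis has length 1, so no window fits.
print(framed_shape((t, 1)))                 # (16000, 0, 80)

# Squeezed: the time axis is last, so frames are produced.
print(framed_shape(squeeze_shape((t, 1))))  # (98, 80)
```

Under that assumption, each of the t rows is treated as a one-sample signal, which would match the (1, <x>, 0, 80) batch shape reported above.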

angeloskath commented 10 months ago

Precisely. I will leave it open until I update the docs later today.