mlcommons / croissant

Croissant is a high-level format for machine learning datasets that brings together four rich layers.
https://mlcommons.org/croissant
Apache License 2.0

Handle audio FileObjects/FileSets in Croissant #240

Open marcenacp opened 1 year ago

marcenacp commented 1 year ago

Proposal:

We propose to handle audio features using https://schema.org/AudioObject.
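For illustration, here is a minimal sketch of what a schema.org AudioObject description could look like when built from Python. The property names (`contentUrl`, `encodingFormat`, `bitrate`, `duration`) come from schema.org; the helper `audio_object` and the example URL are hypothetical, and this is not the final Croissant representation.

```python
import json


def audio_object(content_url, encoding_format, bitrate=None, duration=None):
    """Build a schema.org AudioObject description as a JSON-LD-style dict.

    Hypothetical helper for illustration only; not part of Croissant.
    """
    obj = {
        "@context": "https://schema.org",
        "@type": "AudioObject",
        "contentUrl": content_url,
        "encodingFormat": encoding_format,
    }
    # Optional properties are only emitted when they are known.
    if bitrate is not None:
        obj["bitrate"] = bitrate
    if duration is not None:
        obj["duration"] = duration  # schema.org expects an ISO 8601 duration
    return obj


# Placeholder URL, not a real dataset file.
example = audio_object(
    "https://example.org/clips/0001.flac",
    "audio/flac",
    duration="PT12S",
)
print(json.dumps(example, indent=2))
```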

Technical strategy:

This can be split into several PRs.

monke6942021 commented 1 year ago

I think we should look into adding support for https://schema.org/VideoObject and plain binary files at some point too.

fineguy commented 1 year ago

I had a look at some audio libraries; here are my thoughts. In short: I'm in favor of using librosa.

Libraries overview

Things in common:

librosa:

sounddevice:

pydub:

soundfile:

audioread:

Conclusion

It looks to me like librosa and pydub are the two most used Python libraries for audio processing. pydub was last released in 2021, while librosa has been steadily updated. Given that librosa also has better documentation, I'd recommend using it.

fineguy commented 1 year ago

I also had a look at the most popular audio datasets from Hugging Face and Papers With Code. They all use either the FLAC or WAV audio format. The only exception is Common Voice, which uses MP3.
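As a rough sketch of the per-file attributes that could be read from such files, the snippet below inspects an uncompressed PCM WAV file. It uses Python's standard-library `wave` module only to stay dependency-free (it is not a substitute for librosa, and it does not handle FLAC or MP3). The helper name `wav_attributes` and the generated test file are assumptions made for illustration.

```python
import os
import tempfile
import wave


def wav_attributes(path):
    """Read basic attributes from an uncompressed PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        sample_rate = wav.getframerate()
        frames = wav.getnframes()
    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bit_depth": sample_width * 8,
        # For PCM audio: bits per second = rate * channels * bits per sample.
        "bitrate": sample_rate * channels * sample_width * 8,
        "duration_s": frames / sample_rate,
    }


# Write one second of 16-bit mono silence at 16 kHz, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "silence.wav")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"\x00\x00" * 16000)
    attrs = wav_attributes(path)

print(attrs)
```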

monke6942021 commented 11 months ago

Hey, I noticed that in #242, one of the attributes we look at is the bitrate. What do we do if a FileSet contains multiple MP3 files with different bitrates?
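To make the question concrete, here is one possible way to summarize bitrates across several files: report a single value when every file agrees, and a range plus the distinct values otherwise. This is only a sketch of one option, not what #242 does; the function name `summarize_bitrates`, the output keys, and the sample values (in kbit/s) are all hypothetical.

```python
def summarize_bitrates(bitrates):
    """Summarize per-file bitrates for a set of audio files.

    Returns None for an empty input, a single value when all files agree,
    and a min/max range with the distinct values otherwise.
    """
    distinct = sorted(set(bitrates))
    if not distinct:
        return None
    if len(distinct) == 1:
        return {"bitrate": distinct[0]}
    return {
        "min_bitrate": distinct[0],
        "max_bitrate": distinct[-1],
        "distinct_bitrates": distinct,
    }


# Made-up values for illustration.
uniform = summarize_bitrates([128, 128, 128])
mixed = summarize_bitrates([64, 128, 64, 320])
print(uniform)
print(mixed)
```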