pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

Discern music from spoken word #2986

Open ifeatu opened 1 year ago

ifeatu commented 1 year ago

🚀 The feature

I'm wondering whether any researchers out there know of a way to scan an audio file, such as an mp3, and determine whether the track is purely spoken word or a song/music. I can think of a number of potential techniques (such as phonetic search) with varying levels of accuracy. Perhaps there are ffmpeg scripts out there that I'm not aware of.

Motivation, pitch

I am working on a project in which I generate a folder of mp3 tracks. Each track is either spoken word or music (never both), and I simply want to separate the music from the spoken word without having to listen to each track.

Alternatives

I don't believe there are any viable alternatives other than listening to each track.

Additional context

I've done a bunch of research on phonetics and phonetic search specifically. I haven't been able to find any projects that focus on this feature. One thought is to detect the presence of a specific instrument (in almost every case, a piano, drums, or both are playing).

I should specify that none of the music is under any licensing constraints, nor can it be matched to existing fingerprints for known songs.

nateanl commented 1 year ago

Hi @ifeatu, I think voice activity detection (VAD), which detects human voice in audio, is suitable for your task. If the VAD model detects speech in the audio, the track can be classified as speech; otherwise it is music.
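
As a rough illustration of that decision rule, here is a minimal sketch in plain Python. It is not torchaudio code: it assumes you already have one speech score per audio frame (between 0 and 1) from whichever VAD model you choose, and the names `speech_ratio` and `classify_track` and both threshold values are hypothetical placeholders.

```python
from typing import Sequence


def speech_ratio(frame_scores: Sequence[float], frame_threshold: float = 0.5) -> float:
    """Return the fraction of frames whose VAD score meets frame_threshold.

    frame_scores is assumed to come from an external VAD model
    (one score per frame); this function does not compute it.
    """
    if not frame_scores:
        return 0.0
    speech_frames = sum(1 for score in frame_scores if score >= frame_threshold)
    return speech_frames / len(frame_scores)


def classify_track(
    frame_scores: Sequence[float],
    frame_threshold: float = 0.5,  # assumed per-frame cutoff, tune on your data
    track_threshold: float = 0.5,  # assumed share of speech frames needed
) -> str:
    """Label a track "speech" if enough frames are flagged as speech, else "music"."""
    ratio = speech_ratio(frame_scores, frame_threshold)
    return "speech" if ratio >= track_threshold else "music"


if __name__ == "__main__":
    # Made-up scores for illustration only, not real model output.
    print(classify_track([0.9, 0.8, 0.7, 0.2]))
    print(classify_track([0.1, 0.2, 0.05, 0.6]))
```

Both thresholds would likely need tuning against a handful of tracks you have labelled by hand, since a VAD may also respond to sung vocals.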