The feature
I'm wondering if there are any researchers out there who can analyze an audio stream, such as an mp3, and determine whether the track is purely spoken word or a song/music. I can think of a number of potential techniques (such as phonetic search) with varying levels of accuracy. Perhaps there are ffmpeg scripts out there that I'm not aware of.
Motivation, pitch
I am working on a project in which I generate a folder of mp3 tracks. Each track is either spoken word or music (never both), and I simply want to separate the music from the spoken word without having to listen to each track.
Alternatives
I don't believe there are any viable alternatives other than listening to each track.
Additional context
I've done a bunch of research on phonetics and phonetic search specifically, but I haven't been able to find any projects that focus on this feature. One thought is to detect the presence of a specific instrument (in almost every case, a piano, drums, or both are playing).
I should specify that none of the music is under any licensing constraints, nor can any of it be matched to existing fingerprints for known songs.
Hi @ifeatu, I think voice activity detection (VAD), which detects human speech in audio, is well suited to your task. If a VAD model detects speech in a track, the track can be classified as spoken word; otherwise it can be classified as music.
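To make the idea concrete, here is a minimal sketch of the decision step, not a definitive implementation. It assumes the mp3 has already been decoded to 16 kHz, mono, 16-bit PCM bytes (for example with ffmpeg), and that you supply your own per-frame detector. The names `split_frames`, `speech_ratio`, `classify_track`, the `is_speech` callback, and the 0.5 threshold are all hypothetical choices for illustration; the callback could wrap whichever VAD model you pick.

```python
from typing import Callable, List

SAMPLE_RATE = 16000      # assumed: audio already resampled to 16 kHz mono
FRAME_MS = 30            # assumed frame length; many VADs accept 10/20/30 ms
BYTES_PER_SAMPLE = 2     # 16-bit PCM


def split_frames(pcm: bytes, sample_rate: int = SAMPLE_RATE,
                 frame_ms: int = FRAME_MS) -> List[bytes]:
    """Cut raw PCM into fixed-size frames, dropping a trailing partial frame."""
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]


def speech_ratio(frames: List[bytes],
                 is_speech: Callable[[bytes], bool]) -> float:
    """Fraction of frames that the supplied detector flags as speech."""
    flags = [bool(is_speech(frame)) for frame in frames]
    return sum(flags) / len(flags) if flags else 0.0


def classify_track(pcm: bytes, is_speech: Callable[[bytes], bool],
                   threshold: float = 0.5) -> str:
    """Label a track 'speech' when enough frames contain speech, else 'music'.

    The threshold is a guess and should be tuned on a few known tracks.
    """
    ratio = speech_ratio(split_frames(pcm), is_speech)
    return "speech" if ratio >= threshold else "music"
```

Using a ratio instead of "any speech frame at all" is a deliberate choice: some VAD models occasionally flag musical passages as speech, so it is worth checking the threshold against a handful of tracks you have already labeled by ear.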