open-source-ideas / ideas

💡 Looking for inspiration for your next open source project? Or perhaps you've got a brilliant idea you can't wait to share with others? Open Source Ideas is a community built specifically for this! 👋

Open Source auto-bleeper for podcasts and songs #381

Open ppeters0502 opened 3 months ago

ppeters0502 commented 3 months ago

Project description

Right now my kids are at the age where they repeat anything they hear, including swear words. Since there are already tons of services that transcribe audio into text relatively accurately, I thought it might be useful to combine speech-to-text services with popular media player APIs to automatically "bleep" out swear words in a given audio stream or file.

Relevant Technology

There are tons of different Speech to Text services that (with a subscription or sometimes per-audio-file fee) can generate a transcript of an audio file and return a JSON file or TXT file with timestamps and the transcribed words.
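As a rough sketch of what could be done with such a transcript: the snippet below assumes word-level entries shaped like `{"word", "start", "end"}` with times in seconds. That schema, the `find_bleep_intervals` helper, and the tiny `BLOCKLIST` are all made up for illustration; each real service has its own output format, so an adapter would be needed per service.

```python
import re

# Hypothetical placeholder list; a real project would load a maintained word list.
BLOCKLIST = {"darn", "heck"}


def find_bleep_intervals(words, blocklist=BLOCKLIST, pad=0.05):
    """Return merged (start, end) intervals, in seconds, covering blocklisted words.

    `words` is an iterable of dicts with "word", "start" and "end" keys
    (an assumed schema, not any particular service's format).
    `pad` widens each interval slightly to absorb timestamp jitter.
    """
    intervals = []
    for entry in words:
        # Drop punctuation and case so "Darn!" matches "darn".
        token = re.sub(r"[^\w']", "", entry["word"]).lower()
        if token in blocklist:
            intervals.append((max(0.0, entry["start"] - pad), entry["end"] + pad))

    # Merge intervals that touch or overlap so each bleep is one continuous span.
    intervals.sort()
    merged = []
    for start, end in intervals:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The padding is a guess at how imprecise word timestamps tend to be; it would need tuning per service.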

For "bleeping" the naughty words, we could take two different approaches. If the media player streaming the audio supports downloading the audio file and storing it locally, we could use an audio processing tool like FFmpeg to edit the file directly and replace the audio of the naughty words with a generated tone at the exact timestamps, effectively "bleeping" out the words.
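One possible way to do the file-based approach, as an untested sketch: build an FFmpeg filter graph that silences the original audio inside each interval, generates a sine tone that is silenced everywhere else, and mixes the two. It relies on FFmpeg's `volume`, `sine` and `amix` filters and the `enable='between(t,a,b)'` timeline option; the `build_bleep_command` helper and the 1000 Hz default are my own choices, and the exact graph may need adjusting for a given FFmpeg build.

```python
def build_bleep_command(src, dst, intervals, tone_hz=1000):
    """Build an ffmpeg argument list that replaces each interval with a tone.

    `intervals` is a list of (start, end) pairs in seconds.
    The command is only constructed here, not executed.
    """
    if not intervals:
        raise ValueError("no intervals to bleep")

    # between(t,a,b) is 1 while a <= t <= b; summing the terms acts as a logical OR.
    windows = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in intervals)

    graph = (
        # Original audio, silenced inside the bleep windows.
        f"[0:a]volume=volume=0:enable='{windows}'[muted];"
        # Endless sine tone, silenced outside the bleep windows.
        f"sine=frequency={tone_hz},volume=volume=0:enable='not({windows})'[tone];"
        # Mix both; stop when the original audio ends.
        "[muted][tone]amix=inputs=2:duration=first[out]"
    )
    return ["ffmpeg", "-i", src, "-filter_complex", graph, "-map", "[out]", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Note that `amix` scales its inputs by default, so the output may come out quieter than the source and need a gain adjustment.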

Piping audio from a streaming service like Spotify into a speech-to-text service would probably take more work. Still, for media players that don't support downloading the file locally (or that encrypt local downloads, as Spotify does), we can take a different approach. If the player has an accessible API for controlling playback volume (Spotify does), you could programmatically get the current playback volume, set it to zero for the duration of the swear word, and then restore the original value once the word has passed.
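The get, mute, restore loop could look something like the sketch below. It is deliberately player-agnostic: `get_volume` and `set_volume` are hypothetical callables that would wrap whatever the player offers (for Spotify, I believe that would be the Web API's player state and set-volume endpoints, but check the current docs). It also assumes playback starts at position zero at the moment the function is called, which a real version would replace with the player's reported position.

```python
import time


def run_mute_schedule(intervals, get_volume, set_volume,
                      now=time.monotonic, sleep=time.sleep):
    """Mute playback during each (start, end) interval, then restore the volume.

    Times are seconds from the moment this function is called.
    `get_volume` and `set_volume` are caller-supplied wrappers around the
    player's API; `now` and `sleep` are injectable so the logic can be tested.
    """
    t0 = now()
    for start, end in sorted(intervals):
        # Wait until the swear word is about to start.
        delay = start - (now() - t0)
        if delay > 0:
            sleep(delay)

        original = get_volume()
        set_volume(0)
        try:
            # Stay muted until the word has passed.
            remaining = end - (now() - t0)
            if remaining > 0:
                sleep(remaining)
        finally:
            # Always restore, even if the wait is interrupted.
            set_volume(original)
```

Network latency on each API call would eat into short intervals, so in practice the mute would probably have to be triggered a little early.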

Complexity and required time

Complexity

Required time (ETA)

I think the initial idea with one media player and one speech to text service wouldn't take a large amount of time. Ideally I think this project would work best if the website/extension/web app supported multiple types of media players and multiple speech to text services, which would take a considerable amount of time.

Categories

KaKi87 commented 3 months ago

Your XY problem is so similar to the previous OP's that I'm wondering if you're one and the same 😅

Say you install an app on your kids' devices that bleeps everything everywhere.

What's gonna happen when they listen to music with friends, on their friends' devices? Nothing.

This is an education problem, and it shouldn't be solved by technical means.

ppeters0502 commented 3 months ago

While my use case focused on my children (mostly listening to Spotify and podcasts in the car), I was thinking of this more as a niche (but still relevant) issue that doesn't seem to have an open-source option. I could potentially see use cases for listening with kids, for listening to music in a corporate setting, or just for people who don't like swear words (haha, like my mother-in-law!). Especially since speech-to-text and AI-adjacent projects are picking up steam, I thought this sort of project could be a good starting point, and (depending on interest and development) could grow into several different forms, like a browser extension, web app, or mobile app.

If there's no interest in this project I can close the issue, I just felt like your response on kids' devices (which I totally understand by the way!) is only focusing on one specific use case, when I could see this project possibly fitting multiple scenarios.

a4v2d4 commented 3 weeks ago

@ppeters0502 I'm interested, I think your use case is actually pretty common.

For speech-to-text, whisper.cpp (https://github.com/ggerganov/whisper.cpp) could be used for free.

Not sure how we could intercept the audio stream from something like Spotify, even if we can mute the volume. I guess we could find the podcast from a separate source, and use the timestamps detected there for muting?