Open Source auto-bleeper for podcasts and songs

ppeters0502 commented 3 months ago

Project description

Right now my kids are at the age where they start repeating anything they hear, including swear words. Since there are already tons of services for transcribing (relatively accurately) audio into text, I thought it might be useful to use various speech to text services and popular media player APIs to automatically "bleep" out swear words for a given audio stream or file.

Relevant Technology

There are tons of different Speech to Text services that (with a subscription or sometimes per-audio-file fee) can generate a transcript of an audio file and return a JSON file or TXT file with timestamps and the transcribed words.
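As a rough sketch of what we would do with such a transcript: assuming the service returns JSON shaped like `{"words": [{"word": ..., "start": ..., "end": ...}]}` with times in seconds (this shape is an assumption, since every service formats its output differently), the swear words can be turned into a list of time ranges to bleep. The `BLOCKLIST` contents, the function name, and the `padding` parameter are all placeholders made up for illustration.

```python
import json
import string

# Hypothetical word list; a real one would be longer and user-configurable.
BLOCKLIST = {"darn", "heck"}


def find_bleep_intervals(transcript_json, blocklist=BLOCKLIST, padding=0.05):
    """Return merged (start, end) ranges, in seconds, covering blocked words.

    Assumes a transcript of the form
    {"words": [{"word": str, "start": float, "end": float}, ...]}
    sorted by start time.
    """
    words = json.loads(transcript_json)["words"]
    intervals = []
    for w in words:
        # Normalize "Darn!" -> "darn" before checking the list.
        token = w["word"].strip().strip(string.punctuation).lower()
        if token not in blocklist:
            continue
        start = max(0.0, w["start"] - padding)
        end = w["end"] + padding
        if intervals and start <= intervals[-1][1]:
            # Touching or overlapping the previous range: extend it.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((start, end))
    return intervals
```

The small padding is there because word timestamps from speech to text tend to be slightly off, so bleeping a little early and late seems safer than clipping exactly at the reported boundaries.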

For "bleeping" the naughty words, we could take two different approaches. If the media player streaming the audio supports downloading the audio file and storing it locally, we could use an audio processing tool like FFmpeg to edit the file directly and replace the audio of the naughty words with a generated tone at the exact timestamps, effectively "bleeping" out the words.
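A minimal sketch of the local-file approach, assuming FFmpeg is installed and on the PATH. To keep it short this version only silences the ranges, using FFmpeg's `volume` filter with a timeline `enable` expression; overlaying an actual bleep tone would need an extra generated audio source mixed in, which is not shown here. The function name and file names are placeholders.

```python
import subprocess


def build_mute_command(src, dst, intervals):
    """Build an ffmpeg argument list that silences each (start, end) range.

    Intervals are in seconds. The filter sets the volume to 0 only while
    the timeline expression is non-zero, i.e. inside one of the ranges.
    """
    if not intervals:
        raise ValueError("no intervals to mute")
    expr = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in intervals)
    # The single quotes protect the commas inside between() from being
    # read as filter separators by ffmpeg's filtergraph parser.
    audio_filter = f"volume=enable='{expr}':volume=0"
    return ["ffmpeg", "-i", src, "-af", audio_filter, dst]


def mute_file(src, dst, intervals):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_mute_command(src, dst, intervals), check=True)
```

For example, `mute_file("episode.mp3", "episode_clean.mp3", [(12.4, 12.9)])` would write a copy with that half second silenced, leaving the original file untouched.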

For media players that don't support downloading the file locally (or, like Spotify, encrypt local downloads), we can take a different approach, though I think piping the audio into a Speech to Text service would take more work for streamers like Spotify. If the player has an accessible API for controlling playback volume (Spotify does), we could programmatically call that API to get the current playback volume, set the volume to zero for the duration of the swear word, and then reset it to the original value once the word has passed.
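Here is a rough sketch of that volume-ducking loop. The `player` object is hypothetical: it stands in for a thin wrapper around whatever API the streaming service exposes, with `get_position()` returning the playback position in seconds (or `None` once playback stops), plus `get_volume()` and `set_volume()`. None of these names come from a real library.

```python
import time


def target_volume(position_s, intervals, normal_volume):
    """Return 0 while the position is inside a bleep range, else the normal volume."""
    for start, end in intervals:
        if start <= position_s < end:
            return 0
    return normal_volume


def run_muter(player, intervals, poll_s=0.05, sleep=time.sleep):
    """Poll the player and mute it during each (start, end) range.

    `player` is a hypothetical wrapper, see the note above.
    """
    normal = player.get_volume()  # remember the listener's own volume
    current = normal
    while (pos := player.get_position()) is not None:
        wanted = target_volume(pos, intervals, normal)
        if wanted != current:
            # Only call the API when the volume actually has to change.
            player.set_volume(wanted)
            current = wanted
        sleep(poll_s)
    if current != normal:
        # Playback ended mid-bleep: restore the original volume.
        player.set_volume(normal)
```

One caveat with this design is latency: each volume change is a network round trip, so in practice the ranges would probably need generous padding, and polling a remote API this often may run into rate limits.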

Complexity and required time

Complexity

[ ] Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
[x] Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
[ ] Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time (ETA)

[ ] Little work - A couple of days
[ ] Medium work - A week or two
[x] Much work - The project will take more than a couple of weeks and serious planning is required

I think the initial idea with one media player and one speech to text service wouldn't take a large amount of time. Ideally I think this project would work best if the website/extension/web app supported multiple types of media players and multiple speech to text services, which would take a considerable amount of time.

KaKi87 commented 3 months ago

Your XY problem is so similar to the previous OP's that I'm wondering if you're one and the same 😅

Say you install an app on your kids' devices that bleeps everything everywhere.

What's gonna happen when they listen to music on their friends' devices? Nothing.

This is an education problem that shouldn't be solved by technical means.

ppeters0502 commented 3 months ago

While my use case focused on my children (mostly listening to Spotify and podcasts in the car), I was thinking of this as a niche (but still relevant) problem that doesn't seem to have an open-source solution. I could see use cases for listening with kids, for listening to music in a corporate setting, or just for people who don't like swear words (haha, like my mother-in-law!). Especially since Speech to Text and AI-adjacent projects are picking up steam, I thought this sort of project could be a good starting point, and (depending on interest and development) it could grow into several different forms, like a browser extension, web app, or mobile app.

If there's no interest in this project I can close the issue. I just felt like your response about kids' devices (which I totally understand, by the way!) focuses on only one specific use case, when I could see this project fitting multiple scenarios.

a4v2d4 commented 3 weeks ago

@ppeters0502 I'm interested; I think your use case is actually pretty common.

For speech-to-text, whisper.cpp (https://github.com/ggerganov/whisper.cpp) could be used for free.

Not sure how we could intercept the audio stream from something like Spotify, even if we can mute the volume. I guess we could find the podcast from a separate source, transcribe that, and use the detected timestamps for muting?
