shabados / presenter

Desktop app for presenting the Shabad OS Database on projectors, TVs, and live streams
https://shabados.com
MIT License

Shazam-like Voice Detection AI for line being spoken/sung #391

Open bhajneet opened 4 years ago

bhajneet commented 4 years ago

I don't think this is easily possible, but some users have asked us to use the mics in phones/laptops/tablets to "hear" what is being sung and automatically suggest the line to the user or, if detection is accurate enough, present it automatically.

Any thoughts / ideas on how to achieve this? I have marked this as low (would be nice to have) and on hold (not planning to work on this) unless we have some better ideas of how it can be done and how long it would potentially take. Currently there is no roadmap in my head for achieving this.

Harjot1Singh commented 4 years ago

It might be possible for hukamname, since the clarity of words is much higher, but you would still need speech-to-text for Punjabi for it to function. I will say though, if we can get good clarity on some syllables in each word, that could actually be effective.

Requirements:

1) Some way of searching for a mixture of letters within each word. If the line is har har har gun gaavao, the speech-to-text might hear something like hr h r gn gaao, so we need to be able to feed that in and get a match (basically, a search that is "any letters of each word", a bit like first word).

2) Some sort of STT (speech-to-text) for Gurmukhi.
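The "any letters of each word" search in requirement 1 could be sketched roughly as a per-word subsequence match. This is only an illustration, not how Shabad OS search works: the function names and the tiny `LINES` list are hypothetical, it uses plain transliterated text rather than Gurmukhi, and it assumes the heard words line up with the start of the line.

```python
def is_subsequence(heard: str, word: str) -> bool:
    """True if the letters of `heard` appear in `word` in the same order."""
    letters = iter(word)
    # `ch in letters` advances the iterator, so order is enforced.
    return all(ch in letters for ch in heard)


def line_matches(heard: str, line: str) -> bool:
    """Compare heard fragments with the line word by word, from the first word."""
    heard_words = heard.split()
    line_words = line.split()
    if len(heard_words) > len(line_words):
        return False
    return all(is_subsequence(h, w) for h, w in zip(heard_words, line_words))


def search(heard: str, lines: list[str]) -> list[str]:
    """Return every candidate line that the heard fragments could belong to."""
    return [line for line in lines if line_matches(heard, line)]


# Hypothetical stand-in for the database: two transliterated lines.
LINES = [
    "har har har gun gaavao",
    "har raam naam jap laha",
]

print(search("hr h r gn gaao", LINES))
```

With these two lines, only the first matches: `h` is not found in `raam`, so the second line is rejected at its second word. A real version would also need to tolerate wrongly heard letters and dropped or extra words, which a strict subsequence test does not.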

But if you'd like to even try making this work for kirtan, we could potentially run some ML models on existing kirtan (so you can map many ways of singing a shabad to the shabad itself), and then classify incoming sound and see what it matches. The upside is that this could be effective for singing; the downside is that shabads that haven't been sung in our training dataset (I imagine there will be many) will not be classifiable/detectable without further training.
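To make that downside concrete, here is a toy nearest-centroid sketch. It assumes some hypothetical feature extractor has already turned each audio clip into a fixed-length vector; the labels, vectors, and distance threshold are all made up for illustration. The point is only that a classifier like this can answer with a shabad it was trained on, or with nothing.

```python
import math

# Hypothetical training data: shabad label -> feature vectors,
# one per rendition of that shabad (values are invented).
TRAINING = {
    "shabad_a": [[0.9, 0.1], [1.0, 0.2]],
    "shabad_b": [[0.1, 0.9], [0.2, 1.0]],
}


def centroid(vectors):
    """Average the vectors of one shabad into a single reference point."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


CENTROIDS = {label: centroid(vs) for label, vs in TRAINING.items()}


def classify(features, max_distance=0.5):
    """Return the closest trained label, or None if nothing is close enough."""
    label, dist = min(
        ((lbl, math.dist(features, c)) for lbl, c in CENTROIDS.items()),
        key=lambda pair: pair[1],
    )
    return label if dist <= max_distance else None


print(classify([0.95, 0.2]))  # close to the shabad_a renditions
print(classify([3.0, 3.0]))   # a shabad the model has never heard
```

The second call returns `None`: a clip of an untrained shabad either gets rejected or, without the threshold, gets mislabeled as whichever trained shabad happens to be nearest.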

BUT... here's an idea. SOS opt-in to mic. That way, we could attempt to use Shabad OS input (of sound + whatever is being shown on the projector) to automatically train/improve our dataset. This sounds like something that could be very interesting, but potentially out of mainstream scope for now.

preetcharan commented 4 years ago

@Harjot1Singh @bhajneet wow, just stumbled across this conversation while marking my Slack messages as read. My uncle is a very wealthy businessman and wanted to give me money to get this feature done. I said I work with you guys and I don't think it's that practical: when the tune of har raam naam jap laha starts on the vaja, I already have the shabad up, and there are so many similar shabads where only the second half differs. Technically I have no idea how to achieve it, as you know I'm not in coding, AI, or speech recognition, BUT from an audio perspective, if you could make Shabad OS listen to a USB interface taken in for the broadcast, you should have a clear enough sound there to decipher.

This is a very interesting project, and if you know of someone we could pay to get this done, I could potentially get funds for it.

bhajneet commented 4 years ago

Depends on https://github.com/ShabadOS/gurmukhi-utils/issues/22

bhajneet commented 4 years ago

> BUT... here's an idea. SOS opt-in to mic. That way, we could attempt to use Shabad OS input (of sound + whatever is being shown on the projector) to automatically train/improve our dataset. This sounds like something that could be very interesting, but potentially out of mainstream scope for now.

Though interesting, it seems a bit too much like snooping/breach of privacy. It's also unnecessary given the vast amount of keertan/kirtan audio files/videos readily available.