hibobmaster / matrix-stt-bot

A simple Matrix bot that transcribes your voice messages to text
https://matrix.to/#/#public:matrix.qqs.tw
MIT License

Try and decrypt any audio media_type instead of just ogg #3

Closed: remoremorali closed this 10 months ago

remoremorali commented 10 months ago

m4a is showing up in encrypted rooms, so I thought I'd try to decode anything whose media type starts with audio/ for future compatibility.
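For illustration, here is a minimal sketch of the difference between the current exact match on audio/ogg and the proposed audio/ prefix check. The function names are hypothetical and this is not code from the repository; the normalization (dropping parameters such as `; codecs=opus` and lowercasing, since MIME type names are case-insensitive) is an added assumption.

```python
def is_audio_exact(mimetype: str) -> bool:
    """Hypothetical helper: the current behaviour, accepting only audio/ogg."""
    return mimetype == "audio/ogg"


def is_audio_prefix(mimetype: str) -> bool:
    """Hypothetical helper: the proposed behaviour, accepting any audio/* type."""
    # Strip parameters after ";" and lowercase before comparing (assumption).
    base = mimetype.split(";", 1)[0].strip().lower()
    return base.startswith("audio/")


if __name__ == "__main__":
    for mt in ("audio/ogg", "audio/mp4", "audio/ogg; codecs=opus", "application/octet-stream"):
        print(f"{mt!r}: exact={is_audio_exact(mt)} prefix={is_audio_prefix(mt)}")
```

The prefix check picks up m4a (audio/mp4) uploads, but it also accepts any other audio file, such as music.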

hibobmaster commented 10 months ago

The reason I hardcode audio/ogg is that messages recorded by the Element client are audio/ogg. Besides, audio/ogg is rarely used in other scenarios. If we add other formats (m4a, wav, etc.) to the list, the bot will transcribe those as well, which we may not want when the file is music or something else.

What do you think?

remoremorali commented 10 months ago

I did some more tests, and the difference I see seems to depend on the source of the audio message: WhatsApp sends ogg, while FluffyChat for Matrix sends mp4. Also, when sending encrypted messages, FluffyChat sends application/octet-stream, while event.mimetype contains the correct audio/mp4 value. And in the case of encrypted messages sent from Matrix, the file name starts with "recording". I'll update my patch accordingly.
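One possible filter combining these observations is sketched below. It is not the actual patch: the function name, its parameters, and the rule that non-ogg audio is transcribed only when the file name starts with "recording" (to avoid transcribing music) are assumptions for illustration. It takes plain strings rather than matrix-nio event objects, so `event_mimetype` simply stands for the value the thread calls event.mimetype.

```python
from typing import Optional


def _base_type(mimetype: Optional[str]) -> str:
    """Normalize a MIME type: drop parameters and lowercase; '' if missing."""
    if not mimetype:
        return ""
    return mimetype.split(";", 1)[0].strip().lower()


def should_transcribe(
    file_mimetype: Optional[str],
    event_mimetype: Optional[str],
    filename: str,
) -> bool:
    """Hypothetical heuristic for deciding whether an audio message is a voice recording."""
    file_type = _base_type(file_mimetype)
    # Voice messages recorded by Element arrive as audio/ogg.
    if file_type == "audio/ogg":
        return True
    # Encrypted FluffyChat uploads show up as application/octet-stream,
    # so fall back to the declared event mimetype (e.g. audio/mp4).
    if file_type == "application/octet-stream":
        effective = _base_type(event_mimetype)
    else:
        effective = file_type
    # Assumption: only accept other audio types when the name looks like a recording.
    return effective.startswith("audio/") and filename.startswith("recording")
```

Requiring the "recording" file name prefix is what keeps arbitrary music files out, which addresses the concern raised about transcribing non-voice audio.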

hibobmaster commented 10 months ago

Thank you for your contribution.