syt0r / Kanji-Dojo

A multi-platform application for memorizing Japanese language
GNU General Public License v3.0
237 stars, 7 forks

Add Voice Prompts (for Kana, at least) #27

Closed OkyDooky closed 8 months ago

OkyDooky commented 1 year ago

When I first started learning kana, I used an app that had voice recordings to go with the characters and it helped immensely with learning and memorizing. There is another app on F-Droid that also does this: Fun With Kanji. Perhaps they might be willing to point you to where they got their samples from. I can also spend some time searching for free-to-use (and redistribute) sample packs. At worst, I could maybe open something on CastingCall.Club or elsewhere. Lol

Regarding the inevitable concern over file size, voice packs could be optional and downloaded through the app from GitHub or elsewhere. They could even be separated into different packages for different levels (e.g., one for kana, one for N5, etc.). For the files themselves, Ogg-Opus is a God-level codec and these samples could be reduced to low double-digit kilobytes each without losing much (if any) nuance in pronunciation (that first app I used had some samples that were hard to tell apart).

syt0r commented 1 year ago

Hey, thanks for the suggestion. There is already another issue regarding sound (#11), so I'm closing this one.

I had a look at Fun With Kanji's repository and tried the application. They are using text-to-speech and I don't really like how it sounds. As an idea, there might be free AI-based voice generators that can create high-quality voice lines; I need to investigate.

In any case, this feature sounds a bit complicated and only covers kana, which is a small part compared to the huge number of kanji available in the app, so I'm not very inclined to invest time in it for now.

OkyDooky commented 1 year ago

Thanks for thanking me. Haha. I'll head over to that issue then (I didn't notice it before opening mine).

Ah. Maybe that's why it's not opening on my main device (no default TTS engine, let alone Japanese). On my other device, it was using Google's Speech Services and sounded fine for the first couple entries I tried. But, I understand your concern. Plus, like with my main device (which uses LineageOS), it doesn't have that engine, so you wouldn't be able to control for quality or service availability by using TTS.

Sourcing by using a professional TTS service could be an option. I wonder if NHK's stuff has a permissive license or not. Hmm... Regardless, I think that there are easy ways to get voices for the kana, at least, but they just need to be found. Since you have enough to focus on, maybe you could redirect any other users asking for this to the other issue, where I'll suggest that users could put in that effort to find or provide audio sample packs. (e.g. "I'd be willing to add them, if you can link to a good source or provide sufficiently high quality samples yourself")

I think it would be worth adding, even for just kana, since learners, like myself, absolutely need that to get their feet underneath them for learning the rest of the Japanese language. If you can read kana in "Japanese," instead of just translating it to romaji in your head, then you can more immersively/natively learn kanji, I think. But, I'm not sure how to plan for adding sound packs to kanji... Like, would you need to do them for all of the ones you provide at once? Or would it be okay to only offer chunks at a time and do it incrementally? Yeah, I can see why that would not be an inviting challenge for the average person... kind of like kanji. Haha.

Oh, and an update to my claim about it not significantly affecting file size for the app... I lied (sort of). At 32 kbps (which is very much in the acceptable quality range for speech in Ogg/Opus), individual sounds and words don't even reach double-digit kilobytes... they average around 5-9 kB, according to some personal recording tests I did with Simple Voice Recorder. Lol. So, adding all kana sounds should, in theory, do next to nothing to your app size, if you did add them. I could also record them myself, but I'm not native, so... yeh.
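As a rough sanity check on those numbers, here is a minimal sketch of the bitrate arithmetic. The clip length (about 1.5 seconds) and the count of 46 basic kana are assumptions for illustration, not measurements from the recording tests, and Ogg container overhead is ignored.

```python
def clip_size_kb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate audio payload in kilobytes (1 kB = 1000 bytes).

    Ignores Ogg container overhead, so real files are slightly larger.
    """
    return bitrate_kbps * seconds / 8  # kilobits -> kilobytes


# Assumed values: ~1.5 s per clip, 46 basic kana.
per_clip = clip_size_kb(32, 1.5)
total = per_clip * 46
print(f"{per_clip:.1f} kB per clip, about {total:.0f} kB for 46 clips")
```

At 32 kbps each second of audio costs 4 kB, so clips of one to two seconds fall in the 5-9 kB range reported above.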

Alright, I'll post some of this in the other thread and we'll see what happens, down the road.

syt0r commented 1 year ago

@OkyDooky Hm, I'm using Google Cloud to host the analytics service, and it seems they offer a text-to-speech service with a free tier. I also still have $300 in trial funds for one month. The Neural2 samples on their website sound very nice: https://cloud.google.com/text-to-speech/docs/voices

It's not clear how it will voice single characters, and I couldn't find any terms of use for generated sounds, but it 'sounds' 😅 very promising
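For reference, a request to the Cloud Text-to-Speech REST API might look roughly like the sketch below. It only builds the JSON body and sends nothing. The voice name `ja-JP-Neural2-B` and the `OGG_OPUS` output encoding are assumptions picked for illustration, and the thread does not say which settings were actually used.

```python
import json

# Public REST endpoint of Cloud Text-to-Speech (v1); calling it requires credentials.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text: str, voice_name: str = "ja-JP-Neural2-B") -> dict:
    """Build the JSON body for one synthesis call.

    The default voice name is an assumption, not taken from the thread.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": "ja-JP", "name": voice_name},
        "audioConfig": {"audioEncoding": "OGG_OPUS"},
    }


body = build_request("あ")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The response to such a request carries the audio as a base64 string in its `audioContent` field, which would then need to be decoded and written to a file.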

syt0r commented 1 year ago

I did a test. Very impressive, though near the end the AI becomes French or something))

synthesis (1).zip

OkyDooky commented 1 year ago

Lol, you're right. Very impressive (sounded like there was some decent amount of hiss, though), until the... I don't know what happened at the end there. That's a really good find! I'm assuming it provides an option for female voices, as well?

You mentioned a $300 trial fund. So, that gets you one month? Are there any restrictions on, like, the number of generated samples or anything?

syt0r commented 1 year ago

New Google Cloud customers get free funds for several months. They are not very clear about limitations on the usage of generated content, but I saw in a discussion somewhere on the Internet that it's probably OK to use it unless the generated audio is used to replicate Google's TTS. I've generated a new recording by attaching "ー" to all characters; it fixed the issue with the last several characters with dakuten, so it's good to go. I have 3 audio files with different voices. I'm going to use only 1 for now and am currently splitting it into multiple tracks with Audacity. The archived kana should take approximately 155 KB using the worst-quality Ogg Vorbis setting, which sounds good enough to me, and since it's not too big, it's probably OK to ship the sounds within the APK.
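The "ー" workaround can be sketched as below. The kana list is a small stand-in, and the "。" separator between entries is a hypothetical choice, since the thread does not say how the characters were delimited in the text sent to the TTS service.

```python
# A few dakuten kana as a stand-in; the full character list is not given in the thread.
KANA = ["が", "ぎ", "ぐ", "げ", "ご"]
LONG_VOWEL_MARK = "ー"


def build_prompt(kana: list[str], separator: str = "。") -> str:
    """Append the long vowel mark to each kana and join the results.

    The separator is an assumption, intended to produce a pause between sounds.
    """
    return separator.join(k + LONG_VOWEL_MARK for k in kana)


print(build_prompt(KANA))
```

A clear pause between entries would also make the later step of splitting the single recording into per-character tracks easier.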

OkyDooky commented 1 year ago

I would recommend looking into compressing with Ogg-Opus, instead of Vorbis, since Opus was originally designed for speech and then merged with a generic audio encoder (the format, apparently, switches between the two dynamically in the same file to get the best optimization). Plus, you have more exact control over the bitrate. However, I looked into exporting to Opus with Audacity and it requires installing FFmpeg libraries and then linking them in Audacity. So, whatever you feel like doing.
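As an alternative to linking FFmpeg into Audacity, the clips could be exported as WAV and encoded with the `ffmpeg` command line directly. The sketch below only builds the command and is not something either of us has run in this thread. It assumes an `ffmpeg` build with `libopus` enabled, and the file names are hypothetical.

```python
import shlex


def opus_command(src: str, dst: str, bitrate: str = "32k") -> list[str]:
    """Build an ffmpeg command that encodes src to Opus in an Ogg container."""
    return [
        "ffmpeg", "-i", src,
        "-c:a", "libopus",       # Opus encoder (needs ffmpeg built with libopus)
        "-b:a", bitrate,         # target audio bitrate
        "-application", "voip",  # libopus tuning aimed at speech
        dst,
    ]


cmd = opus_command("a.wav", "a.ogg")  # hypothetical file names
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would perform the actual encode, one call per clip.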

That's good to hear about the licensing and about how you fixed the wonkiness. It's a shame that Google's TTS is so restrictive, since it's quite convenient to record its output and use it for stuff.

Thank you for looking into this. Having this feature will really put it in a league above many other free and freemium apps out there (especially FOSS ones). It also makes me happy. 😄

I'll definitely be recommending this one to anybody I know who's interested in learning Japanese.

syt0r commented 9 months ago

Good news, I finally started to work on this feature 😅

And it seems Google has added even better voices since I last checked. I have already labeled the kana sounds; here are the result files

OkyDooky commented 8 months ago

syt0r: good-news-everyone

I took a listen and they sound great! Thanks for going ahead with this. I was even thinking of checking in about it, recently. So, the timing is great. 😄