k2-fsa / sherpa-onnx

Speech-to-text, text-to-speech, speaker recognition, and VAD using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, RISC-V, x86_64 servers, websocket server/client, C/C++, Python, Kotlin, C#, Go, NodeJS, Java, Swift, Dart, JavaScript, Flutter, Object Pascal, Lazarus, Rust
https://k2-fsa.github.io/sherpa/onnx/index.html
Apache License 2.0

https://paste.crdroid.net/F1kHLh #539

Open mablue opened 8 months ago

mablue commented 8 months ago

https://paste.crdroid.net/F1kHLh

csukuangfj commented 8 months ago

Is there more context? Could you describe how you got the above logs?

mablue commented 8 months ago

Yes. I installed the 'Next-gen Kaldi' Farsi (Haniyeh) APK as the TTS engine on my Android phone, and I get these crashes. It is not usable like a normal TTS engine: it runs the first time, but if I clear recent apps it stops working until I open the Next-gen Kaldi app again. It is very buggy, and the TTS engine settings screen always crashes on every phone I tested; many users have similar problems. It does not speak keyboard clicks when I use a screen reader with Gboard, and it cannot speak English words inside Persian text. On many devices Haniyeh also takes more than 20 seconds to start speaking. There is more than one problem, but the biggest are these crashes and the silence when pressing keyboard keys, which blind users depend on.

csukuangfj commented 8 months ago

if I clear recent apps it will not work I will open again the next gen to work its very buggy

@mablue It should be fixed by #553

Please re-download the v1.9.9 APK once GitHub Actions finishes building it.

mablue commented 8 months ago

There are still some errors, even while the app is actually open in recents: https://paste.crdroid.net/85dEOC

Long texts are also the biggest problem.

Try reading these examples in Telegram with a screen reader:

منی وجود نداره ما همه یه موجودیم که دست و پاهای همیم. یاد شدن از من مثل یاد کردن از سنگیه غلطید وسط جاده و باعث نشد ماشینای بیشتری برن روی پلی که از شدت زمین لرزه شکسته بود...
وقتی باور کنی که این تو فقط به خاطر شناسنامه و خطاب دیگران و توهمات خودته که تویی چیزی از تو باقی نمیمونه جز اتمهای بر باد غلطانی که عضو یه ساختار بسیار عظیم هستن به نام هستی...

کوهی نباش که با صد کیلو دینامیت از جاش تکون نمیخوره...کاهی باش که با برخورد خودش به کوهی تلنگر ریز ولی نافذی میزنه.

بلاخره اوضامون هرچی باشه مطمینا بهتر از هلن کلره...به شرطی که یه لحظه مشکلات خودمونو فراموش کنیم و به بقیه فکر کنیم.

Edited:

I think the problem comes from the Commentary Lite screen reader. With TalkBack I think it is fixed. https://github.com/nirenr/jieshuo

csukuangfj commented 8 months ago

I think you should put punctuation symbols inside your text, though I am not sure whether there are punctuation symbols in Persian at all.

mablue commented 8 months ago

Some people on Telegram don't bother with punctuation in their messages, and we rely entirely on a screen reader to read those texts. This was just an example; many other messages that contain emojis are also problematic. But this is not a problem with this TTS engine, it is all about the Commentary screen reader: the free version does not work well with this TTS engine, the cracked version has viruses and trojans, and the paid version costs $32, which is a lot of money in Iran 🫃🏼. We should learn TalkBack, because I tested this engine with Google's TalkBack screen reader and it works very well. It's not an engine problem...

modded(with some trojans ☠️) https://blindhelp.net/software/csr

free version(problematic) https://github.com/nirenr/jieshuo

talkback https://play.google.com/store/apps/details?id=com.google.android.marvin.talkback

csukuangfj commented 8 months ago

The current sherpa-onnx tts engine is using a non-streaming vits model.

Non-streaming means you have to input all of the text for synthesis.

If your text contains punctuation symbols, then it is split into sentences, and each sentence is synthesized independently.

If there is no punctuation, then all of the text is treated as a single sentence, which means it will take much longer to synthesize.

The shorter the sentence, the faster it is.
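The splitting behaviour described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual sherpa-onnx implementation: the function name `split_sentences` and the set of sentence-ending marks are assumptions made for this example, and the real engine may use a different list of marks.

```python
import re
from typing import List

# Assumed set of sentence-ending marks (ASCII plus a few Persian/Arabic ones).
# The list used by the real TTS engine may differ.
_END = re.escape(".!?؟؛…")

# One sentence = a run of non-terminator characters followed by any terminators.
_SENTENCE = re.compile(f"[^{_END}]+[{_END}]*")


def split_sentences(text: str) -> List[str]:
    """Split text into sentences after each run of sentence-ending marks."""
    parts = (p.strip() for p in _SENTENCE.findall(text))
    return [p for p in parts if p]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine"):
        # A non-streaming engine would synthesize each piece independently.
        print(sentence)
```

With no sentence-ending marks in the input, the function returns the whole text as a single element, which matches the slow case described above.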

mablue commented 8 months ago

Oh, thanks. Now I understand. We do have these punctuation marks in Persian; maybe they have not been added to the app: ؟!؛:«»﷼٫٪. ، …

About the exclamation mark: English and Persian use the same character, ! (ASCII code 33, U+0021), so there is no difference in appearance. The question mark is different: Persian uses ؟ (U+061F) instead of the English ? (U+003F). 🫃🏼 I think GitHub can't render it properly, so it may look different in size from the English one! See this mark: (؟!)

We also see latency of 5 to 15 seconds when reading contact names in WhatsApp, and I think those are very short strings to process! 🫃🏿

Because we only read the screen with screen readers, we can't add punctuation to the text manually. If the app had a checkbox to append punctuation automatically, it would be helpful to us. We are also still looking for a way to speak English words mixed into Persian text...
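The code points of the punctuation marks discussed here can be checked with Python's standard `unicodedata` module. This is a small sketch for verification only; the helper name `describe` is made up for this example.

```python
import unicodedata
from typing import List, Tuple


def describe(chars: str) -> List[Tuple[str, str]]:
    """Return (code point, Unicode name) for each non-space character."""
    return [
        (f"U+{ord(c):04X}", unicodedata.name(c, "UNKNOWN"))
        for c in chars
        if not c.isspace()
    ]


if __name__ == "__main__":
    # A few of the marks listed in the comment above.
    for code, name in describe("! ؟ ، ؛ ٪"):
        print(code, name)
```

Running it shows which marks are shared with English and which have their own Arabic-block code points, which is useful when deciding which characters a sentence splitter should recognise.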