libretro / RetroArch

Cross-platform, sophisticated frontend for the libretro API. Licensed GPLv3.
http://www.libretro.com

[Feature Request] AI service improvements/fixes #10602

Open klepp0906 opened 4 years ago

klepp0906 commented 4 years ago

So I have a few issues with the AI service. I didn't know how to package this, so I apologize for the jumble. I didn't want to create half a dozen separate issue reports.

  1. Untether image mode from the graphics widgets requirement. I prefer the old text-style notifications, and image mode doesn't work unless widgets are enabled.

  2. When using speech mode with pause, change the timing so it pauses and then reads back the dialogue. Currently it pauses and doesn't read the dialogue until you unpause. That does you no good if the dialogue advances at a rate you can't control and you're trying not to fall behind or miss any. Even if you can control the rate, it renders the pause setting entirely useless in this mode.

  3. Enable some kind of perpetual mode for both image and speech. (I assume narrator mode is supposed to be that for speech mode.) For either mode, having to hit your hotkey over and over for every line of dialogue you run across in a foreign RPG will get incredibly tiresome incredibly fast. Currently, narrator mode seems to do nothing.

I prefer to use image mode but can't, A) because I use old-style notifications and B) because I can't fathom having to hit my hotkey for every piece of dialogue in a game from beginning to end.

devinprater commented 4 years ago

Narrator mode doesn't do this. It's just more streamlined than speech mode: it doesn't say "text box 1" and such, and it uses the system TTS rather than Google TTS. But you still have to hit the key bound to it. An automatic mode would help in Narrator mode as well: instead of my hitting the key, it could scan every half second or so, because there are places where new text appears that I don't know about.

klepp0906 commented 4 years ago

Ah, good to know. While that doesn't speak to why narrator mode isn't working and the others are (for me), at least we know persistent functionality would benefit there too.

Perhaps a setting to have the AI hotkey function as a one-time trigger vs. a toggle.

Or companion modes to the above three: image-auto, speech-auto, narrator-auto.

Like most things, I defer to the people magnitudes smarter than I.

devinprater commented 4 years ago

The only concern about that is that the AI Service only has a limited number of calls per... month, I think? Around 20000. If I play Dissidia Final Fantasy for an hour, which I do a lot, that'd probably be over 3000 API calls per hour. An offline OCR would avoid that, but not all systems have one, like Windows 10's OCR, used by this NVDA add-on which was made for video captions but is also very useful in games. Perhaps Tesseract could be used? But that's another dependency, which I doubt RA would want to include, and I don't know how it compares to ZTranslate.

So, that's probably why there isn't an Auto mode, because of API limits.
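For a rough sense of scale, the numbers mentioned in this thread can be plugged into a quick calculation. This is only a sketch: the 20,000-call monthly quota and the 3,000 calls per hour are the unverified estimates from the comment above, and the half-second scan interval is the one floated earlier in the thread. The function names are made up for illustration.

```python
# Back-of-the-envelope quota math using the figures quoted in this thread.
# None of these numbers are confirmed service limits.

MONTHLY_QUOTA = 20_000   # assumed calls per month (estimate from the thread)
CALLS_PER_HOUR = 3_000   # assumed auto-mode usage (estimate from the thread)
SCAN_INTERVAL_S = 0.5    # "every half second or so"


def hours_until_quota(quota: int, calls_per_hour: float) -> float:
    """Hours of play before the quota runs out at a steady call rate."""
    return quota / calls_per_hour


def calls_per_hour_at_interval(interval_s: float) -> float:
    """Calls made per hour if one call fires every `interval_s` seconds."""
    return 3600 / interval_s


if __name__ == "__main__":
    print(f"{hours_until_quota(MONTHLY_QUOTA, CALLS_PER_HOUR):.1f} h at 3000 calls/h")
    fixed_rate = calls_per_hour_at_interval(SCAN_INTERVAL_S)
    print(f"{fixed_rate:.0f} calls/h at a fixed {SCAN_INTERVAL_S} s interval")
    print(f"{hours_until_quota(MONTHLY_QUOTA, fixed_rate):.1f} h at that rate")
```

Under these assumptions the quota would last a little under seven hours of play at 3,000 calls per hour, and under three hours if a call fired blindly every half second.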

klepp0906 commented 4 years ago

Should still be left up to the user, right?

I'm aware of the limit, but thought it was 10,000. I was curious about that very thing and googled the number of lines of dialogue in a game. It varied wildly, but a lot came in at under 20,000.

That's lines of dialogue. I have no idea how the API calls work, but I assume one text box (which includes multiple lines of dialogue) is captured all at once, thus cutting down that number.

Either way, I know in my case I'm only getting so many hours of gaming a week. I doubt I'd come close to that limit, and if I did, that's the price you pay to play, as they say.

Worst case, you hit the API call limit and swap to a non-foreign game until your limit refreshes. Or, if you're that much of an import aficionado, you can always increase your call limit with $$ (I think?).

Either way, options are good. Players can make the determination that's right for them, I imagine.

Of course I have absolutely no idea how this would work on the implementation side. Does it recognize a screen with text and then capture it and use a call? Or does it fire off a call every so often, thus using a call whether text is on the screen or not?
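One way to picture the first option (spend a call only when something new is on screen) is a simple change check before each request. This is a hypothetical sketch, not how RetroArch or the AI service actually works: `translate_on_change` and `send_to_service` are made-up names, and frames are treated as raw bytes compared by hash.

```python
import hashlib
from typing import Callable, Iterable, List, Optional


def translate_on_change(
    frames: Iterable[bytes],
    send_to_service: Callable[[bytes], str],
) -> List[str]:
    """Call `send_to_service` only for frames that differ from the last one sent."""
    results: List[str] = []
    last_digest: Optional[bytes] = None
    for frame in frames:
        digest = hashlib.sha256(frame).digest()
        if digest == last_digest:
            continue  # screen unchanged since the last request: no call spent
        last_digest = digest
        results.append(send_to_service(frame))
    return results


if __name__ == "__main__":
    calls = []

    def fake_service(frame: bytes) -> str:
        # Stand-in for the real (hypothetical here) network request.
        calls.append(frame)
        return f"translated {len(frame)} bytes"

    captured = [b"box1", b"box1", b"box1", b"box2", b"box2"]
    translate_on_change(captured, fake_service)
    print(len(calls), "calls for", len(captured), "captured frames")
```

An exact hash only helps when the screen is completely static. A game with an animated background would change every frame, so a real version would probably need to compare just the text box region or allow some tolerance.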

There just has to be a better way than press, press, press, press, press. Especially when you play from bed or couch with a gamepad. That leaves you hunting for a spare bind, etc. Not ideal is all.

So this narrator mode? It's literally one press and it speaks the dialogue on screen back to you? Exactly the same as speech mode, but it omits some potential garbage text?

Wonder why it's not working. I tried with the new graphical notifications on (just in case they broke that too) and tried with WASAPI instead of XAudio in case that was a thing.

Unless it boils down to a specific game, or specific types of text that can be captured via, say, image or speech mode but not narrator mode.

Either way, it simply doesn't work for me. My OCD hates when things don't work :p

devinprater commented 4 years ago

I agree that it should be left up to the user.

As a blind person, I can't read any text in games, which is why the Narrator mode exists. It works alongside the Accessibility mode. It can't describe images, but blind people play games through sound anyway, so speaking text is sometimes all we need to enjoy a game. So this is pretty important, and I do support having an automatic AI mode.

geekley commented 2 months ago

When it's not set to pause, why does the game lag for a while when you press the AI service hotkey? A screenshot + base64 conversion shouldn't be lagging the game. But it interrupts gameplay (and most importantly, the music), while it's processing. Is it also waiting for the network call? In any case, can't it run the entire process in a thread or something async so as to not interrupt the game at all?
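To illustrate the kind of hand-off being asked for, here is a minimal sketch of pushing the encoding and the service call onto a worker thread so the caller returns immediately. It is written in Python for brevity and does not reflect RetroArch's actual C code. `AsyncTranslator` and `send_to_service` are hypothetical names, and whether the real lag comes from the network call is exactly the open question above.

```python
import base64
import queue
import threading
import time
from typing import Callable


class AsyncTranslator:
    """Runs base64 encoding and the (slow) service call on a worker thread."""

    def __init__(self, send_to_service: Callable[[str], str]) -> None:
        self._send = send_to_service
        self._jobs = queue.Queue()    # frames waiting to be processed
        self.results = queue.Queue()  # finished responses, polled by the caller
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, frame: bytes) -> None:
        """Called from the game loop; only enqueues, so it returns immediately."""
        self._jobs.put(frame)

    def close(self) -> None:
        """Ask the worker to stop and wait for it to finish."""
        self._jobs.put(None)
        self._worker.join()

    def _run(self) -> None:
        while True:
            frame = self._jobs.get()
            if frame is None:  # sentinel from close()
                break
            payload = base64.b64encode(frame).decode("ascii")
            self.results.put(self._send(payload))


if __name__ == "__main__":
    def slow_service(payload: str) -> str:
        time.sleep(0.2)  # stand-in for network latency
        return f"got {len(payload)} base64 chars"

    translator = AsyncTranslator(slow_service)
    start = time.perf_counter()
    translator.submit(b"\x00" * 1024)
    print(f"submit returned after {time.perf_counter() - start:.4f} s")
    for _ in range(3):    # the "game loop" keeps ticking meanwhile
        time.sleep(0.016)
    print(translator.results.get(timeout=2))
    translator.close()
```

The game loop would poll `results` with `get_nowait()` once per frame instead of blocking, so audio and video keep running while the request is in flight.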