ibmtjbot / tjbot

IBM TJBot
https://ibmtjbot.github.io
Apache License 2.0

Remove support for camera due to Visual Recognition API not being available anymore #180

Closed robertoetcheverryr closed 1 year ago

robertoetcheverryr commented 2 years ago

As the title says, may I make a PR removing the config.js option for the camera, or at least adding a comment that it's no longer available? The same goes for the ibm-credentials.env file. Maybe also delete the switch case for the camera?

jweisz commented 2 years ago

@robertoetcheverryr I originally decided to leave it in just in case someone had an already-provisioned visual recognition service, but the likelihood of this keeps going down over time. That said, I'm not fully convinced we should remove the camera support, as it's a huge part of what makes TJBot fun! But, it's not clear to me what the other options might be. Are there any vision models we might be able to run locally on the Pi?

robertoetcheverryr commented 2 years ago

I haven't used it yet, but Google has a Vision API with 1000 monthly queries. I'm not sure whether that works with a free account or whether you must enter a credit card before you get the free tier...

jweisz commented 2 years ago

We would not be able to officially support the use of Google APIs with TJBot.

andycitron commented 2 years ago

I do have an already-provisioned visual recognition service, but it seems to have stopped working, and I'm having trouble figuring out why. The reply tjbot returns says:

Error: Error An error occurred while processing your request. Reference #30.713a2f17.1639419281.1a1a60f6

Do I have any hope of working around this?

jweisz commented 2 years ago

Likely not. There's no support for the Watson Visual Recognition service anymore as it's been discontinued, and it seems I really should go ahead with removing it from the TJBot library. That said, I haven't yet found a viable replacement, since I'd hate for TJBot to lose the ability to see. 😢

I'm definitely open to someone submitting a PR to replace Visual Recognition with something else. Preferably something on-device, though that might up the hardware requirements...

andycitron commented 2 years ago

Justin, I reimplemented my tjbot code using Microsoft Azure visual services. I uploaded the code fragments to GitHub in case you are interested in incorporating them into your tjbot Node.js implementation. You can find them here: https://github.com/andycitron/tjBotFragmentThatUsesAzureVisualServices

Note that this requires the user to have a Microsoft Azure account.

Also, if you want to incorporate it into tj.see(), you'd want to structure it a bit differently. tj.see() takes a photo. I did not want my Microsoft functions to have a dependency on tj.takePhoto. So I put the 'take a photo' part into the intent processing for 'see' and passed the photo into the code that uses Microsoft functions.
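The separation described above can be sketched roughly as follows. This is only an illustration, not code from the linked repository: `handleSeeIntent`, `describeImage`, and the `visionClient.analyze` method are hypothetical names, and the photo is assumed to arrive as whatever value the injected `takePhoto` function resolves to.

```javascript
// Hypothetical recognition step: it receives a photo and a vision client,
// and knows nothing about the camera or about tj.takePhoto.
async function describeImage(photo, visionClient) {
  // visionClient.analyze is an assumed interface, not a real SDK call.
  const result = await visionClient.analyze(photo);
  return result.tags;
}

// The 'see' intent handler owns the "take a photo" step and passes the
// photo on to the recognition code.
async function handleSeeIntent({ takePhoto, visionClient }) {
  const photo = await takePhoto();
  return describeImage(photo, visionClient);
}
```

Because the camera call is injected, the recognition code can be exercised with a stored image and a stub client, without a Raspberry Pi camera attached.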

The code I put out there includes additional methods that invoke Microsoft facial recognition. That is not part of the 'tj.see()' functionality. That code requires pre-training of the facial recognition models. I included that because it might be useful to someone.

jweisz commented 2 years ago

Hey @andycitron, happy new year. :)

Thanks for the effort you put into TJBot, this is a really great contribution. Unfortunately, I won't be able to make this part of the official library because it uses a competitor's cloud service. But, I will put this on our Featured Recipes page to showcase your work.

andycitron commented 2 years ago

Cool. Yesterday I posted a video to YouTube illustrating how TJ works with Azure. Perhaps you'd want to include the 4-minute video along with the featured recipe: https://youtu.be/B92efwFqXSs

Could you give me the link to the Featured Recipes page where you referenced my code? I'd like to include a link to that on my home page. thx.

jweisz commented 2 years ago

Here's the link: https://github.com/ibmtjbot/tjbot/tree/master/featured#microsoft-azure-visual-services-by-andycitron

andycitron commented 2 years ago

Justin, Sorry to bother you, but I can’t figure out where to ask this question. It’s not a tjbot issue, just a question.

Where does tjbot store the audio file it uses for speech to text? What format?

I see that Microsoft Azure has ‘person voice recognition’ and I’m thinking about trying that out. Seems it wants a wav audio file as input.

My tjbot gets confused when multiple speakers are talking at the same time. Those utterances usually end up being ignored by my implementation. But every once in a while it’ll try to respond. Because I know who I’m talking to (facial recognition), I can ignore utterances from a different person... or at least I can try.

Do you know the answer? Or can you tell me the proper place to ask this.

I actually think an implementation using multiple microphones and detecting voice based on location in the room makes sense, but that seems very hard.

Thx, Andy


jweisz commented 2 years ago

Hi @andycitron -- take a look at tjbot.js:792, where listen() is defined: https://github.com/ibmtjbot/tjbotlib/blob/4fe0263bd0050f910752ae589d3b33cdb9cb93ae/src/tjbot.js#L792

The audio isn't stored locally; the data is streamed through a pipe between the microphone and a web socket. So it would need some modification to save the output to a file first, before uploading to the Microsoft service. Maybe check to see if their API supports WebSockets?
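One possible shape for the "save to a file first" modification is sketched below. It is not taken from tjbotlib: `wavHeader` and `recordToWav` are hypothetical helpers, and the sketch assumes the microphone stream delivers raw 16-bit PCM. The default sample rate and channel count are placeholders and would need to match the actual microphone settings.

```javascript
const fs = require('fs');

// Build the 44-byte RIFF/WAVE header that precedes raw PCM samples.
// The defaults (16 kHz, mono, 16-bit) are assumptions, not tjbotlib's settings.
function wavHeader(dataLength, { sampleRate = 16000, channels = 1, bitDepth = 16 } = {}) {
  const blockAlign = (channels * bitDepth) / 8;
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + dataLength, 4);          // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);                      // size of the fmt chunk
  header.writeUInt16LE(1, 20);                       // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate * blockAlign, 28); // bytes per second
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitDepth, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(dataLength, 40);
  return header;
}

// Buffer a readable PCM stream in memory and write it out as a WAV file
// once the stream ends. Resolves with the path of the written file.
function recordToWav(pcmStream, filePath, format) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    pcmStream.on('data', (chunk) => chunks.push(chunk));
    pcmStream.on('error', reject);
    pcmStream.on('end', () => {
      const pcm = Buffer.concat(chunks);
      const wav = Buffer.concat([wavHeader(pcm.length, format), pcm]);
      fs.writeFile(filePath, wav, (err) => (err ? reject(err) : resolve(filePath)));
    });
  });
}
```

Buffering the whole utterance in memory keeps the sketch short. For long recordings, streaming to disk and patching the two length fields afterwards would be the more careful choice.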

jweisz commented 1 year ago

Closing as this is now an issue in the tjbotlib repo: https://github.com/ibmtjbot/tjbotlib/issues/73