keiffster / program-y

Python 3.x based AIML 2.0 Chatbot interpreter, framework, related programs and knowledge files
https://keiffster.github.io/program-y/
Other

2018 Product Backlog #127

Closed keiffster closed 5 years ago

keiffster commented 6 years ago

I've started pulling together the product backlog for 2018. Some items are things I want to add myself, others are requests from users, and others are simply things a decent virtual agent needs. Please let me know if anything on the list is important to you, or if anything is missing. So, in no particular order:

I'm going to keep this thread open to allow people to post their comments and suggestions. I'll then migrate everything that makes it into the backlog onto the project Kanban board.

HCIS2020 commented 6 years ago

Online learning web client for end users to train the bot and generate the corresponding AIML file

ohoachuck commented 6 years ago

+1 for online learning web for end users. O.


keiffster commented 6 years ago

I was thinking of splitting the project structure: keep the core bot and a cut-down version of Y-Bot, and split all other bots into their own GitHub repos (to stop you getting 30MB of professor files !!!)

I was also thinking of splitting extensions out into their own repos too, so that energy, telco, banking etc can grow (or wilt) on their own

I have been thinking about this for a while, since making it available on PyPI

HCIS2020 commented 6 years ago

This bot program is OK for free talk and Q&A services. But for business flow scenarios, I think we should work more on the botflow part.

We need an intuitive tool to design a business service flow. For example:

Set Bot ID -> Welcome text -> Service category hint -> Service entrance discovery -> Enter specific scenario service -> Ask questions to collect input parameters (supporting all kinds of TYPE and using sets and maps) -> Decision making confirm -> Call oob system to get results -> Feedback to user -> Exit specific scenario service -> Promotion text -> Bye text

This kind of chatbot service has definite commercial value.
I hope the botflow part can be improved in 2018.

HCIS2020 commented 6 years ago

The second part is an online learning web client for end users and a conversation analysis tool. In the botflow part, how does the end user discover the service entrance? How does the end user respond to the robot's questions that collect input parameters? We need to analyze end user conversations to figure this out and optimize the botflow AIML.

seghcder commented 6 years ago

Have been busy with new update. Some thoughts :-)

Maybe outside Program-Y somewhat but maybe as extensions or ?

keiffster commented 6 years ago

Thanks Sean, some really great ideas. I am going to move all the ideas into individual cards on the Kanban board and set up some sort of voting/priority selection

Watch this space for more details on commercial support !!!

KlausGPaul commented 6 years ago

I did work on mattermost integration and (for that purpose) markdown formatted replies (pictures, tables). Not nice, just as a quick and dirty PoC — I can definitely share my experience with that and some ideas how to bolt it on.

Klaus

keiffster commented 6 years ago

Project splitting is complete for now: all bots have been moved into their own projects and the primary delivery mechanism for Programy has moved to PyPI. In terms of some of the requests, comments are inline below.

seghcder commented 6 years ago

SW> Extending error checking - including error checks within AIML files alongside statements
SW2> Should say tests and not errors. Rather than including unit tests in separate files, I was wondering whether they could be included in the AIML file below the AIML they are testing. We could do it now via XML comment eg , and having a test preprocessor extract those into a .tests file before we run the test runner. Might be better though if there were AIML tags to support this.

SW> Logging reports which AIML file a match was originally located in
SW2> Understood re memory. Might be better as a "debug" option? It's sometimes challenging to find which statement was hit in what file when several public AIML sources are included (Rosie etc). This might also be required though to hot-reload a single changed AIML file? (then we get into potential issues with determinism based on file loading order).

SW> Native AD support
SW2> Fair enough. pyad requires pywin32 and it's had some issues with installation in the past too.

SW> Continued performance improvement...
SW2> Hmm, not sure now re specifics. I think at some point there'll be a need for multiprocessing (since at present Program-Y is limited to a single processor), and threading may need some more attention re conversation ordering/queuing. Multi-protocol bot issues come in here too :-)

SW2> Eg, we have a case now where a user has to "set focus" on a given ticket before doing commands on that server. If the set command takes too long, it's possible the second command gets executed before the first (since Skype is asynchronous and just launches a new ask_question in a new thread on each request received). Anyway, I think for most cases for now it's fine. If a bot gets popular in an enterprise, it may start hitting these limitations (there are also workarounds too).

SW> PyCharm (or ?) AIML file validator
SW2> We are also on PyCharm, and perhaps it is spoiling us :-). Already validating XML format. An XSD (XMI?) might be one option to validate AIML validity. Not sure how extensible PyCharm is to allow one to continuously load the AIML into a lightweight version of your aiml loader and highlight errors. Anyway this is probably lower priority (might be a good separate tool if AIML gets more standardised too :-) ).

Nothing that urgent above.

We seem to get more questions from potential internal customers around NLP and comments like "but AIML isn't NLP" (arguable?). I also point out that Mitsuku uses AIML so we've got a long way to go to hit the limits of AIML when it comes to conversational bots. Still, we aim to progress this further this year. One native Python library that looks interesting is https://spacy.io/ ... but Watson and Luis are also in play.

In any case I am thinking there will still be both AIML and NLP (rather than either/or) , along the lines of...

                     /--- NLP ---\
Client --> router --+             +---> AIML interface grammar -> sraix -> Python extensions/services
                     \--- AIML---/

What is the router? I think the easiest to start is to try AIML first, and if no matches pass through to NLP. If matched, it returns an AIML interface sentence rather than creating a second interface path to the Python extensions/libraries. This is what we are now doing with a fuzzy search FAQ bot POC using whoosh. Still early days though :-)
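The "AIML first, NLP as fallback" idea above could be sketched roughly as follows. This is only an illustration, not Program-Y code: `route`, `toy_aiml` and `toy_nlp` are hypothetical names, and the assumption is that both matchers return an AIML interface sentence, or `None` when they have no match.

```python
from typing import Callable, Optional

# Hypothetical signature: a matcher takes the user's sentence and returns an
# AIML interface sentence, or None when it has no match.
Matcher = Callable[[str], Optional[str]]


def route(sentence: str, aiml_match: Matcher, nlp_match: Matcher) -> Optional[str]:
    """Try AIML first; only fall through to NLP when AIML has no match."""
    result = aiml_match(sentence)
    if result is not None:
        return result
    # The NLP side also returns an AIML interface sentence, so both paths
    # converge on one grammar instead of opening a second route to the
    # Python extensions.
    return nlp_match(sentence)


# Toy stand-ins for the two matchers, for illustration only.
def toy_aiml(sentence: str) -> Optional[str]:
    known = {"HELLO": "GREETING"}
    return known.get(sentence.upper())


def toy_nlp(sentence: str) -> Optional[str]:
    if "password" in sentence.lower():
        return "FAQ LOOKUP " + sentence.upper()
    return None
```

With these toy matchers, `route("hello", toy_aiml, toy_nlp)` is answered by the AIML side, while a sentence that AIML does not know but that mentions "password" is handed to the NLP side.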

The other area that's getting some attention is voice interaction (on premise again). Is WebRTC a potential future "client?"

greecehalf commented 6 years ago

Session management that assigns a unique id to each user when different users are chatting with the chatbot.

keiffster commented 6 years ago

@greecehalf Session management is already baked into the platform; however, it differs depending on the client:

Console - Single user only client - The client id is always 'console'
REST - You specify the userid as a parameter in the REST call
Webchat - A cookie is written back to the browser and becomes the user id
Social Clients - Twitter, Facebook, Kik, Line, Viber, Slack etc - The userid is already present as part of the platform and therefore the client uses this one

I am currently working on shared identity, which would allow you to link your userids from different platforms so that conversational state is maintained across all clients... work in progress

greecehalf commented 6 years ago

Thank you for the information, Keith. What I actually mean is: say two people are talking to the same chatbot, so the content of each conversation will surely be different. How can I assign an id to each person so that the chatbot remembers different information when talking to different people?

Regards, Tianren Wang

keiffster commented 6 years ago

@ideasean - Comments in line

SW> Extending error checking - including error checks within AIML files alongside statements
SW2> Should say tests and not errors. Rather than including unit tests in separate files, I was wondering whether they could be included in the AIML file below the AIML they are testing. We could do it now via XML comment eg , and having a test preprocessor extract those into a .tests file before we run the test runner. Might be better though if there were AIML tags to support this.

KS> My preference is to keep the tests in separate files. This follows the pattern of unit testing in most other languages, where the tests are kept separate. It also keeps the size of the AIML files down, as a single grammar could have a huge number of variants when you include pattern matching, sets, bots, properties etc

KS> I find that using the same directory structure for AIML and AIML tests means the tests mirror the AIML files but don't end up getting shipped into prod

SW> Logging reports which AIML file a match was originally located in
SW2> Understood re memory. Might be better as a "debug" option? It's sometimes challenging to find which statement was hit in what file when several public AIML sources are included (Rosie etc). This might also be required though to hot-reload a single changed AIML file? (then we get into potential issues with determinism based on file loading order).

KS> I'll take a look at storing the file. I just had a quick look: storing a hash of the file and keeping the files in a single list keeps memory use down. Storing each file only once, and then a single integer as the file reference, might work. This also has the added benefit of being able to quickly list all files loaded.
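As a rough illustration of the "store the file once, reference it by integer" idea, a small registry along these lines would do. `FileRegistry` and its method names are hypothetical and not part of Program-Y; the sketch only shows the data structure.

```python
class FileRegistry:
    """Stores each file name once and hands out a small integer reference."""

    def __init__(self):
        self._names = []  # integer reference -> file name
        self._ids = {}    # file name -> integer reference

    def intern(self, filename: str) -> int:
        """Return the reference for filename, registering it on first use."""
        if filename not in self._ids:
            self._ids[filename] = len(self._names)
            self._names.append(filename)
        return self._ids[filename]

    def name_of(self, file_id: int) -> str:
        """Resolve a reference back to its file name (e.g. for debug logging)."""
        return self._names[file_id]

    def all_files(self):
        """List every file loaded so far, in load order."""
        return list(self._names)
```

Each pattern node would then carry only the integer returned by `intern`, and the debug log can call `name_of` to report which file a match came from.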

KS> Hot reloading is unlikely to support a single AIML file, due to the significant change in the parse graph this could cause. I am close to finishing hot loading for all AIML files, single RDF files, single sets, all sets, single maps, all maps, all properties, all defaults, pattern nodes, template nodes and security files

SW> Native AD support
SW2> Fair enough. pyad requires pywin32 and it's had some issues with installation in the past too.

KS> This is going to need an Active Directory installation, and I have a single Windows laptop in my current pile of machines. I might have to farm this out to someone who lives and breathes the Microsoft ecosystem

SW> Continued performance improvement...
SW2> Hmm, not sure now re specifics. I think at some point there'll be a need for multiprocessing (since at present Program-Y is limited to a single processor), and threading may need some more attention re conversation ordering/queuing. Multi-protocol bot issues come in here too :-)

KS> You can get some very nice performance increases by a combination of splitting your bot into multiple bots, and then having them all point to a REST version. Unfortunately the really big performance increases come from replacing Flask with Sanic, which uses asyncio and means multiple processors and threads come for free.

KS> Version 2 is inherently thread safe, with all of the logic for handling state moved into client_context, so it's easier to port to other (more Windows friendly) implementations of asyncio libraries

SW2> Eg, we have a case now where a user has to "set focus" on a given ticket before doing commands on that server. If the set command takes too long, it's possible the second command gets executed before the first (since Skype is asynchronous and just launches a new ask_question in a new thread on each request received). Anyway, I think for most cases for now it's fine. If a bot gets popular in an enterprise, it may start hitting these limitations (there are also workarounds too).

KS> Would a message queue work here? I'm not familiar with Skype for Biz. I've been working with the Microsoft Chatbot framework, which integrates with Skype natively, but have not come across the async nature
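One way to read the message queue suggestion is a FIFO per user with a single worker, so that a slow "set focus" always finishes before the next command starts. The sketch below uses only the standard library; `UserQueue` is a hypothetical name, and `handler` stands in for whatever actually asks the bot the question.

```python
import queue
import threading


class UserQueue:
    """Runs one user's requests strictly in arrival order on one worker thread."""

    def __init__(self, handler):
        self._handler = handler          # stand-in for the real "ask the bot" call
        self._requests = queue.Queue()   # FIFO, so arrival order is preserved
        self.responses = []              # answers, in the order they were produced
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, question):
        """Called by the client for every inbound message; returns immediately."""
        self._requests.put(question)

    def _run(self):
        # A single worker per user: the next question is not started until the
        # previous one has been fully handled. Error handling is omitted here.
        while True:
            question = self._requests.get()
            try:
                self.responses.append(self._handler(question))
            finally:
                self._requests.task_done()

    def wait(self):
        """Block until every submitted question has been handled."""
        self._requests.join()
```

The trade-off is that strict ordering makes every later message wait behind a long-running one, so one queue per user (rather than one global queue) keeps other users unaffected.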

SW>Pycharm (or ?) AIML file validator SW2> We are also on pyCharm, and perhaps it is spoiling us :-). Already validating XML format. An XSD (XMI?) might be one option to validate AIML validity. Not sure how extensible pyCharm is to allow one to continuously load the AIML into a lightweight version of your aiml loader and highlight errors. Anyway this is probably lower priority (might be a good separate tool if AIML gets more standardised too :-) ).

Nothing that urgent above.

We seem to get more questions from potential internal customers around NLP and comments like "but AIML isn't NLP" (arguable?). I also point out that Mitsuku uses AIML so we've got a long way to go to hit the limits of AIML when it comes to conversational bots. Still, we aim to progress this further this year. One native Python library that looks interesting is https://spacy.io/ ... but Watson and Luis are also in play.

KS> So a platform that takes an English sentence, breaks it up into a series of words and then applies pattern matching is not viewed as NLP, lol. Yeah, I get this all the time. Then you point them to the Stanford parser (NLTK etc), which is great at breaking the sentence into verb, noun, pronoun etc, but you still need to apply a greedy tree-based pattern matcher to the output. As for machine learning, unless you have a huge amount of data it's hard, and if you don't, you can tell the ML system a cat is a dog 100 times and the first answer it gives is it's a dog !!!!

In any case I am thinking there will still be both AIML and NLP (rather than either/or) , along the lines of...

                     /--- NLP ---\
Client --> router --+             +---> AIML interface grammar -> sraix -> Python extensions/services
                     \--- AIML---/

What is the router? I think the easiest to start is to try AIML first, and if no matches pass through to NLP. If matched, it returns an AIML interface sentence rather than creating a second interface path to the Python extensions/libraries. This is what we are now doing with a fuzzy search FAQ bot POC using whoosh. Still early days though :-)

KS> This is aligned with what I am currently working on; integration with Rasa Core is in its early days. The main issue is not training twice. Writing AIML is the first form of training, and writing Rasa config then duplicates it, so I'm working on how to train Rasa from AIML files. Early days, but fun to play with

KS> Also looking at integrating with Wit.ai, Watson etc. Unfortunately these are remote services and most are paid services too. They work, but are of limited use for something like Program-Y

The other area that's getting some attention is voice interaction (on premise again). Is WebRTC a potential future "client?"

KS> WebRTC is in the queue behind a WebSockets client I am working on. It's all basically the same under the hood, so it's definitely in the pipeline

keiffster commented 6 years ago

@greecehalf Which client are you using?

greecehalf commented 6 years ago

Right now I use the console client to debug and add new functions. Later I will use the webchat client for presentation. Eventually, the chatbot will be used on the Android and iOS platforms.


keiffster commented 6 years ago

The console client is single user and therefore sets the userid to 'console'. Webchat, however, writes a unique cookie back to the browser and uses this as the userid, so if you return it knows who you are. The userid is also used to store distinct user settings, so it will provide everything you need

If you are using mobile, then I assume you are going to call the REST service. If so, create a unique ID in your mobile client and pass that in the REST call, and you will get the same functionality as the Webchat
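A minimal sketch of the client side of this, in Python for brevity (a mobile app would do the equivalent in its own language). The endpoint path and the `question` / `userid` parameter names in the usage note are assumptions for illustration; check the REST client configuration of your own installation. `new_user_id` and `build_ask_url` are hypothetical helpers.

```python
import uuid
from urllib.parse import urlencode


def new_user_id() -> str:
    """Create a unique ID once per installed client and persist it locally."""
    return uuid.uuid4().hex


def build_ask_url(base_url: str, question: str, userid: str) -> str:
    """Build the REST call; the same userid must be sent on every request."""
    # Parameter names are assumed here, not taken from the thread.
    query = urlencode({"question": question, "userid": userid})
    return f"{base_url}?{query}"
```

For example, `build_ask_url("http://localhost:5000/api/rest/v1.0/ask", "Hello there", user_id)` yields a URL that carries the stored ID, so the bot can keep each person's conversation state separate as long as the client reuses the same ID.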

keiffster commented 6 years ago

@ideasean https://github.com/keiffster/program-y/wiki/Hot-Reload

seghcder commented 6 years ago

Re other comments above -

Separate test files - understood

Multiprocessing - I suspect that doing WebRTC session management, inbound voice recognition, parsing via AIML (professor) or NLP, then generating the voice response into iLBC and passing back to the client will be enough to exhaust a single thread :-)

NLP - will it be enough to dethrone Mitsuku? Good point re duplication. Re cats and dogs - maybe the cat really was a dog

Edit: Also re question order and Skype ... yes queues will help, but then we also pass async messages back to the client during known long running question processing (eg "this might take a moment"). So strict ordering has downsides too!

sidedger commented 6 years ago

Hi, I am a developer using your program-y framework. It's really great. I'm writing to ask whether I can run it with Gunicorn. If so, could you please tell me how to connect the two? I would appreciate it if you could point me in the right direction. Thank you!

irfanandratama commented 6 years ago

Hi, I want to ask whether it is possible to use Program-Y like the python-aiml library here? I mean, as in the example provided in that Git repo: just import the lib and then call the object from our own code

keiffster commented 6 years ago

@sidedger take a look at https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-gunicorn-and-nginx-on-ubuntu-14-04

which has a full description of how a Flask app works with Gunicorn

All the programy web apps use Flask, so it should be a simple integration

keiffster commented 6 years ago

In terms of using programy as an embedded library, take a look at the console app, which is the simplest app available

Alternatively, I am close to releasing v3, which will include this functionality

irfanandratama commented 6 years ago

Hope v3 release soon

ohoachuck commented 6 years ago

+1 O.


keiffster commented 6 years ago

It’s coming, just a couple of minor delays, a holiday and the fact that at my main job the company just got acquired !!!!

Anyway prob about 1-2 weeks away from a push to the dev branch for people to experiment with

K

keiffster commented 5 years ago

Moving all ideas and requests into the 3.x backlog; you will see them start to appear on the Project board shortly