keiffster closed this issue 5 years ago
Online learning web client for end user to train the bot, and generate corresponding AIML file
+1 for online learning web for end users. O.
Was thinking of splitting the project structure: keep the core bot and a cut-down version of Y-Bot, and split all other bots into their own GitHub repos (to stop you getting 30MB of professor files!!!)
I was also thinking of splitting extensions out into their own repos too, so that energy, telco, banking etc can grow (or wilt) on their own
Have been thinking about this for a while, since making it available on PyPI
This bot program is OK for free talk and Q&A services, but for business flow scenarios I think we should work more on the botflow part.
We need an intuitive tool to design a business service flow. For example:
Set Bot ID -> Welcome text -> Service category hint -> Service entrance discovery -> Enter specific scenario service -> Ask questions to collect input parameters (supporting all kinds of types and using sets and maps) -> Decision-making confirmation -> Call oob system to get results -> Feedback to user -> Exit specific scenario service -> Promotion text -> Bye text
This kind of chatbot service has certain commercial value.
Hope to improve botflow part in 2018.
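As a rough illustration of the service flow listed above, the stages could be modelled as an ordered list of steps that the bot walks through, collecting typed parameters on the way. This is only a sketch under assumptions: `FlowStep`, `run_flow`, and the meeting-room scenario are made up for the example and are not part of Program-Y or AIML.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class FlowStep:
    """One stage of a hypothetical business flow."""
    name: str
    prompt: str
    # Name of the parameter this step collects, if any.
    param: Optional[str] = None
    # Converts the raw user reply into a typed value (for example int).
    parse: Callable[[str], object] = str


def run_flow(steps: List[FlowStep], replies: List[str]) -> Dict[str, object]:
    """Walk the steps in order, consuming one reply per parameter step."""
    collected: Dict[str, object] = {}
    pending = iter(replies)
    for step in steps:
        print(f"[{step.name}] {step.prompt}")
        if step.param is not None:
            collected[step.param] = step.parse(next(pending))
    return collected


# Hypothetical "book a meeting room" scenario.
steps = [
    FlowStep("welcome", "Hello, how can I help?"),
    FlowStep("ask_room", "Which room?", param="room"),
    FlowStep("ask_people", "How many people?", param="people", parse=int),
    FlowStep("confirm", "Shall I go ahead?", param="confirmed",
             parse=lambda reply: reply.strip().lower() in {"yes", "y"}),
    FlowStep("bye", "Thanks, goodbye."),
]

params = run_flow(steps, ["Blue", "4", "yes"])
```

The collected `params` dictionary is what a real flow would hand to the back-end system before feeding the result back to the user.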
The second part is an online learning web client for end users and a conversation analysis tool. In the botflow part, how does the end user discover the service entrance? How does the end user respond to the bot's questions for collecting input parameters? We need to analyze end-user conversations to figure this out and optimize the botflow AIML.
Have been busy with new update. Some thoughts :-)
Maybe outside Program-Y somewhat but maybe as extensions or ?
Thanks Sean, some really great ideas. I am going to move all the ideas into individual cards on the Kanban board and set up some sort of voting/priority selection
Watch this space for more details on commercial support !!!
I did work on Mattermost integration and (for that purpose) markdown-formatted replies (pictures, tables). Not nice, just a quick and dirty PoC, but I can definitely share my experience with that and some ideas on how to bolt it on.
Klaus
Project splitting is now complete: all bots have been moved into their own projects and the primary delivery mechanism for Programy has moved to PyPI. In terms of some of the requests, comments inline below.
Reporting error counts during startup to stdout, so we can see if any AIML or services etc failed to load. Option to "not start" if failed.
Extending error checking - including error checks within AIML files alongside statements
Logging reports which AIML file a match was originally located in
Hot-restart - reload without the client going offline. Messages are queued until restarted
Native AD support via additional auth module
Bot commands for superusers / admins - @bot exit, @bot reload, @bot userlist, @bot usermsg, @bot broadcastmsg, @bot MOTD, @bot userstats, @bot resetuser, @bot echouser (see a user session), @bot joinuser (three way conversation)
Marketing / boosting userbase
Continued performance improvements and issue detection (recursion etc)
PyCharm (or ?) AIML file validator... eg a lightweight version of the AIML loader that can continuously eval an AIML file and highlight errors, like pylint?
Stack trace for a question, and live debugging for statement evaluation...
Testing - being able to include a comment in the test statement that gets output with the test failure. Perhaps also a severity. We run tests during the deployment process.
Testing - Progress updates during testing.
Testing - does errorlevel get set if a test fails? Good for building into devops toolchains / CI/CD setups.
Maybe outside Program-Y somewhat but maybe as extensions or ?
NLP as discussed - either within first evaluation or like "spellcheck" format in case no AIML matches
NLP reads AIML to load initial guesses
AIML extensions to support NLP (eg named entities)
AIML - naming * with labels not just numbers (ref NLP named entities, but also useful for more robust code)
AIML - multiple patterns per template (if no wildcards, or wildcards must be in same order)
AIML - being able to use condition with star without having to think/set a var
AIML - json support (haven't played much yet, but would be good to pass and eval json natively to extensions perhaps). Perhaps could be "library" AIML file
Commercial support options :-)
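The testing items in the list above (a comment and severity on each test, progress updates, and an errorlevel for CI) could fit together roughly as follows. This is a sketch, not the Program-Y test runner: `BotTest`, `run_tests`, and `fake_bot` are hypothetical names, and the severity rule (only "error" fails the run) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BotTest:
    question: str
    expected: str
    comment: str = ""        # printed alongside a failure
    severity: str = "error"  # "error" fails the run, "warning" does not


def run_tests(tests: List[BotTest], ask: Callable[[str], str]) -> int:
    """Run every test, print progress, and return a process exit code."""
    failures = 0
    for number, test in enumerate(tests, start=1):
        ok = ask(test.question) == test.expected
        print(f"[{number}/{len(tests)}] {'PASS' if ok else 'FAIL'} {test.question}")
        if not ok:
            print(f"    {test.severity}: {test.comment}")
            if test.severity == "error":
                failures += 1
    # Non-zero means failure, which is what CI/CD tools check.
    return 1 if failures else 0


def fake_bot(question: str) -> str:
    # Stand-in for a real bot; it only knows one answer.
    return "Hi there" if question == "HELLO" else ""


tests = [
    BotTest("HELLO", "Hi there"),
    BotTest("BYE", "Goodbye", comment="farewell grammar missing", severity="warning"),
    BotTest("HELP", "Ask me anything", comment="help grammar missing"),
]
exit_code = run_tests(tests, fake_bot)
```

A wrapper script used in a deployment pipeline would finish with `sys.exit(exit_code)` so that the shell sees the errorlevel.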
SW> Extending error checking - including error checks within AIML files alongside statements
SW2> Should say tests and not errors. Rather than including unit tests in separate files, I was thinking whether they could be included in the AIML file below the AIML they are testing. We could do it now via XML comment eg , and having a test preprocessor extract those into a .tests file before we run the test runner. Might be better though if there were AIML tags to support this.
SW> Logging reports which AIML file a match was originally located in
SW2> Understood re memory. Might be better as a "debug" option? It's sometimes challenging to find which statement was hit in what file when several public AIML sources are included (Rosie etc). This might also be required though to hot-reload a single changed AIML file? (then we get into potential issues with determinism based on file loading order).
SW> Native AD support
SW2> Fair enough. pyad requires pywin32 and it's had some issues with installation in the past too.
SW> Continued performance improvement...
SW2> Hmm, not sure now re specifics. I think at some point there'll be a need for multiprocessing (since at present Program-Y is limited to a single processor), and threading may need some more attention re conversation ordering/queuing. Multi-protocol bot issues come in here too :-)
SW2> Eg, we have a case now where a user has to "set focus" on a given ticket before doing commands on that server. If the set command takes too long, it's possible the second command gets executed before the first (since Skype is asynchronous and just launches a new ask_question in a new thread on each request received). Anyway, I think for most cases for now it's fine. If a bot gets popular in an enterprise, it may start hitting these limitations (there are also workarounds too).
SW> PyCharm (or ?) AIML file validator
SW2> We are also on PyCharm, and perhaps it is spoiling us :-). Already validating XML format. An XSD (XMI?) might be one option to validate AIML validity. Not sure how extensible PyCharm is to allow one to continuously load the AIML into a lightweight version of your AIML loader and highlight errors. Anyway this is probably lower priority (might be a good separate tool if AIML gets more standardised too :-) ).
Nothing that urgent above.
We seem to get more questions from potential internal customers around NLP and comments like "but AIML isn't NLP" (arguable?). I also point out that Mitsuku uses AIML so we've got a long way to go to hit the limits of AIML when it comes to conversational bots. Still, we aim to progress this further this year. One native Python library that looks interesting is https://spacy.io/ ... but Watson and Luis are also in play.
In any case I am thinking there will still be both AIML and NLP (rather than either/or) , along the lines of...
                     /--- NLP ---\
Client --> router --+             +---> AIML interface grammar -> sraix -> Python extensions/services
                     \--- AIML---/
What is the router? I think the easiest to start is to try AIML first, and if no matches pass through to NLP. If matched, it returns an AIML interface sentence rather than creating a second interface path to the Python extensions/libraries. This is what we are now doing with a fuzzy search FAQ bot POC using whoosh. Still early days though :-)
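The "AIML first, NLP as fallback" router described above might look like the sketch below. Everything in it is a stand-in: `route`, `aiml_ask`, and `nlp_to_interface` are hypothetical names, the dictionary replaces a real AIML brain, and the keyword check replaces a real NLP or fuzzy-search step. The point is only that the NLP side returns an AIML interface sentence, so there is a single path into the extensions.

```python
from typing import Callable, Optional

Matcher = Callable[[str], Optional[str]]

# Toy grammar standing in for an AIML brain (exact matches only).
GRAMMAR = {
    "HELLO": "Hi!",
    "XFAQ OPENING HOURS": "We open at 9am.",
}


def aiml_ask(question: str) -> Optional[str]:
    """Return the answer if the toy grammar matches, otherwise None."""
    return GRAMMAR.get(question.strip().upper())


def nlp_to_interface(question: str) -> Optional[str]:
    """Map free text onto an AIML interface sentence (keyword stand-in)."""
    if "open" in question.lower():
        return "XFAQ OPENING HOURS"
    return None


def route(question: str, aiml: Matcher, nlp: Matcher) -> Optional[str]:
    # 1. Try AIML first.
    answer = aiml(question)
    if answer is not None:
        return answer
    # 2. No match: ask the NLP side for an interface sentence.
    interface_sentence = nlp(question)
    if interface_sentence is None:
        return None
    # 3. Feed that sentence back through AIML, so only one interface path exists.
    return aiml(interface_sentence)


greeting = route("hello", aiml_ask, nlp_to_interface)
hours = route("When do you open?", aiml_ask, nlp_to_interface)
unknown = route("xyzzy", aiml_ask, nlp_to_interface)
```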
The other area that's getting some attention is voice interaction (on premise again). Is WebRTC a potential future "client?"
Session management that assigns a unique id to every user when different users are chatting with the chatbot.
@greecehalf Session management is already baked into the platform; however, it works differently depending on the client
Console - Single-user client - The client id is always 'console'
REST - You specify the userid as a parameter in the REST call
Webchat - A cookie is written back to the browser and becomes the user id
Social Clients - Twitter, Facebook, Kik, Line, Viber, Slack etc - The userid is already present as part of the platform and therefore the client uses this one
I am currently working on shared identity, which would allow you to link your userids from different platforms so that conversational state is maintained across all clients... work in progress
Thank you for the information, Keith. Actually what I mean is: say two people are talking to the same chatbot; surely the contents will be different. How could I assign an id number to each person so that the chatbot can remember different information when talking to different people?
Regards, Tianren Wang
@ideasean - Comments in line
SW> Extending error checking - including error checks within AIML files alongside statements
SW2> Should say tests and not errors. Rather than including unit tests in separate files, I was thinking whether they could be included in the AIML file below the AIML they are testing. We could do it now via XML comment eg , and having a test preprocessor extract those into a .tests file before we run the test runner. Might be better though if there were AIML tags to support this.
KS> My preference is to keep the tests in separate files. This follows the pattern of unit testing in most other languages, whereby the tests are kept separate. It keeps the size of the AIML files down, as a single grammar could have a huge number of variants when you include pattern matching, sets, bots, properties etc
KS> I find using the same directory structure for AIML and AIML tests means that the tests reflect the AIML files but then don't end up getting shipped into prod
SW> Logging reports which AIML file a match was originally located in
SW2> Understood re memory. Might be better as a "debug" option? It's sometimes challenging to find which statement was hit in what file when several public AIML sources are included (Rosie etc). This might also be required though to hot-reload a single changed AIML file? (then we get into potential issues with determinism based on file loading order).
KS> I'll take a look at storing the file. Just had a quick look, and storing a hash of the file and keeping the files in a single list keeps memory use down. Only storing the file once, and then a single integer as the file reference, might work. This also has the added benefit of being able to quickly list all files loaded.
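The "store each file once, keep an integer reference" idea can be sketched as a small interning registry. This is not Program-Y code: `FileRegistry` and the pattern-to-id dictionary are hypothetical, and a real parser would presumably attach the id to graph nodes instead.

```python
from typing import Dict, List


class FileRegistry:
    """Stores each file name once and hands out small integer ids."""

    def __init__(self) -> None:
        self._names: List[str] = []
        self._ids: Dict[str, int] = {}

    def intern(self, filename: str) -> int:
        """Return the id for filename, registering it on first sight."""
        if filename not in self._ids:
            self._ids[filename] = len(self._names)
            self._names.append(filename)
        return self._ids[filename]

    def name(self, file_id: int) -> str:
        return self._names[file_id]

    def all_files(self) -> List[str]:
        """Quick listing of every file loaded so far."""
        return list(self._names)


registry = FileRegistry()

# Hypothetical pattern -> file id mapping; only an int is stored per pattern.
pattern_source = {
    "HELLO": registry.intern("rosie/greetings.aiml"),
    "HI": registry.intern("rosie/greetings.aiml"),
    "WEATHER *": registry.intern("extensions/weather.aiml"),
}

# Debug lookup: which file did this match come from?
source_of_hi = registry.name(pattern_source["HI"])
```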
KS> Hot reloading is unlikely to support a single AIML file, due to the significant change in the parse graph this could cause. I am close to finishing hot loading for all AIML files, single RDF files, single sets, all sets, single maps, all maps, all properties, all defaults, pattern nodes, template nodes and security files
SW> Native AD support
SW2> Fair enough. pyad requires pywin32 and it's had some issues with installation in the past too.
KS> Going to need an Active Directory installation, and I have a single Windows laptop in my current pile of machines. Might have to farm this out to someone who lives and breathes the Microsoft ecosystem
SW> Continued performance improvement...
SW2> Hmm, not sure now re specifics. I think at some point there'll be a need for multiprocessing (since at present Program-Y is limited to a single processor), and threading may need some more attention re conversation ordering/queuing. Multi-protocol bot issues come in here too :-)
KS> You can get some very nice performance increases by a combination of splitting your bot into multiple bots, and then having them all point to a REST version. Unfortunately the really big performance increases come by replacing Flask with Sanic, which uses asyncio and means multiple processors and threads come for free.
KS> Version 2 is inherently thread safe, with all of the logic for handling state moved into client_context, so it's easier to port to other (more Windows-friendly) implementations of asyncio libraries
SW2> Eg, we have a case now where a user has to "set focus" on a given ticket before doing commands on that server. If the set command takes too long, it's possible the second command gets executed before the first (since Skype is asynchronous and just launches a new ask_question in a new thread on each request received). Anyway, I think for most cases for now it's fine. If a bot gets popular in an enterprise, it may start hitting these limitations (there are also workarounds too).
KS> Would a message queue work here? I'm not familiar with Skype for Biz. I've been working with the Microsoft Chatbot framework, which integrates with Skype natively, but have not come across the async nature
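One way to picture the message queue suggestion is a queue per user with a single worker thread each, so a slow "set focus" always finishes before the next command from the same user starts. This is a generic sketch using only the standard library; `PerUserQueue` is a hypothetical name and nothing here is taken from the Skype client or Program-Y.

```python
import queue
import threading
import time
from typing import Callable, Dict, List


class PerUserQueue:
    """Runs each user's jobs one at a time, in arrival order."""

    def __init__(self) -> None:
        self._queues: Dict[str, queue.Queue] = {}
        self._lock = threading.Lock()

    def submit(self, userid: str, job: Callable[[], None]) -> None:
        with self._lock:
            if userid not in self._queues:
                # First message from this user: create the queue and its worker.
                self._queues[userid] = queue.Queue()
                threading.Thread(target=self._worker,
                                 args=(self._queues[userid],),
                                 daemon=True).start()
            self._queues[userid].put(job)

    @staticmethod
    def _worker(jobs: queue.Queue) -> None:
        # Daemon worker: loops until the process exits.
        while True:
            job = jobs.get()
            try:
                job()
            except Exception as exc:
                print(f"job failed: {exc}")
            finally:
                jobs.task_done()

    def wait(self, userid: str) -> None:
        """Block until every job submitted for userid has finished."""
        self._queues[userid].join()


log: List[str] = []


def slow_set_focus() -> None:
    time.sleep(0.05)  # simulate a slow "set focus" command
    log.append("set focus")


def fast_command() -> None:
    log.append("run command")


dispatcher = PerUserQueue()
dispatcher.submit("alice", slow_set_focus)
dispatcher.submit("alice", fast_command)
dispatcher.wait("alice")
```

Different users still run in parallel because each has its own worker. The trade-off raised elsewhere in the thread remains: strict ordering makes it harder to send interim "this might take a moment" messages.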
SW> PyCharm (or ?) AIML file validator
SW2> We are also on PyCharm, and perhaps it is spoiling us :-). Already validating XML format. An XSD (XMI?) might be one option to validate AIML validity. Not sure how extensible PyCharm is to allow one to continuously load the AIML into a lightweight version of your AIML loader and highlight errors. Anyway this is probably lower priority (might be a good separate tool if AIML gets more standardised too :-) ).
Nothing that urgent above.
We seem to get more questions from potential internal customers around NLP and comments like "but AIML isn't NLP" (arguable?). I also point out that Mitsuku uses AIML so we've got a long way to go to hit the limits of AIML when it comes to conversational bots. Still, we aim to progress this further this year. One native Python library that looks interesting is https://spacy.io/ ... but Watson and Luis are also in play.
KS> So a platform that takes an English sentence, breaks it up into a series of words and then applies pattern matching is not viewed as NLP lol. Yeah, I get this all the time. Then you point them to the Stanford parser (NLTK etc), which is great at breaking the sentence into Verb, Noun, Pronoun etc, but then you still need to apply a greedy tree-based pattern matcher to the output. As for Machine Learning, unless you have a huge amount of data it's hard, and if you don't, you can tell the ML system a cat is a dog 100 times and the first answer it gives is it's a dog !!!!
In any case I am thinking there will still be both AIML and NLP (rather than either/or) , along the lines of...
                     /--- NLP ---\
Client --> router --+             +---> AIML interface grammar -> sraix -> Python extensions/services
                     \--- AIML---/
What is the router? I think the easiest to start is to try AIML first, and if no matches pass through to NLP. If matched, it returns an AIML interface sentence rather than creating a second interface path to the Python extensions/libraries. This is what we are now doing with a fuzzy search FAQ bot POC using whoosh. Still early days though :-)
KS> This is aligned with what I am currently working on; integration with Rasa Core is in its early days. The main issue is not training twice. Writing AIML is the first form of training, then writing Rasa config is a duplicate, so I'm working on how to train Rasa from AIML files.. early days but fun to play with
KS> Also looking at integrating with Wit.ai and Watson etc. Unfortunately these are remote services and most are paid services too. They work but are of limited use for something like Program-Y
The other area that's getting some attention is voice interaction (on premise again). Is WebRTC a potential future "client?"
KS> WebRTC is in the queue behind a Web Sockets client I am working on. It's all basically the same under the hood, so definitely in the pipeline
@greecehalf Which client are you using?
For now I use the console client to debug and add new functions. Later I will use the webchat client for presentation. Eventually, the chatbot will be used on the Android and iOS platforms.
The console client is single user and therefore sets the userid to 'console'. Webchat, however, writes a unique cookie back to the browser and uses this as the userid, so if you return it knows who you are. The userid is also used to store distinct user settings, so it will provide everything you need
If you are using mobile, then I assume you are going to call the REST service. If so, you create a unique ID in your mobile client and then pass that in the REST call, and you will get the same functionality as the Webchat
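A client-side sketch of that advice: generate one id per installation and send it with every question. The endpoint path `/api/rest/v1.0/ask` and the parameter names `question` and `userid` are assumptions for illustration, so check them against your own REST client configuration. The code only builds the URLs and makes no network call.

```python
import uuid
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; adjust host, port and path to your deployment.
BASE_URL = "http://localhost:5000/api/rest/v1.0/ask"


def new_user_id() -> str:
    """Create a random id; a mobile app would store it once per install."""
    return uuid.uuid4().hex


def build_ask_url(base_url: str, question: str, userid: str) -> str:
    """Build the request URL, URL-encoding the question and the user id."""
    query = urlencode({"question": question, "userid": userid})
    return f"{base_url}?{query}"


# Two people talking to the same bot get two distinct ids, so the bot can
# keep their conversations and settings apart.
alice = new_user_id()
bob = new_user_id()
url_alice = build_ask_url(BASE_URL, "My name is Alice", alice)
url_bob = build_ask_url(BASE_URL, "My name is Bob", bob)
```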
Re other comments above -
Separate test files - understood
Multiprocessing - I suspect that doing WebRTC session management, inbound voice recognition, parsing via AIML (professor) or NLP, then generating the voice response into iLBC and passing back to the client will be enough to exhaust a single thread :-)
NLP - will it be enough to dethrone Mitsuku? Good point re duplication. Re cats and dogs - maybe the cat really was a dog
Edit: Also re question order and Skype ... yes queues will help, but then we also pass async messages back to the client during known long running question processing (eg "this might take a moment"). So strict ordering has downsides too!
Hi, I am a developer using your Program-Y framework. It's really great. I'm writing to ask whether I can start it with Gunicorn. If so, could you please tell me how to connect it to Gunicorn? I would appreciate it if you could point me in the right direction. Thank you!
Hi, I want to ask whether it is possible to use Program-Y like the python-aiml library here? I mean, as in the example they provide in that Git repo: just import the lib and then call the object from our own code
@sidegder take a look at https://www.digitalocean.com/community/tutorials/how-to-serve-flask-applications-with-gunicorn-and-nginx-on-ubuntu-14-04
which has a full description of how a Flask app works with Gunicorn
All the programy web apps use Flask so it should be a simple integration
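For orientation, Gunicorn serves any WSGI callable, and a Flask application object is one. The stand-in below uses only the standard library so it runs without Flask installed. The module name `botapp` and the callable name `application` are placeholders, not Program-Y names. In practice you would point Gunicorn at the Flask app object created by the Program-Y client you are using.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Minimal WSGI callable standing in for a Flask app object."""
    body = b"bot is running\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call_locally(app):
    """Invoke the WSGI app once without a server, as a quick smoke test."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a fake GET request
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_locally(application)

# If this file were saved as botapp.py (hypothetical name), Gunicorn could
# serve it with:
#   gunicorn --bind 0.0.0.0:8000 botapp:application
```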
In terms of using programy as an embedded library, take a look at the console app, which is the simplest app available
Alternatively, I am close to releasing v3, which will include this functionality
Hope v3 is released soon
+1 O.
It’s coming, just a couple of minor delays, a holiday and the fact that at my main job the company just got acquired !!!!
Anyway prob about 1-2 weeks away from a push to the dev branch for people to experiment with
K
Moving all ideas and requests into the 3.x backlog; you will see them start to appear on the Project board shortly
I've started pulling together the product backlog for 2018, some are things I want to add myself, other things are requests from users and other things are just things which a decent virtual agent needs. Please let me know if there is anything on the list that's important to you, or if anything is missing. So, in no particular order:
I'm going to keep this thread open to allow people to post their comments and suggestions. I'll then migrate everything that makes it into the backlog onto the project Kanban board