Investigate the use of a chatbot to form a knowledgebase

stichbury commented 1 year ago

We need a good way for Kedro users to get answers to their questions. Right now, they can search the Discord Linen archive or Slack for previous discussions, but the UX isn't great. Or they can look at our written FAQ (which they do; it's a popular page), but it doesn't deliver what they expect because it doesn't give specific answers to specific questions. They probably end up on Google looking for answers, and maybe on Stack Overflow.

It would be great to run an NLP chatbot that has been trained on our archives and documentation and can have a stab at an answer or link users to the right place to start their research. This is the holy grail of all documentation, though, and it relies on a decent knowledgebase to train on, which we probably don't have (we have content, but it's not clear whether it is suitable).

I think we need to first investigate the state of this kind of solution and then look at whether we can apply it to a Slackbot for Kedro.

This is early days, but here are a few links:

This issue is to seed some discussion and potentially earmark some time for research at a hackathon or similar spike.

stichbury commented 1 year ago

I've been playing with ChatGPT (https://chat.openai.com/chat) a bit recently and there's definitely some scope.

stichbury commented 1 year ago

Recently, we've seen the ChatGPT project put out a beta of what I was mumbling about above. Thanks OpenAI 👯

While it's not yet ready, there are some ways we can prepare for a future where Kedro users turn to ChatGPT to answer their queries. A rough list:

[ ] We need to ensure that our corpus of Q&A is indexed. This includes the Linen archives and any other content on Slack, Discourse, etc.
[ ] We create some complete and useful answers that rank well on queries such as "What is Kedro?" and, more generally, "How do I ensure my data science code is reusable?". This ensures Kedro appears in the text returned.
[ ] It also makes sense to build a set of common Q&A to be indexed and to provide the basis of answers to typical questions. I had previously been reluctant to write FAQ documentation because it dates quickly and is hard to keep in sync. The potential of chat systems such as ChatGPT has given me a new perspective on this. I think we should write some FAQ docs and keep them on the website so they are indexed, but not linked from the site navigation. The information is then available to search without the significant overhead of building a specific information architecture or an elaborate site design.

yetudada commented 1 year ago

Looks like this will be possible, Dagster built something like this: https://dagster.io/blog/chatgpt-langchain

This ticket should also probably link to #1649

stichbury commented 1 year ago

@astrojuanlu This one is also up for discussion.

astrojuanlu commented 1 year ago

Somebody pointed me in this direction: https://langchain.readthedocs.io/en/latest/use_cases/question_answering.html

datajoely commented 1 year ago

Relevant tool https://www.mendable.ai/?s=03

astrojuanlu commented 1 year ago

More: https://docsbot.ai/

noklam commented 1 year ago

It's crazy how quickly this space evolves. It's quite feasible to build one with LangChain. You can also limit the context so that it only reads docs.kedro.org and generates answers with relevant links to the docs only (so it's not making random stuff up).
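A minimal sketch of the "only read docs.kedro.org and answer with links" idea, using just the standard library. It does not use LangChain or an LLM. Instead it filters candidate pages by hostname and ranks them by simple keyword overlap. The page URLs and texts, the stopword list and the function name `answer_with_links` are all hypothetical placeholders for illustration.

```python
from urllib.parse import urlparse

# Hypothetical corpus of (url, text) pairs; URLs are illustrative placeholders.
PAGES = [
    ("https://docs.kedro.org/example/data-catalog",
     "The Data Catalog registers datasets and their configuration"),
    ("https://docs.kedro.org/example/nodes",
     "A node wraps a Python function with named inputs and outputs"),
    ("https://example.com/kedro-tips",
     "Unofficial tips about the data catalog"),
]

ALLOWED_HOST = "docs.kedro.org"
# Assumed minimal stopword list so filler words don't count as matches.
STOPWORDS = {"a", "an", "and", "are", "do", "does", "how", "i", "is",
             "the", "to", "what", "with"}


def tokenize(text):
    """Lowercase, strip basic punctuation and drop stopwords."""
    words = (word.strip(".,?!").lower() for word in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def answer_with_links(question, pages, top_k=2):
    """Return links to the best-matching pages hosted on ALLOWED_HOST only."""
    q_terms = tokenize(question)
    scored = []
    for url, text in pages:
        # Ignore anything not served from the official docs site.
        if urlparse(url).hostname != ALLOWED_HOST:
            continue
        overlap = len(q_terms & tokenize(text))
        if overlap:
            scored.append((overlap, url))
    # Highest overlap first; ties broken by URL for deterministic output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    if not scored:
        # Refuse rather than guess when the docs have nothing relevant.
        return "I couldn't find this in the docs."
    return "See: " + ", ".join(url for _, url in scored[:top_k])


print(answer_with_links("How does the data catalog work?", PAGES))
```

The off-domain page also mentions the data catalog, but the hostname filter keeps it out of the answer. That restriction, together with refusing when nothing matches, is what stops the bot from citing or inventing sources outside the docs.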

astrojuanlu commented 1 year ago

https://www.elastic.co/blog/chatgpt-elasticsearch-openai-meets-private-data

astrojuanlu commented 1 year ago

"Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

astrojuanlu commented 1 year ago

Beware of pushback from the tech community https://github.com/mdn/yari/issues/9208

datajoely commented 1 year ago

This is interesting https://docs.danswer.dev/introduction https://github.com/danswer-ai/danswer

stichbury commented 1 year ago

> "Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

I don't think it was ever a binary either/or choice, was it? If the community wants a knowledgebase, let's give it to them alongside the current options; we're not proposing to remove anything.

stichbury commented 1 year ago

@noklam and I worked on this as part of the Quantazio Hack. It would be good to continue the work as part of the ongoing docs effort.

astrojuanlu commented 10 months ago

If we ever get to this, probably we'd use some form of Retrieval Augmented Generation (RAG), see https://github.com/imartinez/privategpt
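To make the RAG idea concrete, here is a minimal sketch of its retrieval and prompt-augmentation steps in plain Python. It swaps the dense-embedding search that RAG tools usually rely on for a simple TF-IDF cosine similarity, and it stops at building the prompt: the call to an actual LLM is deliberately left out. The sample chunks and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,?!:;()"


def tokenize(text):
    """Lowercase words with surrounding punctuation removed."""
    return [t.strip(PUNCT).lower() for t in text.split() if t.strip(PUNCT)]


def tfidf_vectors(docs):
    """Compute a TF-IDF vector (term -> weight dict) for each document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF: terms found in every document still get a positive weight.
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return the top_k chunks most similar to the question."""
    vectors, idf = tfidf_vectors(chunks)
    # Question terms absent from the corpus carry no retrievable signal.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(question)).items() if t in idf}
    ranked = sorted(range(len(chunks)), key=lambda i: cosine(q_vec, vectors[i]), reverse=True)
    return [chunks[i] for i in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Augment the question with retrieved context for the generation step."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Illustrative sample chunks standing in for split-up documentation pages.
CHUNKS = [
    "Kedro pipelines are built from nodes, which wrap plain Python functions.",
    "The Data Catalog declares datasets in catalog.yml so code never hardcodes file paths.",
    "Kedro-Viz renders an interactive visualisation of your pipeline.",
]

question = "Where do I declare datasets in the catalog?"
print(build_prompt(question, retrieve(question, CHUNKS, top_k=1)))
```

In a full setup, the string returned by `build_prompt` would be sent to whichever LLM the project picks. The "only the context below" instruction is what keeps answers grounded in Kedro's own material rather than the model's general training data.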

astrojuanlu commented 10 months ago

Or just https://slack.com/intl/en-gb/help/articles/202026038-An-introduction-to-Slackbot#add-customized-automatic-responses

astrojuanlu commented 10 months ago

Also xref https://github.com/kedro-org/kedro-plugins/pull/434

astrojuanlu commented 1 month ago

Related: publishing a custom GPT on Kedro, MLOps? https://help.openai.com/en/articles/8798878-building-and-publishing-a-gpt

kedro-org / kedro

Investigate the use of a chatbot to form a knowledgebase #2026