kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0
9.91k stars, 900 forks

Investigate the use of a chatbot to form a knowledgebase #2026

Open stichbury opened 1 year ago

stichbury commented 1 year ago

We need a good way for Kedro users to get answers to their questions. Right now, they can search the Discord archive on Linen or Slack for previous discussions, but the UX isn't great. They can also look at our written FAQ (which they do; it's a popular page), but it doesn't deliver what they expect because it doesn't give specific answers to specific questions. They probably end up searching Google, and maybe Stack Overflow, for answers.

It would be great to run an NLP chatbot that has been trained on our archives and documentation and can have a stab at an answer, or link users to the right place to start their research. This is the holy grail of all documentation, though, and it relies on a decent knowledge base to train it, which we probably don't have (we have content, but it's not clear whether it is suitable).

I think we need to first investigate the state of this kind of solution and then look at whether we can apply it to a Slackbot for Kedro.

It's early days, but here are a few links:

This issue is to seed some discussion and potentially earmark some time for research at a hackathon or similar spike.

stichbury commented 1 year ago

I've been playing with ChatGPT (https://chat.openai.com/chat) a bit recently and there's definitely some scope.

[Screenshot, 2022-12-05 12:44:43]

stichbury commented 1 year ago

Recently, we've seen the ChatGPT project put out a beta of what I was mumbling about above. Thanks OpenAI 👯

While it's not yet ready, there are some ways we can prepare for a future where Kedro users turn to ChatGPT to answer their queries. A rough list:

yetudada commented 1 year ago

Looks like this will be possible: Dagster built something similar: https://dagster.io/blog/chatgpt-langchain

This ticket should also probably link to #1649

stichbury commented 1 year ago

@astrojuanlu, this one is also up for discussion.

astrojuanlu commented 1 year ago

Somebody pointed me in this direction: https://langchain.readthedocs.io/en/latest/use_cases/question_answering.html

datajoely commented 1 year ago

Relevant tool https://www.mendable.ai/?s=03

astrojuanlu commented 1 year ago

More: https://docsbot.ai/

noklam commented 1 year ago

It's crazy how quickly this space evolves. It's quite feasible to build one with LangChain. You can also limit the context so that it only reads docs.kedro.org and generates answers with relevant links to the docs (so it isn't making things up).
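A minimal sketch of the "only read docs.kedro.org and answer with links" idea, in plain Python rather than LangChain. It assumes documentation has already been split into `(url, text)` chunks; the helper names `restrict_to_docs` and `build_grounded_prompt`, the example URLs, and the prompt wording are all hypothetical. The resulting string would be sent to whatever LLM the bot uses; that call is out of scope here.

```python
from urllib.parse import urlparse

# Hypothetical: only chunks hosted on this site may be used as context.
ALLOWED_HOST = "docs.kedro.org"


def restrict_to_docs(chunks):
    """Keep only (url, text) chunks whose URL is hosted on ALLOWED_HOST."""
    return [(url, text) for url, text in chunks if urlparse(url).hostname == ALLOWED_HOST]


def build_grounded_prompt(question, chunks):
    """Build a prompt that asks the model to answer only from the numbered
    excerpts and to cite the URL of every excerpt it relies on."""
    docs_only = restrict_to_docs(chunks)
    context = "\n\n".join(
        f"[{i}] {url}\n{text}" for i, (url, text) in enumerate(docs_only, start=1)
    )
    return (
        "Answer the question using ONLY the numbered excerpts below. "
        "Cite the URL of each excerpt you use. If the excerpts do not "
        "contain the answer, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Usage with placeholder chunks (not real documentation pages).
chunks = [
    ("https://docs.kedro.org/example-page", "Placeholder excerpt about the Data Catalog."),
    ("https://example.com/some-blog-post", "Placeholder excerpt from an unrelated site."),
]
prompt = build_grounded_prompt("How do I register a dataset?", chunks)
print(prompt)
```

Filtering by hostname before prompting means off-site text never reaches the model, and the citation instruction is what lets the bot hand users a docs link instead of an unsourced answer.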

astrojuanlu commented 1 year ago

https://www.elastic.co/blog/chatgpt-elasticsearch-openai-meets-private-data

astrojuanlu commented 1 year ago

"Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

astrojuanlu commented 1 year ago

Beware of pushback from the tech community https://github.com/mdn/yari/issues/9208

datajoely commented 1 year ago

This is interesting https://docs.danswer.dev/introduction https://github.com/danswer-ai/danswer

stichbury commented 1 year ago

"Don't replace your user community with an LLM-based chatbot" https://thisisimportant.net/posts/user-community-llm-chatbots/

I don't think it was ever an either/or choice, was it? If the community wants a knowledge base, let's give it to them alongside the current options. We're not proposing to remove anything.

stichbury commented 1 year ago

@noklam and I worked on this as part of the Quantazio Hack. It would be good to continue the work as part of the ongoing docs effort.

astrojuanlu commented 10 months ago

If we ever get to this, we'd probably use some form of Retrieval Augmented Generation (RAG), see https://github.com/imartinez/privategpt
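A minimal sketch of the retrieval and augmentation steps of RAG, assuming the docs are already split into `(url, text)` chunks. Real RAG setups (including privateGPT) typically use learned embeddings and a vector store; this sketch swaps in bag-of-words cosine similarity so it runs with only the standard library. The function names, URLs, and excerpt texts are hypothetical placeholders, and the final LLM generation call is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase word tokens; a real pipeline would use an embedding model instead.
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return up to k chunks most similar to the question (the 'retrieval' step)."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(text))), url, text) for url, text in chunks]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(url, text) for score, url, text in scored[:k] if score > 0]


def augment(question, retrieved):
    """Prepend retrieved excerpts and their sources to the question (the
    'augmented' step); the result would then go to an LLM for generation."""
    context = "\n".join(f"- {text} (source: {url})" for url, text in retrieved)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Usage with placeholder chunks (illustrative text, not quoted from the docs).
chunks = [
    ("https://docs.kedro.org/page-a", "Datasets are registered in the Data Catalog in catalog.yml."),
    ("https://docs.kedro.org/page-b", "Nodes are pure Python functions wrapped with node()."),
    ("https://docs.kedro.org/page-c", "Use kedro run to execute the default pipeline."),
]
question = "How do I register a dataset in the catalog?"
top = retrieve(question, chunks, k=1)
print(augment(question, top))
```

Swapping `tokenize`/`cosine` for an embedding model and a vector index would leave the overall retrieve, augment, generate shape unchanged.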

astrojuanlu commented 10 months ago

Or just use Slackbot's customized automatic responses: https://slack.com/intl/en-gb/help/articles/202026038-An-introduction-to-Slackbot#add-customized-automatic-responses

astrojuanlu commented 10 months ago

also xref https://github.com/kedro-org/kedro-plugins/pull/434

astrojuanlu commented 1 month ago

Related: publishing a custom GPT on Kedro, MLOps? https://help.openai.com/en/articles/8798878-building-and-publishing-a-gpt