pragunbhutani / dbt-llm-tools

RAG based LLM chatbot for dbt projects

Great project #21

Closed: quiglec closed this issue 3 weeks ago

quiglec commented 4 weeks ago

I was looking for a solution to point an LLM at my team's dbt repo and came upon this project. Was able to get it up and running pretty quickly with your instructions. I work more as a data analyst / PM, so this has been a great accelerator for experimenting! Have actually been able to use it for query writing a few times now. A couple thoughts after using it:

  1. Our team keeps pretty diligent model documentation in the yml files. I may be reading the codebase incorrectly, so let me know if I'm missing something, but it looks like model documentation is inferred by the LLM reading the .sql files rather than pulled from the yml documentation. If it's available, would it be possible for the chatbot to read from the yml definitions instead? I'm thinking that could improve the results.
  2. Our project is fairly big. The chatbot often digs up models that aren't really the best fit for the specific context (e.g. grabbing a model from our machine learning team rather than one from our business team). I know it's possible to limit which models it looks at by file path, but it would be nice to save the different model groupings as named contexts for the chatbot to use. I've found myself copying and pasting different folder paths in depending on the context of the question I'm about to ask. Perhaps this is unique to our team.
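To illustrate point 2, the folder-path grouping could be sketched as a small mapping of named contexts to path prefixes. This is just a sketch of the idea, not code from dbt-llm-tools; the context names and folder paths are made up:

```python
from pathlib import PurePosixPath

# Hypothetical named "contexts": each maps to the dbt model folders it covers.
CONTEXTS = {
    "business": ["models/marts/business"],
    "ml": ["models/marts/machine_learning"],
}

def models_for_context(model_paths, context):
    """Return only the model files that live under the given context's folders."""
    prefixes = CONTEXTS[context]
    return [
        p for p in model_paths
        if any(PurePosixPath(p).is_relative_to(prefix) for prefix in prefixes)
    ]

models = [
    "models/marts/business/fct_orders.sql",
    "models/marts/machine_learning/churn_scores.sql",
]
print(models_for_context(models, "business"))
# ['models/marts/business/fct_orders.sql']
```

With something like this, you'd pick a saved context by name instead of pasting folder paths into each question.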

Overall, great project. Think there's really cool potential here. Happy to share any other details and feedback.

pragunbhutani commented 3 weeks ago

Thanks for the encouraging words @quiglec, they really go a long way!

  1. This tool actually uses your dbt documentation to answer questions first. The part that parses your model code to infer documentation is being added as a supporting function for organisations that haven't been able to keep their documentation up to date, so don't worry: the chatbot really is reading from your YML documentation. This becomes more evident when you interact via the interface; the step where you "load models into the vector database" actually fetches your own documentation.
  2. This is really useful feedback. You're right, I think there should be a tagging layer that lets you provide this context to the chatbot programmatically. There might also be a way to modify the search function to take custom instructions into account. We've also been talking about building a teaching module that lets you give the chatbot feedback so its answers improve over time.
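For readers less familiar with dbt, the YML documentation referred to in point 1 lives in schema files alongside the models. A minimal example of the standard layout (the model and column names here are made up):

```yaml
# models/marts/schema.yml -- standard dbt model documentation layout
version: 2
models:
  - name: fct_orders  # hypothetical model name
    description: "One row per completed customer order."
    columns:
      - name: order_id
        description: "Primary key for the orders fact table."
```

These `description` fields are what the "load models into the vector database" step picks up.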

I'm curious to learn whether this is actually something you would consider putting into production internally? If so, what sort of interface would you want to interact with it through?