pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Introducing xarray Guru on Gurubase.io #9746

Closed kursataktas closed 1 day ago

kursataktas commented 2 weeks ago

Hello team,

I'm the maintainer of Anteon. We have created Gurubase.io with the mission of building a centralized, open-source tool-focused knowledge base. Essentially, each "guru" is equipped with custom knowledge to answer user questions based on collected data related to that tool.

I wanted to let you know that I've manually added the xarray Guru to Gurubase. xarray Guru uses data from this repo and from the docs to answer questions by leveraging an LLM.

In this PR, I showcased the "xarray Guru", which highlights that xarray now has an AI assistant available to help users with their questions. Please let me know your thoughts on this contribution.

Additionally, if you want me to disable xarray Guru in Gurubase, just let me know; that's totally fine.

welcome[bot] commented 2 weeks ago

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient. If you have questions, some answers may be found in our contributing guidelines.

TomNicholas commented 2 weeks ago

Hi, thanks for considering xarray here.

So I just tried to ask Gurubase about using the xarray skew method, and got this long response about understanding skewness and how to use xarray's da.skew() method.

But da.skew() does not exist!!! Whilst xarray does have methods .mean() and .std(), it doesn't currently have any methods for higher-order statistics such as .skew() or .kurtosis() - the LLM has completely hallucinated the existence of this method.

I noticed this problem with ChatGPT a while ago, but I'm kind of wondering what the point of this service is if it's still incorrect even after supposedly being trained on our actual codebase and docs...
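For anyone landing here looking for skewness: a minimal sketch of how it could be computed from the methods that do exist, `.mean()` and `.std()`. The helper name `skewness` is hypothetical (it is not part of xarray), and this assumes the population (biased) definition, i.e. the mean of cubed z-scores with the default `ddof=0`.

```python
import numpy as np
import xarray as xr


def skewness(da, dim=None):
    """Hypothetical helper, not an xarray API.

    Population (biased) skewness along `dim`: the mean of the cubed
    z-scores, built only from DataArray.mean() and DataArray.std().
    """
    mean = da.mean(dim)
    std = da.std(dim)  # ddof=0 by default, i.e. population std
    return (((da - mean) / std) ** 3).mean(dim)


da = xr.DataArray(np.array([1.0, 2.0, 3.0, 10.0]), dims="x")

# Check whether the installed xarray version exposes a .skew() method.
print(hasattr(da, "skew"))

# Skewness of the example data along "x" (positive: long right tail).
print(float(skewness(da, "x")))
```

This is only an illustration of the definition; for unbiased estimators or NaN policies, a dedicated statistics library is likely the better choice.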

kursataktas commented 2 weeks ago

Thanks for taking the time to review this. I looked into it, and you’re right, it was a hallucination. I'm always working to improve on this front, so I’ll dig into it further.

The tool generally does a good job answering typical questions for beginners, often even better than ChatGPT in that regard. But I know that trickier questions can sometimes trip it up.

Also, just to clarify, it pulls from README files and documentation pages, not from the actual codebase.

shoyer commented 1 day ago

I'm going to close this. There are lots of AI tools that understand Xarray and I don't think it makes sense to advertise any of them at this time.