I'd love to have a tool that can harvest examples from GitHub repos, Stack Exchange, etc., which we can use in a RAG approach to example creation.
Based on the discussion at this morning's meeting, examples taken from places where the code is already in use would be better than just creating a basic example.
I can see such a tool being useful for mature open source projects like NumPy.
I'm not sure what the best route for this would be, and would love a discussion about it. Here are my initial thoughts.
The user provides the name of a function (such as `svdvals`) and possibly the longer name (`np.linalg.svdvals` and/or `numpy.linalg.svdvals`) or just the base (`numpy.linalg`), or whatever works best.
The example harvester searches for pages containing these words and harvests a collection of them (how many? 10, 20, 100? Not sure).
The harvested pages are then formatted appropriately for use in Ragna. What exactly that looks like, I'm not sure.
The harvester needs to keep track of and report (in some log?) which pages got included in the Ragna-formatted work (to cite sources, handle intellectual property rights issues, etc.).
The hope would be something that can perform this quickly for a given function (10-20 seconds tops or much less).
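To make the steps above a bit more concrete, here is a rough sketch of the parts that need no network access: expanding a function name into search strings, keeping only the candidate pages that mention one of them, and recording which sources were kept. Everything in it is an assumption for discussion purposes. `Page`, `name_variants`, `select_pages`, and `build_manifest` are made-up names, the `example.com` URLs are placeholders, and the actual searching of GitHub or Stack Exchange and the formatting for Ragna are left out because those are still open questions.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """A candidate page fetched by some search step (not shown here)."""
    url: str
    text: str
    license: str = "unknown"  # left as "unknown" until checked by hand


def name_variants(function, module=None, aliases=()):
    """Return search strings for a function, most specific first."""
    prefixes = ([module] if module else []) + list(aliases)
    variants = [f"{prefix}.{function}" for prefix in prefixes] + [function]
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(variants))


def select_pages(pages, variants, limit=20):
    """Keep pages whose text mentions any variant, up to `limit` pages."""
    matches = [p for p in pages if any(v in p.text for v in variants)]
    return matches[:limit]


def build_manifest(function, pages):
    """Record which sources were kept, for citation and license review."""
    return {
        "function": function,
        "sources": [{"url": p.url, "license": p.license} for p in pages],
    }


# Placeholder candidates standing in for real search results.
candidates = [
    Page("https://example.com/a", "s = np.linalg.svdvals(a)"),
    Page("https://example.com/b", "u, s, vh = np.linalg.svd(a)"),
]
variants = name_variants("svdvals", "numpy.linalg", aliases=("np.linalg",))
kept = select_pages(candidates, variants)
manifest = build_manifest("svdvals", kept)
```

The manifest is a plain dictionary so it could be dumped to JSON next to whatever gets handed to Ragna, which would give us the source log in one place for the manual rights check.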
Concerns:
Copyright. I'm not sure what the legal landscape will look like here, and we don't have immediate access to a lawyer for this project. Right now I say let's give it a try, see what happens, and then manually check the intellectual property rights related to the examples we harvested.
Thoughts?