Open haesleinhuepf opened 1 year ago
Haha, I was hoping you would go down that path... I did have a look early on at the assistant's function enumeration code...
I would propose a simple scheme where all classes that derive from NapariBaseTool are automatically discovered and added... Or something like that...
I agree with @haesleinhuepf that it would be nice to be able to register tools without making a PR to napari-chatgpt
. For me the main requirements would be:
langchain
?)langchain
. I think langchain
is super promising, but the field is moving super fast and it feels like we should be able to easily swap out the LLM component.What do you think, @haesleinhuepf and @royerloic , are there other requirements we should consider? Are anyof those too strict/onerous?
All of the discovery methods I am aware of using a base class (e.g., NapariBaseTool
) require importing the class. Do you know of a way that doesn't?
Would a .yaml file describing the available plugins be too much to ask of the developer? The implications here is that they would have to both define an entry point and make sure the yaml file is packaged.
Another option that is similar to @haesleinhuepf 's PR is that developers could register a function that just returns the names/import paths of the available tools. We would then have to trust the developers to implement this in a way that doesn't cause a slow startup.
Any other ideas?
1. the dependencies to register are minimal (ideally pure python, but maybe langchain?)
Fully agreed. Also let's avoid langchain if possible
2. the discovery process doesn't import the function.
Tricky. If possible, I would like to avoid an overengineered-solution like npe2. Maybe langchain has alternative solutions for this (docs).
3. The registration isn't overfit to
langchain
. I thinklangchain
is super promising, but the field is moving super fast and it feels like we should be able to easily swap out the LLM component.
I was also thinking of registering functions that consume str
and produce str
instead of registering langchain Tools
. But I'm not sure. I haven't found out yet how to make chat-compatible tools that can have more than one string as parameter (see related issue).
I was also thinking of registering functions that consume str and produce str instead of registering langchain Tools. But I'm not sure. I haven't found out yet how to make chat-compatible tools that can have more than one string as parameter (see https://github.com/haesleinhuepf/bia-bob/issues/1).
I agree that we should avoid registering langchain tools. This feels too tied to the current implementation. Although, I think the information given to the langchain StructuredTool
is pretty close to what we want (i.e., a text description and the input schema). I think it should then be up to the backend (e.g., bob
or napari-gpt
) to translate that into something it can use to interface with the LLM (e.g., Tool
/StructuredTool
for langchain).
In terms of a path forward, I would propose the following:
LLMTool
). LLMTool
in bia-bob
to see how it feelsLLMTool
classI think the LLMTool
baseclass can be really simple:
class LLMTool(ABC):
# we may want names that are separate from the function name
# to prevent namespace collision (e.g., "bia-bob:regionprops")
name: str
# I think it is nice to have a description field separate from
# the function docstring so that extra instructions can be given to the LLM
description: str
@abstractmethod
def function(self):
raise NotImplementedError
In the example of bia-bob, a registration function could then look like:
def register_llm_tool(tool_to_register: LLMTool):
new_tool = StructuredTool.from_function(
tool_to_register.function,
name = tool_to_register.name,
description=tool_to_register.description
)
_context.tools.append(new_tool)
# update agent if necessary
Hi @royerloic ,
thanks again for paving this path. I played a bit with your code and found this place that could potentially profit from flexible extensibility. I'm wondering how you would approach this and if you're interested in a PR in that direction.
Use case: Assume someone has the napari-assistant installed. The assistant can list functions (sorted in categories such as denoising, segmentation, etc) and you can search in their docstrings for terms such as 'nuclei'. Combining it with Omega, one could then ask Omega
Which functions do you have available for segmenting nuclei?
and Omega could list these functions. We could then afterwards ask Omega to add one specific function as widget to the viewer or to execute it directly. This could lead to better code than random assemblies of scikit-image and opencv snippets. There are some hundred of these functions spread over the napari-assistant ecosystem and thus, I could imagine that this way of interacting with it might be useful.The question is: how should one add a napari-assistant extension to Omega? A PR is one option and I'm wondering if you thought about an alternate way for making Omega openly extensible.
Looking forward to hear what you think.
Best, Robert