royerlab / napari-chatgpt

A napari plugin to process and analyse images with chatGPT!

Extensibility #5

Open haesleinhuepf opened 1 year ago

haesleinhuepf commented 1 year ago

Hi @royerloic ,

thanks again for paving this path. I played a bit with your code and found this place that could benefit from flexible extensibility. I'm wondering how you would approach this and whether you're interested in a PR in that direction.

Use case: Assume someone has the napari-assistant installed. The assistant can list functions (sorted into categories such as denoising, segmentation, etc.) and you can search their docstrings for terms such as 'nuclei'. Combining it with Omega, one could then ask Omega "Which functions do you have available for segmenting nuclei?" and Omega could list these functions. We could then ask Omega to add one specific function as a widget to the viewer, or to execute it directly. This could lead to better code than random assemblies of scikit-image and opencv snippets. There are a few hundred of these functions spread over the napari-assistant ecosystem, so I could imagine that this way of interacting with it would be useful.
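To make the docstring-search part concrete, here is a rough sketch (hypothetical names; the catalogue dict just stands in for whatever function enumeration the assistant already does):

def find_functions(catalogue: dict, term: str) -> dict:
    # return all catalogued functions whose docstring mentions the search term
    return {
        name: func
        for name, func in catalogue.items()
        if func.__doc__ and term.lower() in func.__doc__.lower()
    }

# e.g. find_functions(assistant_catalogue, "nuclei") could back Omega's answer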


The question is: how should one add a napari-assistant extension to Omega? A PR is one option, and I'm wondering whether you have thought about an alternative way of making Omega openly extensible.

Looking forward to hearing what you think.

Best, Robert

royerloic commented 1 year ago

Haha, I was hoping you would go down that path... I did have a look early on at the assistant's function enumeration code...

I would propose a simple scheme where all classes that derive from NapariBaseTool are automatically discovered and added... Or something like that...
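As a very rough sketch of what I mean, assuming extension packages have already been imported (e.g. via an entry point) so that their subclasses exist:

def discover_tool_classes(base=NapariBaseTool):
    # collect direct and indirect subclasses of NapariBaseTool
    found = []
    for cls in base.__subclasses__():
        found.append(cls)
        found.extend(discover_tool_classes(cls))
    return found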

kevinyamauchi commented 11 months ago

I agree with @haesleinhuepf that it would be nice to be able to register tools without making a PR to napari-chatgpt. For me the main requirements would be:

  1. the dependencies to register are minimal (ideally pure python, but maybe langchain?)
  2. the discovery process doesn't import the function. If we import the functions during discovery, the startup time increases as more tools are registered (especially if a developer has a heavy import); see the entry-point sketch after this list.
  3. The registration isn't overfit to langchain. I think langchain is super promising, but the field is moving super fast and it feels like we should be able to easily swap out the LLM component.
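On point 2, one way to get import-free discovery (just a sketch, not something either package does today) is standard entry points: importlib.metadata can list an entry point's name and target string without importing anything, and the import only happens when .load() is called. The group name below is made up for illustration:

from importlib.metadata import entry_points  # group= keyword needs Python 3.10+

def list_registered_tools() -> dict:
    # names and dotted target strings only; nothing gets imported here
    return {ep.name: ep.value for ep in entry_points(group="napari_chatgpt.tools")}

def load_registered_tool(name: str):
    # the actual import is deferred until a tool is really needed
    for ep in entry_points(group="napari_chatgpt.tools"):
        if ep.name == name:
            return ep.load()
    raise KeyError(name)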

What do you think, @haesleinhuepf and @royerloic , are there other requirements we should consider? Are any of those too strict/onerous?

All of the discovery methods I am aware of that use a base class (e.g., NapariBaseTool) require importing the class. Do you know of a way that doesn't?

Would a .yaml file describing the available plugins be too much to ask of the developer? The implication here is that they would have to both define an entry point and make sure the yaml file is packaged.
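For concreteness, a sketch of how that could look; the file name, keys, and package name are all hypothetical, and reading the manifest as package data keeps the tool modules unimported:

import importlib.resources
import yaml  # PyYAML would be a (small) extra dependency for plugin authors

# Hypothetical packaged manifest, e.g. my_plugin/llm_tools.yaml:
#
#   tools:
#     - name: "bia-bob:regionprops"
#       description: "Measure properties of labelled objects."
#       function: "bia_bob.tools:regionprops"
#
def read_tool_manifest(package: str = "my_plugin") -> list:
    text = importlib.resources.files(package).joinpath("llm_tools.yaml").read_text()
    return yaml.safe_load(text)["tools"]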

Another option, similar to @haesleinhuepf 's PR, is that developers could register a function that just returns the names/import paths of the available tools. We would then have to trust the developers to implement this in a way that doesn't cause a slow startup.
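A sketch of that variant (hypothetical names again): the registered function returns dotted paths only, and Omega resolves a path with importlib only when the tool is actually used:

import importlib

# In the plugin, exposed via an entry point: cheap to call, no heavy imports.
def available_tools() -> list:
    return [
        "bia_bob.tools:regionprops",             # hypothetical "module:attribute" paths
        "bia_bob.tools:voronoi_otsu_labeling",
    ]

# In Omega: import on demand.
def resolve_tool(path: str):
    module_name, _, attribute = path.partition(":")
    return getattr(importlib.import_module(module_name), attribute)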

Any other ideas?

haesleinhuepf commented 11 months ago

1. the dependencies to register are minimal (ideally pure python, but maybe langchain?)

Fully agreed. Also, let's avoid langchain if possible.

2. the discovery process doesn't import the function.

Tricky. If possible, I would like to avoid an over-engineered solution like npe2. Maybe langchain has alternative solutions for this (docs).

3. The registration isn't overfit to langchain. I think langchain is super promising, but the field is moving super fast and it feels like we should be able to easily swap out the LLM component.

I was also thinking of registering functions that consume str and produce str instead of registering langchain Tools. But I'm not sure. I haven't found out yet how to make chat-compatible tools that can take more than one string as parameter (see related issue).
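Just to sketch what I mean, with purely illustrative names and no langchain import:

from typing import Callable

# a tool is just a plain callable: one string in, one string out
StrTool = Callable[[str], str]

def example_tool(request: str) -> str:
    # everything (layer name, parameters, ...) has to be packed into the one
    # string, which is exactly the multi-parameter limitation mentioned above
    return f"Would now act on request: {request!r}"

registered_tools: dict[str, StrTool] = {"example_tool": example_tool}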

kevinyamauchi commented 11 months ago

I was also thinking of registering functions that consume str and produce str instead of registering langchain Tools. But I'm not sure. I haven't found out yet how to make chat-compatible tools that can have more than one string as parameter (see https://github.com/haesleinhuepf/bia-bob/issues/1).

I agree that we should avoid registering langchain tools. This feels too tied to the current implementation. Although, I think the information given to the langchain StructuredTool is pretty close to what we want (i.e., a text description and the input schema). I think it should then be up to the backend (e.g., bob or napari-gpt) to translate that into something it can use to interface with the LLM (e.g., Tool/StructuredTool for langchain).

kevinyamauchi commented 11 months ago

In terms of a path forward, I would propose the following:

  1. we make a small library that defines a Protocol or base class that holds the required metadata for a tool (e.g., LLMTool).
  2. we punt on the discovery process; defining the class and metadata seems like the important thing for now, and we can always add discovery on top of it later.
  3. we try using the LLMTool in bia-bob to see how it feels
  4. we can eventually add registration on top of the LLMTool class

I think the LLMTool base class can be really simple:


from abc import ABC, abstractmethod


class LLMTool(ABC):
    # we may want names that are separate from the function name
    # to prevent namespace collision (e.g., "bia-bob:regionprops")
    name: str

    # I think it is nice to have a description field separate from
    # the function docstring so that extra instructions can be given to the LLM
    description: str

    @abstractmethod
    def function(self):
        # concrete tools implement their actual logic here
        raise NotImplementedError

In the example of bia-bob, a registration function could then look like:


from langchain.tools import StructuredTool


def register_llm_tool(tool_to_register: LLMTool):
    # wrap the tool's callable in a langchain StructuredTool, reusing the
    # name and description from the LLMTool metadata
    new_tool = StructuredTool.from_function(
        tool_to_register.function,
        name=tool_to_register.name,
        description=tool_to_register.description,
    )

    # _context here stands for the backend's internal registry of tools
    _context.tools.append(new_tool)
    # update agent if necessary
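And to illustrate how a plugin author might use the two pieces together (tool name, description, and signature are all made up):

class RegionPropsTool(LLMTool):
    name = "bia-bob:regionprops"
    description = "Measure area and shape of labelled objects in a labels layer."

    def function(self, labels_layer_name: str) -> str:
        # hypothetical body: look up the layer, run the measurement, and
        # return a short text summary that the LLM can relay to the user
        return f"Measured region properties of layer '{labels_layer_name}'."

register_llm_tool(RegionPropsTool())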