haesleinhuepf / bia-bob

BIA Bob is a Jupyter+LLM-based assistant for interacting with image data and for working on Bio-image Analysis tasks.
BSD 3-Clause "New" or "Revised" License

Combine operands #1

Closed haesleinhuepf closed 1 year ago

haesleinhuepf commented 1 year ago

It would be cool if we had multi-input tools available. This would allow us to call operations such as "Apply the seeded watershed algorithm to the membrane image and use the nuclei segmentation as seeds."

Unfortunately, the StructuredTool is not compatible with the AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION.
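For illustration, the core of a multi-input tool is resolving several named inputs before calling the operation. A minimal, dependency-free sketch of that idea (all names here are hypothetical, not bia-bob API; a real `seeded_watershed` would call e.g. `skimage.segmentation.watershed`):

```python
def seeded_watershed(membrane_image, seeds):
    # Placeholder body: a real tool would run the watershed on the
    # membrane image using the label image as seed points.
    return seeds

# A workspace of named images the assistant has produced so far (toy data).
workspace = {
    "membrane_image": [[0, 1], [1, 0]],       # intensity image
    "nuclei_segmentation": [[1, 0], [0, 2]],  # label image, used as seeds
}

def apply_tool(tool, *input_names):
    """Resolve several named inputs from the workspace, then call the operation."""
    inputs = [workspace[name] for name in input_names]
    return tool(*inputs)

result = apply_tool(seeded_watershed, "membrane_image", "nuclei_segmentation")
```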

Hints:

kevinyamauchi commented 1 year ago

Hey! I think I've figured out how to chain tools together with multiple inputs. See the gist below for a demo (ignore the rate limit errors - I haven't paid for the API yet 😆 ). In this demo, I load the image, multiply it by a scalar (i.e., two inputs), and view the result.

Basically, I think we can swap the agent type to AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION and we can use the StructuredTool class.

https://gist.github.com/kevinyamauchi/4eb286e9a1f342ad0e854b1dc9fdd445

edit: I haven't tested fully, but happy to do so and make a PR if you think this is a viable approach.
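The gist's approach boils down to tools that accept structured (multi-field) arguments instead of a single string, which is what LangChain's `StructuredTool` wraps. A dependency-free sketch of the dispatch idea, using the load-and-multiply-by-a-scalar example from the demo (function and variable names are illustrative):

```python
import json

def multiply_image(image, factor):
    """Multiply every pixel by a scalar -- a two-input operation."""
    return [[pixel * factor for pixel in row] for row in image]

tools = {"multiply_image": multiply_image}

# A structured tool call carries named arguments, e.g. as JSON emitted by the agent.
action = json.dumps({
    "tool": "multiply_image",
    "tool_input": {"image": [[1, 2], [3, 4]], "factor": 2.0},
})

def dispatch(action_json):
    """Parse the structured action and call the tool with keyword arguments."""
    parsed = json.loads(action_json)
    return tools[parsed["tool"]](**parsed["tool_input"])

result = dispatch(action)  # [[2.0, 4.0], [6.0, 8.0]]
```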

kevinyamauchi commented 1 year ago

The other piece: for chaining operations like you described above, I think we need to add a more structured data registry so that tools can easily load/save results (e.g., label images should be easily discernible from intensity images).
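As a sketch of what such a registry could look like (hypothetical names, plain Python standing in for real image arrays), each stored result carries a `kind` so a tool can request its seeds specifically as a label image:

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    data: object  # the pixel data (a numpy array in practice)
    kind: str     # "intensity" or "labels"

@dataclass
class DataRegistry:
    """Hypothetical registry so tools can save/load intermediate results by name."""
    _records: dict = field(default_factory=dict)

    def save(self, name, data, kind):
        self._records[name] = ImageRecord(data, kind)

    def load(self, name, kind=None):
        record = self._records[name]
        if kind is not None and record.kind != kind:
            raise TypeError(f"{name} is a {record.kind} image, not {kind}")
        return record.data

registry = DataRegistry()
registry.save("nuclei_segmentation", [[0, 1], [1, 0]], kind="labels")
# A seeded-watershed tool could now request its seeds by type:
seeds = registry.load("nuclei_segmentation", kind="labels")
```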

haesleinhuepf commented 1 year ago

> Basically, I think we can swap the agent type to AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION and we can use the StructuredTool class.

I've tried that too! This turns Bob into a bot that has no memory :-(

kevinyamauchi commented 1 year ago

Ah dang. What are the use cases for having the memory? Recalling the last result?

haesleinhuepf commented 1 year ago

Check out this notebook. It was executed after changing to AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION. It kind of still works, but some responses are suspiciously dumb:

[screenshot of notebook output]

And:

[screenshot of notebook output]

haesleinhuepf commented 1 year ago

This might be a potential solution, but it seems a deep rabbit hole: https://python.langchain.com/docs/use_cases/autonomous_agents/autogpt

haesleinhuepf commented 1 year ago

I've just tried what's described here, and the AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION agent then works better with memory: code

It's still not convincing. It seems to give more wrong answers and I received some timeouts when running it. Will try again later.

[screenshot of notebook output]

kevinyamauchi commented 1 year ago

Hey! I have another attempt, now creating an LLMChain that adds the conversation history into the prompt. I did a super basic test and it seems like it could be working (it loads two images and remembers which was first). What do you think?

https://gist.github.com/kevinyamauchi/491944468412d817038c2ebbb7d9cc91
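The "history into the prompt" idea can be sketched without any LangChain machinery: each turn is appended to a buffer, and the full transcript is spliced into the next prompt (in LangChain this role is played by a memory object feeding a prompt template; names below are illustrative):

```python
PROMPT_TEMPLATE = """You are a bio-image analysis assistant.

Conversation so far:
{history}

Human: {user_input}
Assistant:"""

history = []

def record_turn(user_input, assistant_reply):
    """Append one completed exchange to the conversation buffer."""
    history.append(f"Human: {user_input}")
    history.append(f"Assistant: {assistant_reply}")

def build_prompt(user_input):
    """Splice the whole conversation so far into the next prompt."""
    return PROMPT_TEMPLATE.format(history="\n".join(history), user_input=user_input)

record_turn("Load blobs.tif", "Loaded blobs.tif as image_1.")
prompt = build_prompt("Which image did I load first?")
# The model now sees its earlier answer and can refer back to image_1.
```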

kevinyamauchi commented 1 year ago

I added back in the multi-input compatibility, so I think this shows both using conversation history and multi-input tools. This seems like it could be a good approach. We may consider adding some additional agents to the chain to manage caching/IO of intermediate results.

I will make a PR to see how converting the current tools over to this agent would look.

https://gist.github.com/kevinyamauchi/9e4049b799bb705c575c8ae3070c7dae
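On the caching of intermediate results: one simple option is to memoize tool outputs keyed by the tool name and its arguments, so repeated agent steps reuse earlier results instead of recomputing them. A minimal sketch (purely illustrative, not part of the gist):

```python
import functools
import json

_cache = {}
calls = 0  # counts how often the real tool body actually runs

def cached_tool(func):
    """Serve repeated calls with identical arguments from a cache."""
    @functools.wraps(func)
    def wrapper(**kwargs):
        key = (func.__name__, json.dumps(kwargs, sort_keys=True))
        if key not in _cache:
            _cache[key] = func(**kwargs)
        return _cache[key]
    return wrapper

@cached_tool
def load_image(path):
    global calls
    calls += 1
    return f"image loaded from {path}"

load_image(path="blobs.tif")
load_image(path="blobs.tif")  # second call is served from the cache
```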

haesleinhuepf commented 1 year ago

Oh that notebook looks super amazing @kevinyamauchi ! I can't wait for this PR ❤️