huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

feat: `agent.run(return_agent_types=True)` #24339

Closed aarnphm closed 1 year ago

aarnphm commented 1 year ago

Feature request

Currently, `agent.run` on `main` runs the materializer on the `AgentType` outputs to return their corresponding decoded types.

I think it would be a great addition to optionally return this `AgentType` directly, so that external libraries can build on top of it!


```python
agent = transformers.HfAgent("inference-api-endpoint")

res: AgentType = agent.run(..., return_agent_types=True)
```
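For context, here is a rough sketch of where such a flag could hook into `Agent.run`. This is purely illustrative: `_evaluate_generated_code` is an assumed helper name for this sketch, not the actual transformers internals.

```python
# Hypothetical sketch only; `_evaluate_generated_code` is an assumed helper,
# not a real transformers method.
def run(self, task, *, return_agent_types=False, **kwargs):
    output = self._evaluate_generated_code(task, **kwargs)  # AgentType instance(s)
    if return_agent_types:
        # Hand the lazy wrapper back untouched for the caller to materialize.
        return output
    # Today's behavior: decode to the raw type (e.g. PIL.Image, str, tensor).
    return output.to_raw() if hasattr(output, "to_raw") else output
```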

Motivation

I'm currently playing around with the new agent API, and found that in cases where I don't want the decoded outputs immediately, it would be nice to get the `AgentType` back and manage the materialization myself.
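As a rough sketch, manual materialization could then look like below, assuming the proposed flag exists; `AgentImage` is the wrapper class that currently lives in `transformers.tools.agent_types`:

```python
from transformers.tools.agent_types import AgentImage

# `return_agent_types=True` is the flag proposed in this issue, not yet real.
res = agent.run("Generate an image of a boat", return_agent_types=True)
if isinstance(res, AgentImage):
    raw = res.to_raw()      # decode to a PIL.Image only when actually needed
    path = res.to_string()  # or serialize it to a temp file path for transport
```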

Your contribution

I can help create a PR, but I know that the Agent API is still very experimental and unstable.

cc @LysandreJik on this

LysandreJik commented 1 year ago

Hey @aarnphm, could you provide a code sample with the return you'd like to receive so that I can play with it and see if it makes sense to implement it? Thanks!

aarnphm commented 1 year ago

For example, I'm currently building OpenLLM and came across a use case where one defines an agent to generate an image and then captions it via a pipeline using a BentoML Runner.

OpenLLM also provides support for HuggingFace Agents, where users can switch between the inference endpoint and hosting their own StarCoder.

Given the following snippet to save a captioning pipeline:

```python
import bentoml, transformers

# "image-to-text" is the transformers pipeline task for image captioning
bentoml.transformers.save_model("captioning", transformers.pipeline("image-to-text"))
```

Runners are distributed by nature, and the service can be defined in a `service.py` like so:

```python
import bentoml
import torch
import transformers

captioning_runner = bentoml.transformers.get("captioning").to_runner()

agent = transformers.HfAgent("http://283.23.22.1:3000/hf/agent")  # `openllm start starcoder`

svc = bentoml.Service("agent-with-runners", runners=[captioning_runner])

def preprocess(input_tensor: torch.Tensor) -> torch.Tensor:
    ...

@svc.api(input=bentoml.io.Text(), output=bentoml.io.Text())
async def generate_and_caption(prompt: str):
    # `ImageAgentType` stands in for the image AgentType wrapper proposed above
    image_output: ImageAgentType = agent.run(prompt, ..., return_agent_types=True)
    # then I do some preprocessing with this tensor
    input_for_pipeline = preprocess(image_output.to_raw())
    return await captioning_runner.async_run(input_for_pipeline)
```

You can run this with `bentoml serve service.py:svc`.
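Once it is serving, the endpoint could be exercised like below (a sketch assuming BentoML's defaults, where the route follows the API function name and the server listens on port 3000):

```python
import requests

# Hypothetical client call against the service sketch above.
resp = requests.post(
    "http://localhost:3000/generate_and_caption",
    data="Generate an image of a boat in the water",
    headers={"Content-Type": "text/plain"},
)
print(resp.text)  # the generated caption
```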

This is one use case where returning the `AgentType` would be helpful: one could access the tensor directly without having to convert from the `PIL.Image` output (which, if I understand correctly, is what `agent.run` currently returns when the result is an image).
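For reference, this is roughly the conversion that `preprocess` would otherwise have to do by hand (a sketch; the exact layout and dtype handling depends on what the downstream pipeline expects):

```python
import numpy as np
import torch
from PIL import Image

def pil_to_tensor(image: Image.Image) -> torch.Tensor:
    # HWC uint8 -> CHW float32 in [0, 1]; adjust to the pipeline's expectations.
    arr = np.asarray(image.convert("RGB"), dtype=np.float32) / 255.0
    return torch.from_numpy(arr).permute(2, 0, 1)
```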

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.