huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Agent LLM Engine Support Local Inference #32245

Closed swtb3 closed 1 month ago

swtb3 commented 3 months ago

Feature request

The provided HfEngine class uses the Inference API under the hood, which makes using agents simple. However, it would be good to support local inference in a similarly simple way.

If this is already supported through the existing local inference pipelines (e.g. the text-generation pipeline), then the documentation should clarify the standard approach to local inference with agents.

Motivation

When following the tutorial, it was unclear how to approach local inference with agents, since the tutorial uses the Inference API by default.

Your contribution

Happy to make these changes, though I would need some guidance on the approach.

Update

I have had some success replacing the client with a pipeline, though the result is a bit messy and took a fair amount of trial and error.

So far, I have been unable to get the working local agent to run with the Gradio chat interface.
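For illustration, the "replace the client with a pipeline" idea can be sketched as a small callable engine that takes chat messages and returns completion text, which is the interface the agents expect. This is a hypothetical sketch, not the actual implementation from the repo above: `LocalPipelineEngine` and `generate_fn` are names invented here, and in practice `generate_fn` would be a local transformers text-generation pipeline rather than the stub used below.

```python
from typing import Callable, Dict, List, Optional


class LocalPipelineEngine:
    """Hypothetical engine: wraps any prompt -> completion callable so it can
    stand in for HfEngine (chat messages in, completion text out)."""

    def __init__(self, generate_fn: Callable[[str], str]):
        # generate_fn maps a prompt string to a completion string,
        # e.g. a wrapper around a local text-generation pipeline.
        self.generate_fn = generate_fn

    def __call__(
        self,
        messages: List[Dict[str, str]],
        stop_sequences: Optional[List[str]] = None,
    ) -> str:
        # Flatten the chat messages into a single prompt string.
        prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        output = self.generate_fn(prompt)
        # Truncate at the first stop sequence, mirroring the hosted engine's behavior.
        for stop in stop_sequences or []:
            idx = output.find(stop)
            if idx != -1:
                output = output[:idx]
        return output


# Stub generator standing in for a real local pipeline.
engine = LocalPipelineEngine(lambda prompt: "Thought: done<end_action>extra")
print(engine([{"role": "user", "content": "hi"}], stop_sequences=["<end_action>"]))
# → Thought: done
```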

LysandreJik commented 2 months ago

cc @aymeric-roucher

gxcuit commented 2 months ago

I also have this request

swtb3 commented 2 months ago

@gxcuit you can view my implementation here:

https://github.com/swtb3/math_agent_demo

It's not the cleanest, but it does work for local inference with an HF pipeline. I also tried to add support for Ollama; however, I found that the Ollama models were unable to properly leverage the agent's ReAct framework and often failed to perform their role.

The HF pipeline worked really well though.

aymeric-roucher commented 1 month ago

Thank you for this feature request @swtb3 ! In PR #33218, merged above, your point should be addressed: you can now simply initialize a TransformersEngine with your custom transformers pipeline! cc @gxcuit