huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

add `stream` to pipeline parameters #30487

Open not-lain opened 4 months ago

not-lain commented 4 months ago

Feature request

add option to stream output from pipeline

Motivation

Calling `tokenizer.apply_chat_template`, setting up a streamer and a thread, and then calling `model.generate` is a pretty repetitive pattern, and I think it's time to integrate it with pipelines. It's also time to add a streaming pipeline.
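For reference, the boilerplate today looks roughly like this (a minimal sketch using `TextIteratorStreamer`; the model name is just an example):

```python
from threading import Thread
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# any chat model works the same way; gemma is just an example
model_id = "google/gemma-1.1-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [{"role": "user", "content": "Tell me a joke."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# generate() blocks, so it runs in a background thread while we consume the streamer
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
thread = Thread(
    target=model.generate,
    kwargs={"inputs": input_ids, "streamer": streamer, "max_new_tokens": 128},
)
thread.start()
for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()
```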

Your contribution

I can offer this resource as a reference: a PR I made with the requested feature, https://huggingface.co/google/gemma-1.1-2b-it/discussions/14. One more tip: don't use `yield` and `return` in the same function; separate them (it's a Python quirk, see the illustration below). Sadly I'm a bit busy lately to open a PR, but if I can find some time I'll try to help out.
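A minimal illustration of that quirk: any function whose body contains `yield` becomes a generator function, so a `return` in the same function no longer hands its value back to the caller.

```python
def generate_text(stream=False):
    # Because this function contains a `yield`, Python treats the *whole*
    # function as a generator: the `return` branch never gives the caller
    # a string; the caller always receives a generator object instead.
    if stream:
        yield "hello"
        yield " world"
    else:
        return "hello world"

print(generate_text(stream=False))  # <generator object ...>, not "hello world"
```

Splitting the streaming and non-streaming paths into two separate functions avoids this.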

amyeroberts commented 4 months ago

Hi @not-lain, thanks for opening a feature request!

> using tokenizer.apply_chat_template then other stuff then model.generate is pretty repetitive

Could you elaborate on this a bit, e.g. with a code snippet? Is it the streaming feature during generation that you wish to be able to use?

not-lain commented 4 months ago

@amyeroberts normally, when someone wants to stream their output (example: https://huggingface.co/spaces/ysharma/Chat_with_Meta_llama3_8b), they need to write all of that boilerplate themselves. This has been quite a repetitive process across models, and I thought we could implement it within the transformers library.
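What the request boils down to is an API roughly like this (hypothetical: a `stream` parameter does not exist in pipelines today; this is only a sketch of the requested interface):

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="google/gemma-1.1-2b-it")

# hypothetical `stream=True` flag: this is the requested feature, not an existing parameter
for chunk in pipe("Tell me a joke.", stream=True, max_new_tokens=128):
    print(chunk, end="", flush=True)
```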

not-lain commented 4 months ago

I was initially thinking about integrating this only with text-generation models, but I think we can do it with image-to-text models too.

This is a good resource for that: https://huggingface.co/blog/idefics#getting-started-with-idefics
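The same threaded-streamer pattern from above should carry over to image-to-text models; a rough, untested sketch (BLIP stands in here for IDEFICS, and the image path is illustrative):

```python
from threading import Thread
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor, TextIteratorStreamer

# BLIP stands in for any image-to-text model; the streamer API is the same
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("cat.png")  # illustrative path
inputs = processor(images=image, return_tensors="pt")

# stream the caption token by token while generate() runs in the background
streamer = TextIteratorStreamer(processor.tokenizer, skip_special_tokens=True)
thread = Thread(
    target=model.generate,
    kwargs={**inputs, "streamer": streamer, "max_new_tokens": 50},
)
thread.start()
for text in streamer:
    print(text, end="", flush=True)
thread.join()
```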

amyeroberts commented 4 months ago

Thanks for sharing an example!

I'm not sure this is really something we want to add to the pipelines. Pipelines are intended to be simple objects which let users get predictions in one line; they're not intended to support all of transformers' functionality. In this case, I think it makes sense to leave streaming outside the pipeline, as that gives the user full control of the threading and yielding logic.

cc @Rocketknight1 @gante for your thoughts

Rocketknight1 commented 4 months ago

Yeah, I'm on @amyeroberts's side here - pipelines are (imo) a sort of high-level "on-ramp" API for transformers, which make it easy for users to quickly get outputs from common workflows. We definitely don't want to pack them full of features to handle every use-case - that's what the lower-level API is for! If we make pipelines very feature-heavy, then they become very big and confusing for new users, which defeats their purpose.

Once users are streaming output and working with threads/yielding/async/etc., they're probably advanced enough that they don't need the pipelines anyway.

fakerybakery commented 3 months ago

Personally would love to have streaming support in pipelines - it’s the one missing feature. Currently, streaming is quite difficult to use, but this would make it so much easier.

gante commented 3 months ago

FYI: we will be refactoring generate over the next few weeks, including adding better support for yield. It may work with pipelines, but that would be a side effect: as @Rocketknight1 wrote, we don't want to pack too many features in there, as it would defeat the point. The pipeline API is not designed to work with async stuff :)

not-lain commented 3 months ago

It's okay, I understand. I'll also take a look at the generate issue; maybe I can help out a little.

gante commented 2 months ago

generate refactor tracker: https://github.com/huggingface/transformers/issues/30810