huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

Add InstructPix2Pix pipeline support. #625

Closed: asntr closed this 17 hours ago

asntr commented 3 weeks ago

What does this PR do?

Fixes: #624

Added support for loading and compiling the InstructPix2Pix pipeline using Neuron.
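Usage ends up looking roughly like this (a minimal sketch; the class name and compile-time arguments mirror the existing Neuron Stable Diffusion pipelines, so treat the exact signature as an assumption):

```python
from optimum.neuron import NeuronStableDiffusionInstructPix2PixPipeline

# export=True compiles the models for Neuron at load time; input shapes
# are static on Neuron, so they are fixed here.
pipe = NeuronStableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix",
    export=True,
    batch_size=1,
    height=512,
    width=512,
    dynamic_batch_size=True,  # see the discussion below
)
pipe.save_pretrained("sd_ip2p_neuron/")
```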


asntr commented 3 weeks ago

Hi! I'd like to highlight a point about InstructPix2Pix UNet inference.

Since it runs 3 inputs through the UNet for text and image guidance, I can't use a static batch size with data parallelism on 2 devices, so I'm passing dynamic_batch_size=True (splitting the batch 2:1). But that setting applies to all models, so it's a suboptimal solution.
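To make the shape problem concrete, here is roughly what happens per denoising step (following the diffusers InstructPix2Pix pipeline; the tensor sizes are illustrative):

```python
import torch

# Per denoising step, InstructPix2Pix runs classifier-free guidance over
# three copies of the latents (text-conditioned, image-conditioned,
# unconditional), mirroring the diffusers pipeline:
latents = torch.randn(1, 4, 64, 64)            # batch_size = 1
latent_model_input = torch.cat([latents] * 3)  # UNet batch becomes 3

# A static batch of 3 cannot be split evenly across 2 NeuronCores,
# hence dynamic batching with a 2:1 runtime split.
print(latent_model_input.shape)  # torch.Size([3, 4, 64, 64])
```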

What do you think is better here? Introduce a new parameter that allows setting dynamic batching exclusively for the UNet? Also, do we need to address this in other parts of the code tied to the UNet export?

HuggingFaceDocBuilderDev commented 2 weeks ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

JingyaHuang commented 2 days ago

Hi @asntr, what do you think of the changes to the compilation that I suggested? Let me know if you are interested in working on this! Alternatively, we can get this PR merged first and improve the support in another PR.

JingyaHuang commented 2 days ago

If you prefer to get this PR merged first, could you rebase your branch? There was a fix to the styling tool today; with it, the CI should pass.

asntr commented 2 days ago

Hi @JingyaHuang!

Sorry for the delayed response.

I think choosing the batch size depending on data_parallel_mode is a great idea, and it could be useful not only for the ip2p case but for other diffusion pipelines too.

For example, the t2i pipeline doesn't allow using CFG unless dynamic_batch_size is enabled, for data parallel modes other than "unet". We could apply the same logic there: multiply the batch size by 2 if data_parallel_mode is not "unet".
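A hypothetical sketch of that rule (the helper name and signature are illustrative, not the actual optimum-neuron API, and it glosses over the ip2p case where 3 copies do not split evenly over 2 UNet replicas):

```python
def effective_unet_batch_size(
    batch_size: int, data_parallel_mode: str, num_guidance_copies: int
) -> int:
    """Compile-time UNet batch size under classifier-free guidance.

    num_guidance_copies is 2 for text-to-image CFG and 3 for InstructPix2Pix.
    In "unet" data parallel mode the guidance copies are spread across the
    UNet replicas, so each compiled UNet keeps the original batch size;
    in other modes a single UNet receives all copies at once.
    """
    if data_parallel_mode == "unet":
        return batch_size
    return batch_size * num_guidance_copies
```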

So maybe it is indeed a task for a follow-up PR, and I can totally work on it!

asntr commented 2 days ago

I also have this example of inference output on INF2 for my snippet. Should I update the docs with this example (and create a PR into documentation-images)?

[image: 011-sd-ip2p]

JingyaHuang commented 2 days ago

> So maybe it is indeed a task for a follow-up PR, and I can totally work on it!

Sounds great, thanks @asntr! Ping me if you need any help.

JingyaHuang commented 2 days ago

> I also have this example of inference output on INF2 for my snippet. Should I update the docs with this example (and create a PR into documentation-images)?

Yeah please do! The image looks great!

We could put it under the sdxl section: https://github.com/huggingface/optimum-neuron/blob/main/docs/source/tutorials/stable_diffusion.mdx#stable-diffusion-xl-turbo

Thank you!

asntr commented 1 day ago

Hi @JingyaHuang, I placed the docs under the Stable Diffusion section, since the ip2p pipeline lives inside the stable_diffusion directory in diffusers: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_instruct_pix2pix.py

I also opened this PR: https://huggingface.co/datasets/optimum/documentation-images/discussions/4

Let me know if you're happy with this.

Thanks!