replicate / cog-flux

Cog inference for flux models
https://replicate.com/black-forest-labs/flux-dev
Apache License 2.0

Lora loading for bf16 and fp8, as separate models #24

Open andreasjansson opened 3 weeks ago

andreasjansson commented 3 weeks ago

Had to fix some bugs in the original lora loading code.

Outputs are here: https://replicate.com/replicate-internal/test-flux-dev-lora

Fusing and unfusing is also slightly lossy, so the weights slowly degrade over repeated LoRA swaps. We could do something like what peft does and add a new node that does the matmul on the fly instead of fusing, but that would slow down inference. Curious if you have ideas @daanelson
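
To illustrate the tradeoff (a minimal sketch, not code from this repo; the shapes, names, and rank are made up):

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes; the flux linear layers are much larger.
d, rank = 64, 8
W = torch.randn(d, d, dtype=torch.bfloat16)     # base weight
A = torch.randn(rank, d, dtype=torch.bfloat16)  # LoRA down-projection
B = torch.randn(d, rank, dtype=torch.bfloat16)  # LoRA up-projection

# Fusing bakes the low-rank delta into the base weight;
# unfusing subtracts it again when the LoRA is unloaded.
W_fused = W + B @ A
W_restored = W_fused - B @ A

# The round trip is lossy: the add and the subtract each round
# to bf16, so repeated fuse/unfuse cycles accumulate drift in W.
print((W_restored - W).abs().max())  # nonzero

# peft-style alternative: leave W untouched and apply the
# low-rank update on the fly at every forward pass. No drift,
# but two extra (small) matmuls per layer per step.
x = torch.randn(1, d, dtype=torch.bfloat16)
y = x @ W.T + (x @ A.T) @ B.T
```

The fp8 path would round even more coarsely per fuse/unfuse cycle, so the drift there is presumably worse.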

Averylamp commented 3 weeks ago

Very excited for this PR, thanks for doing this! I was looking for an H100 lora inference provider and this seems like it would do the trick. I was also curious whether pricing for fast generations would differ from per-image pricing, since the GPU time per generation is much lower?

Averylamp commented 1 week ago

Hi, I was curious whether work is continuing on this PR? I believe it should make flux dev lora inference fast enough that you'd gain a customer over Fal: they are currently a few seconds faster, but from my benchmarking this should end up faster. Happy to take on any tasks if that would help.