siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Fallback to torch eager execution if the shapes don't match with the compiled graph #456

Open isidentical opened 11 months ago

isidentical commented 11 months ago

Is there an easy way to fallback to torch eager execution if the shapes don't match the oneflow compiled one? Torch provides torch._dynamo.run which prevents re-compilations so only compiled inputs get through the fast path and everything else falls back to eager/slow torch execution https://pytorch.org/docs/stable/torch.compiler_faq.html#why-are-you-recompiling-in-production
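For reference, this is roughly what the referenced torch mechanism looks like. A minimal sketch with made-up shapes and a toy function; exact recompile and fallback behavior depends on the torch version and its dynamic-shape settings:

```python
import torch

def toy_example(x):
    # Shape-dependent math forces a separate graph per input shape.
    return torch.nn.functional.relu(x) * x.shape[-1]

compiled = torch.compile(toy_example, dynamic=False)
compiled(torch.randn(4, 8))    # warm-up: compiles a graph specialized for (4, 8)

# After warm-up, wrap the same function with torch._dynamo.run: previously
# compiled graphs are reused, but no new graphs are generated, so unseen
# shapes run in plain eager mode instead of triggering another compile.
frozen = torch._dynamo.run(toy_example)
frozen(torch.randn(4, 8))      # fast path: reuses the compiled graph
frozen(torch.randn(4, 16))     # unseen shape: falls back to eager execution
```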

strint commented 11 months ago

This seems like a very attractive feature. There is a way to determine whether a given shape has already been compiled, and we can also make the module call the original torch module. So it is possible to do this.

Is there a scenario in which you want to fall back to torch eager (given that it will be slow)?
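To make the idea concrete, here is a rough sketch of such a wrapper. The class name and the warm-up protocol are assumptions for illustration, not an existing onediff API:

```python
import torch

class EagerFallbackWrapper(torch.nn.Module):
    # Illustrative sketch only, not onediff's actual implementation: route
    # already-compiled shapes to the compiled module and everything else to
    # the original torch module.
    def __init__(self, torch_module, compiled_module):
        super().__init__()
        self.torch_module = torch_module        # original eager module
        self.compiled_module = compiled_module  # e.g. the oneflow-compiled module
        self.compiled_shapes = set()

    def warm_up(self, *args, **kwargs):
        # Compile (or reuse) a graph for this input shape and remember it.
        self.compiled_shapes.add(tuple(a.shape for a in args if torch.is_tensor(a)))
        return self.compiled_module(*args, **kwargs)

    def forward(self, *args, **kwargs):
        shapes = tuple(a.shape for a in args if torch.is_tensor(a))
        if shapes in self.compiled_shapes:
            return self.compiled_module(*args, **kwargs)  # fast compiled path
        return self.torch_module(*args, **kwargs)         # eager fallback, no new compile
```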

isidentical commented 11 months ago

> Is there a scenario in which you want to fall back to torch eager (given that it will be slow)?

Torch's documentation for dynamo.run actually explains it pretty well:

> In some cases, you may not want unexpected compiles after a program has warmed up. For example, if you are serving production traffic in a latency critical application. For this, TorchDynamo provides an alternate mode where prior compiled graphs are used, but no new ones are generated:

strint commented 11 months ago

> > Is there a scenario in which you want to fall back to torch eager (given that it will be slow)?
>
> Torch's documentation for dynamo.run actually explains it pretty well:
>
> > In some cases, you may not want unexpected compiles after a program has warmed up. For example, if you are serving production traffic in a latency critical application. For this, TorchDynamo provides an alternate mode where prior compiled graphs are used, but no new ones are generated:

This is reasonable, we will try to add this feature.

strint commented 10 months ago

Since we started to disable the cache and use the VM to support dynamic shapes, there is no need to fall back to torch now.

https://github.com/siliconflow/onediff/releases/tag/0.12.0

@isidentical

isidentical commented 9 months ago

@strint this still makes sense for us when we are compiling pipelines that might not be officially supported by onediff yet. For example, I was able to get the AnimateDiffVideo2Video pipeline to work, but the dynamic number of frames caused it to re-compile every time (and the compiled artifacts weren't cached, e.g. it would first compile for 12 frames, then for 16, and then for 12 again).

I'd love to avoid compilation when possible. The same is true for guidance scale <= 1 (when not doing classifier-free guidance).

strint commented 7 months ago

Now that we support dynamic-shape runs, there is no good way to determine whether a given shape is supported or not.

But we do have a way to determine whether the number of inputs has changed. If the number of inputs changes, the graph architecture has to change, so we must re-compile a new graph.

But we can make this case fall back to torch execution instead of triggering a new compilation. Is this what you want? @isidentical

https://github.com/siliconflow/onediff/blob/6b54c0872079dec578ca266b82fbb5f03f6b44b8/src/onediff/infer_compiler/utils/args_tree_util.py#L45
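A rough illustration of what that check could look like. This is a sketch, not the code behind the linked line; it assumes the flattened input count is a reasonable proxy for the graph's input signature:

```python
from torch.utils._pytree import tree_flatten

class InputCountGuard:
    # Illustrative only (not the linked onediff code): remember how many leaf
    # inputs the graph was first compiled with, and fall back to the original
    # torch module when a later call arrives with a different count.
    def __init__(self, torch_module, compiled_module):
        self.torch_module = torch_module
        self.compiled_module = compiled_module
        self.compiled_input_count = None

    def __call__(self, *args, **kwargs):
        leaves, _ = tree_flatten((args, kwargs))
        if self.compiled_input_count is None:
            self.compiled_input_count = len(leaves)   # first call compiles the graph
            return self.compiled_module(*args, **kwargs)
        if len(leaves) != self.compiled_input_count:
            # Input structure changed: run eager torch instead of re-compiling.
            return self.torch_module(*args, **kwargs)
        return self.compiled_module(*args, **kwargs)
```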