This branch attempts:

- to create a simple test (`benchmark.py`) to evaluate generation speed
- to reduce unnecessary computations in `UNet3DModel`
- to enable `torch.compile` with CUDA graphs in `UNet3DModel`

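
For reference, a speed test of this kind can be as small as a warm-up followed by a timed loop. The sketch below is not the contents of `benchmark.py`; `time_call` and `percent_faster` are hypothetical names, and only the standard library is used. The warm-up runs matter once compilation is enabled, because the first calls pay the compile cost.

```python
import statistics
import time


def time_call(fn, *, warmup=1, repeats=3):
    """Return the median wall-clock seconds of fn() over `repeats` timed runs."""
    # Untimed warm-up runs absorb one-off costs such as compilation.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def percent_faster(baseline_s, new_s):
    """Percentage reduction in runtime relative to the baseline."""
    return 100.0 * (baseline_s - new_s) / baseline_s
```

When timing GPU work, `fn` should call `torch.cuda.synchronize()` before returning, since CUDA kernels are launched asynchronously and the timer would otherwise stop early.
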
These changes do not (currently) produce significant speed improvements in eager execution. I find this odd and will attempt to profile the model later.

When compilation over `UNet3DModel.forward` is enabled with, e.g., `python3 gradio_app.py 640 512`, `process_video()` runs about 10% faster on a 3090 (i.e. a 300s video generation drops to 270s).
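
A minimal sketch of how compilation over a module's `forward` might be switched on, assuming PyTorch 2.x. This is not the branch's actual code: `compile_forward` and its `compile_fn` parameter are hypothetical, and `mode="reduce-overhead"` is one way to ask `torch.compile` to use CUDA graphs, which may differ from what the branch does.

```python
def compile_forward(model, compile_fn=None, mode="reduce-overhead"):
    """Replace model.forward with a compiled version and return the model.

    compile_fn is a hypothetical hook (useful for testing without a GPU);
    when it is None, torch.compile is used.
    """
    if compile_fn is None:
        import torch  # imported lazily so the helper has no hard dependency

        compile_fn = torch.compile
    # "reduce-overhead" asks the default Inductor backend to use CUDA graphs.
    model.forward = compile_fn(model.forward, mode=mode)
    return model
```

The first few calls after this are slower, because compilation and graph capture happen then, so they are best excluded from any timing.
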