attempts to improve speed

This branch attempts to:

- create a simple test (benchmark.py) to evaluate generation speed
- reduce unnecessary computations in UNet3DModel
- enable torch compile w/ cudagraphs in UNet3DModel
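
For illustration, a timing harness of this kind could look roughly like the sketch below. This is not the contents of benchmark.py; the names `benchmark` and `relative_improvement` are hypothetical, and it only assumes that the workload can be wrapped in a zero-argument callable.

```python
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], warmup: int = 1, repeats: int = 3) -> List[float]:
    """Return the wall-clock seconds of each timed call to fn (hypothetical helper)."""
    for _ in range(warmup):
        # Warm-up calls absorb one-off costs such as compilation or cache fills.
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # For GPU work, fn should synchronize the device before returning,
        # otherwise only the kernel launch time is measured.
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def relative_improvement(baseline_s: float, new_s: float) -> float:
    """Fraction of time saved relative to the baseline (0.10 means 10% less time)."""
    return (baseline_s - new_s) / baseline_s
```

Reporting the per-run timings rather than a single number makes it easier to see whether a small difference is just run-to-run noise.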

These changes do not (currently) produce significant speed improvements in eager execution. I find this odd and will attempt to profile the model later.

When compilation of UNet3DModel.forward is enabled with, e.g., python3 gradio_app.py 640 512, process_video() takes 10% less time on a 3090 (i.e. a 300s video generation drops to 270s).
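
As a rough sketch of what compiling a module's forward with CUDA graphs can look like in PyTorch 2.x: `TinyBlock` is a hypothetical stand-in for UNet3DModel and `maybe_compile_forward` is a hypothetical helper, neither taken from this branch. `mode="reduce-overhead"` is one torch.compile setting that uses CUDA graphs; whether this branch uses that mode or a different configuration is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBlock(nn.Module):
    """Hypothetical stand-in for UNet3DModel; not the real architecture."""

    def __init__(self, channels: int = 8):
        super().__init__()
        self.conv = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.silu(self.conv(x))


def maybe_compile_forward(model: nn.Module, enabled: bool) -> nn.Module:
    """Compile model.forward only when requested and a CUDA device is present."""
    if enabled and torch.cuda.is_available():
        # "reduce-overhead" lets the compiler capture CUDA graphs where it can.
        # It works best when input shapes stay fixed between calls, since a
        # new shape triggers recompilation and a new graph capture.
        model.forward = torch.compile(model.forward, mode="reduce-overhead")
    return model


device = "cuda" if torch.cuda.is_available() else "cpu"
model = maybe_compile_forward(TinyBlock().to(device).eval(), enabled=True)
with torch.no_grad():
    # Input layout: (batch, channels, frames, height, width).
    out = model(torch.randn(1, 8, 2, 16, 16, device=device))
```

The first call pays the compilation cost, so any timing comparison against eager execution should exclude it with a warm-up run.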
