Feature request: saving/loading latents after sampling

Using the built-in save/load latent nodes with VAE tiling enabled results in this error:

Could not run 'aten::slow_conv3d_forward' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee

And with VAE tiling off:

Sizes of tensors must match except in dimension 2. Expected size 60 but got size 12 for tensor number 1 in the list.

It would be super useful to have a smaller GPU sample the videos while a 3090/4090 handles the decoding and interpolation, assuming this isn't a ComfyUI limitation. Even just not losing a couple of minutes of sampling to an OOM in the final stretch would help.

kijai / ComfyUI-CogVideoXWrapper, issue #64