liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Recompiling the model but not having to call graph.openStore again. #30

Closed ghost closed 1 year ago

ghost commented 1 year ago

Right now it seems that I have to call graph.openStore every time I recompile the model with a new input shape. Is there a way to recompile the model with a new input shape while retaining the weights?

liuliu commented 1 year ago

It is probably due to the hard-coded width / height during model construction. If you use ModelBuilder, I don't think that will be an issue: you just supply new inputs, and it will create a new model in a way that retains the weights. Anyway, a code snippet would help me understand what you are doing.

ghost commented 1 year ago

unet = UNet(batchSize: 2, startWidth: 64, startHeight: 64)
unet.compile(inputs: xIn, t, c)
graph.openStore("sd-v1.4.ckpt") { $0.read("unet", model: unet!) }

// ... some image generations ...

unet = UNet(batchSize: 2, startWidth: 128, startHeight: 128)
unet.compile(inputs: xIn2, t, c)
graph.openStore("sd-v1.4.ckpt") { $0.read("unet", model: unet!) } // I really don't want to reload every time I change the model size; that wastes time.

// ... more image generations ...
ghost commented 1 year ago

@liuliu do you have a snippet of how to do it with ModelBuilder?

liuliu commented 1 year ago

Yeah, I gave an example earlier: https://github.com/liuliu/s4nnc/blob/main/examples/imdb/main.swift#L86

Basically you would probably do:

let actualUnet = ModelBuilder { inputs in
  let batchSize = inputs[0].shape[0]
  let startHeight = inputs[0].shape[2] // assuming NCHW
  let startWidth = inputs[0].shape[3]
  return UNet(batchSize: batchSize, startWidth: startWidth, startHeight: startHeight)
}

That said, I mostly use this for DRL and haven't tried it with the SD model yet. If it works as expected, you don't need to call unet.compile(inputs: xIn2, t, c); just pass in a different size for the next generation, and it will reallocate intermediate tensors (if needed) while retaining the weights. The one caveat: the allocation only ever grows. If you later have an xIn3 that is smaller than xIn2, it will not release RAM; it keeps the allocations made for the largest input seen so far.
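If that holds, usage would look roughly like this — an untested sketch assuming the actualUnet builder above, the xIn / xIn2 / t / c tensors, and the store name from the earlier snippets:

```swift
// Load weights once into the ModelBuilder-backed model.
actualUnet.compile(inputs: xIn, t, c)
graph.openStore("sd-v1.4.ckpt") { $0.read("unet", model: actualUnet) }

// 64x64 generation.
let out64 = actualUnet(inputs: xIn, t, c)

// 128x128 generation: no recompile, no openStore. ModelBuilder rebuilds the
// model for the new shape but keeps the already-loaded weights.
let out128 = actualUnet(inputs: xIn2, t, c)
```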

ghost commented 1 year ago

Does the model builder call the UNet function every time the shape changes, or does it call the UNet function only once, trace the graph in some fancy way, and just resize the tensors whenever the input size changes?

And if xIn3 is small again and I want to free up the memory, should I recreate actualUnet, and it will reset and GC the old allocations?

liuliu commented 1 year ago

> Does the model builder call the UNet function every time the shape changes, or does it call the UNet function only once, trace the graph in some fancy way, and just resize the tensors whenever the input size changes?

There is no straightforward answer. Yes, every time the shape changes, there will be a call to the UNet function to build a model. But the more expensive optimization / tensor-allocation pass will most likely be bypassed if the new tensor is smaller: https://github.com/liuliu/ccv/blob/unstable/lib/nnc/ccv_cnnp_model.c#L538

> And if xIn3 is small again and I want to free up the memory, should I recreate actualUnet, and it will reset and GC the old allocations?

You most likely need to recreate the ModelBuilder (ModelBuilder is a drop-in replacement for Model) and reload the weights.
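Concretely, that could be sketched as follows — untested, reusing the builder body and store name from the snippets above (xIn3 is the smaller input):

```swift
// To release memory held for the largest shape seen so far, rebuild the
// ModelBuilder from scratch and reload the weights from the store.
actualUnet = ModelBuilder { inputs in
  let batchSize = inputs[0].shape[0]
  let startHeight = inputs[0].shape[2] // assuming NCHW
  let startWidth = inputs[0].shape[3]
  return UNet(batchSize: batchSize, startWidth: startWidth, startHeight: startHeight)
}
actualUnet.compile(inputs: xIn3, t, c)
graph.openStore("sd-v1.4.ckpt") { $0.read("unet", model: actualUnet) }
// The old builder and its allocations are now unreferenced and can be freed.
```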

ghost commented 1 year ago

This ModelBuilder approach seems to take way too much memory — is that expected? I create the UNet with ModelBuilder, and even the first call gives this error:

Error: command buffer exited with error status.
    The Metal Performance Shaders operations encoded on it may not have completed.
    Error: 
    (null)
    Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
    <AGXG13GFamilyCommandBuffer: 0x13dfe5220>
    label = <none> 
    device = <AGXG13GDevice: 0x13e875200>
        name = Apple M1 
    commandQueue = <AGXG13GFamilyCommandQueue: 0x13e89fa00>
        label = <none> 
        device = <AGXG13GDevice: 0x13e875200>
            name = Apple M1 
    retainedReferences = 1
Error: command buffer exited with error status.
liuliu commented 1 year ago

Not sure about this particular case, but with CUDA I verified that it works: https://github.com/liuliu/swift-diffusion/blob/main/examples/txt2img/main.swift#L143 (Note that I added an explicit compile(inputs:) method to ModelBuilder; without it, parameter loading is delayed to the first evaluation, which may or may not be ideal.)

ghost commented 1 year ago

Also, why are you not compiling the model here: https://github.com/liuliu/swift-diffusion/blob/main/examples/unet/main.swift?

liuliu commented 1 year ago

Those examples have a PythonKit dependency, and I want to keep the txt2img example clean of PythonKit.

ghost commented 1 year ago

PythonKit is just used to run the PyTorch model and load the weights from the .ckpt file. Still, .compile is not being called — the forward pass runs without calling .compile. But if we do that in txt2img, it does not work. Why?

liuliu commented 1 year ago

What error do you get? There might be a few reasons, but most likely it's because when copying from Python we copy Float32 weights, while in the code from examples/txt2img/main.swift all weights are Float16, so you might error out on reading the weights. That's why I first save the weights and then use them (when loading weights from SQLite, we do a transparent conversion between FP16 and FP32).
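The save-then-load flow could be sketched like this — an assumption-laden sketch; the store path "unet-fp32.ckpt" and the unetFP16 model are hypothetical names, and the write/read store calls mirror the openStore usage shown earlier in this thread:

```swift
// Save the Float32 weights copied from Python once...
graph.openStore("unet-fp32.ckpt") { $0.write("unet", model: unet) }

// ...then read them back into an FP16 model later; the SQLite-backed store
// converts between FP32 and FP16 transparently on read.
graph.openStore("unet-fp32.ckpt") { $0.read("unet", model: unetFP16) }
```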