liuliu / swift-diffusion

BSD 3-Clause "New" or "Revised" License

Why is there non-deterministic output? #35

Closed (ghost closed 1 year ago)

ghost commented 1 year ago

public func test_unet_on_small() {
  let g = DynamicGraph()
  let unet = UNet(batchSize: 1, startWidth: 32, startHeight: 32)

  let xIn = g.variable(.GPU(0), .NHWC(1, 32, 32, 4), of: Float16.self)
  let c = g.variable(.GPU(0), .CHW(1, 77, 768), of: Float16.self)
  let t = g.variable(.GPU(0), .NC(1, 320), of: Float16.self)

  g.withNoGrad {
    unet.compile(inputs: xIn, t, c)
    g.openStore("/tmp/sd-v1.4.ckpt") {
      $0.read("unet", model: unet)
    }

    // Fill every input with a constant so each forward pass sees identical data.
    xIn.full(1)
    c.full(1)
    t.full(1)

    // Run the same forward pass three times and print the same element each time.
    let out = unet(inputs: xIn, t, c)[0].as(of: Float16.self)
    print(out[0, 31, 31, 0])

    print(" ===== ")
    print(" ===== ")

    let out1 = unet(inputs: xIn, t, c)[0].as(of: Float16.self)
    print(out1[0, 31, 31, 0])

    print(" ===== ")
    print(" ===== ")

    let out2 = unet(inputs: xIn, t, c)[0].as(of: Float16.self)
    print(out2[0, 31, 31, 0])

    print(" ===== ")
    print(" ===== ")
  }
}

Run the process 5-6 times; you should see at least one run where the output is something like:

0.0
 ===== 
 ===== 
3.05e-05
 ===== 
 ===== 
0.0
 ===== 
ghost commented 1 year ago

I'm running this with MPS.

liuliu commented 1 year ago

I can take a look tomorrow. One way to help: you can set DynamicGraph.logLevel = .verbose to see the output of each layer. This will help you locate the layer at which the divergence happens.
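
Once the per-layer outputs of two runs have been captured, finding the first layer that differs can be automated. The following is only a rough sketch in plain Swift: the `LayerOutput` type, the `firstDivergingLayer` helper, and the layer names and values are all hypothetical and are not part of NNC or swift-diffusion. It assumes the outputs of each run have already been copied into `[Float]` arrays in execution order.

```swift
// Hypothetical container: one layer's name and its flattened output values.
struct LayerOutput {
  let name: String
  let values: [Float]
}

/// Returns the name of the first layer whose outputs differ bit-for-bit
/// between two runs, or nil if every compared layer matches.
func firstDivergingLayer(_ runA: [LayerOutput], _ runB: [LayerOutput]) -> String? {
  for (a, b) in zip(runA, runB) {
    // Compare bit patterns so that even a one-bit rounding difference is caught.
    let same = a.values.count == b.values.count
      && zip(a.values, b.values).allSatisfy { $0.bitPattern == $1.bitPattern }
    if !same { return a.name }
  }
  return nil
}

// Made-up layer names and values, for illustration only.
let runA = [
  LayerOutput(name: "conv_in", values: [0.5, 1.0]),
  LayerOutput(name: "norm_1", values: [0.0, 0.25]),
  LayerOutput(name: "attn_1", values: [0.0, 0.0]),
]
let runB = [
  LayerOutput(name: "conv_in", values: [0.5, 1.0]),
  LayerOutput(name: "norm_1", values: [0.0, 0.25]),
  LayerOutput(name: "attn_1", values: [3.05e-05, 0.0]),
]
print(firstDivergingLayer(runA, runB) ?? "no divergence")  // prints "attn_1"
```

Comparing bit patterns instead of using a tolerance is deliberate here: the question in this thread is whether the output is exactly reproducible, not whether it is approximately right.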

Also, it might be that your input is too irregular and some internal normalization layer is not happy with it (try x.randn() rather than x.full(1)).
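
To illustrate why a constant input can be an awkward test case for a normalization layer, here is a small sketch in plain Swift. It assumes the textbook formula (x - mean) / sqrt(variance + epsilon); the `normalize` function and the sample values are hypothetical and do not reflect NNC's actual implementation, and whether this effect is behind the behaviour reported here is not confirmed.

```swift
/// Textbook normalization: (x - mean) / sqrt(variance + epsilon).
/// Illustrative only; this is not NNC's implementation.
func normalize(_ x: [Float], epsilon: Float = 1e-5) -> [Float] {
  let mean = x.reduce(0, +) / Float(x.count)
  let variance = x.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Float(x.count)
  return x.map { ($0 - mean) / (variance + epsilon).squareRoot() }
}

// Constant input, analogous to full(1): the variance is zero,
// so every normalized element collapses to 0.
let constant = [Float](repeating: 1, count: 8)
print(normalize(constant))

// Varied input, a hand-picked stand-in for randn(): the normalized values keep their spread.
let varied: [Float] = [-1.5, -0.5, 0.25, 0.5, 1.0, -0.75, 1.25, -0.25]
print(normalize(varied))
```

With the constant input the normalized values carry no information about the input at all, which is one reason a random input makes for a more representative check.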

Also, you can use debugPrint(out) to pretty-print the tensor to the terminal.

ghost commented 1 year ago

Thanks a lot! I also tried random input; it's still non-deterministic.

ghost commented 1 year ago

Also, this bug is in the NCHW branch

liuliu commented 1 year ago

You surely meant NHWC? Otherwise the shape doesn't make sense.

ghost commented 1 year ago

Sorry, I meant NHWC, but I can also reproduce this on master.

ghost commented 1 year ago

https://github.com/brappier/swift-diffusion-bug-repro-1/commit/c108375d4209de6e35c881be8f7ff6e27b7ca69f

You can reproduce it using this patch.

Run the binary 5-6 times and you will see the inconsistency.
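
Instead of eyeballing the printed values, repeated runs can also be compared programmatically. This is a minimal sketch in plain Swift: `isDeterministic` is a hypothetical helper, not an NNC or swift-diffusion API, and the closures below are stand-ins for a forward pass whose output has been copied into a `[Float]` array. It only checks repeated calls within one process, not separate launches of the binary.

```swift
/// Runs `body` `runs` times and reports whether every result is
/// bit-identical to the first one. Hypothetical helper for illustration.
func isDeterministic(runs: Int, _ body: () -> [Float]) -> Bool {
  precondition(runs >= 1, "need at least one run")
  let reference = body().map { $0.bitPattern }
  for _ in 1..<runs {
    if body().map({ $0.bitPattern }) != reference { return false }
  }
  return true
}

// A deterministic stand-in computation: the same values on every call.
let stable = isDeterministic(runs: 5) { [0.5, 1.0, 1.5] }
print(stable)  // prints "true"

// A deliberately non-deterministic stand-in: the result changes on every call.
var counter: Float = 0
let unstable = isDeterministic(runs: 5) {
  counter += 1
  return [counter]
}
print(unstable)  // prints "false"
```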

ghost commented 1 year ago

Any idea what could be causing this issue?