Looking at the reader indices config buffer before launching conv2 of upblock4 shows garbage values. It seems like some op is writing into this L1 small space.
Summary
UNet Shallow gives bad PCC after two iterations when program cache is enabled.
At a high level, we can reproduce this issue by doing the following steps:
The PCC between the model and the reference model will be bad. The PCC becomes good (0.99) when program cache is disabled.
Reproducing the error
Check out and build commit 14cf8346e42c0f6b67b8b38860764f68d5a3bce8 on N150/N300. Enable the 8x8 grid if using N300.
Run the following command:
This should fail. If you edit models/experimental/functional_unet/tests/test_unet_upblock.py to disable program cache in test_unet_upblock, it should pass.
Investigation
Looking at the PCC of each individual layer in Upblock4 shows the PCC drops off after the second convolution.
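As a rough illustration of this kind of per-layer check, here is a minimal sketch that computes the PCC (Pearson correlation coefficient) for each layer's output and reports the first layer that falls below a threshold. It is not the code used in the test: the helper names, the layer names, and the synthetic data are all hypothetical, and the 0.99 threshold is only assumed from the "good" value quoted above.

```python
import numpy as np


def pcc(expected, actual):
    """Pearson correlation coefficient between two arrays, compared element-wise."""
    a = np.asarray(expected, dtype=np.float64).ravel()
    b = np.asarray(actual, dtype=np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])


def first_bad_layer(reference_outputs, model_outputs, threshold=0.99):
    """Return (layer_name, pcc) for the first layer below threshold, or None."""
    for name, ref in reference_outputs.items():
        value = pcc(ref, model_outputs[name])
        if value < threshold:
            return name, value
    return None


# Synthetic stand-in data; the layer names are hypothetical, not taken from the model.
rng = np.random.default_rng(0)
reference = {name: rng.standard_normal((4, 16)) for name in ("conv1", "conv2", "conv3")}
# A healthy layer differs from the reference only by small numerical noise.
model = {name: ref + 1e-3 * rng.standard_normal(ref.shape) for name, ref in reference.items()}
# Simulate a corrupted output for one layer.
model["conv2"] = rng.standard_normal((4, 16))

bad = first_bad_layer(reference, model)
print(bad)
```

Walking the layers in execution order matters here: once one layer's output is corrupted, every layer after it will usually show a bad PCC as well, so only the first failing layer points at the culprit.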
Turning on Watcher will detect a bad NOC transaction on the second model iteration, but only when program cache is enabled.
Note that you may need to disable the bias to enable Watcher, because the convolution code size is otherwise too large.
This issue surfaced when I modified how we send inputs to the device. Previously we would first go from host to L1 interleaved, and the first convolution would then shard it; now we go directly from host to L1 height sharded.