So this is actually working correctly. The issue you're facing here is that YOLO models have the last part of the model run on the host CPU, because that last section uses many TMs (tensor manipulation ops) that can't be run on device. In fact, you can see in repro.py that CPU fallback for TMs is explicitly enabled: `compiler_cfg.enable_tm_cpu_fallback = True`. This is required for the model to compile.
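For reference, a minimal sketch of how that flag is typically set (the global-config accessor follows common pybuda examples; repro.py may obtain the config object differently):

```python
import pybuda

# Grab the global compiler configuration and allow TMs that cannot be executed
# on device to fall back to the host CPU, as repro.py does for YOLOv5.
compiler_cfg = pybuda.config._get_global_compiler_config()
compiler_cfg.enable_tm_cpu_fallback = True
```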
The TTI that gets generated is thus only the part of the model that executes on device, so its outputs will not be the final result you expect. You should have seen this warning when running `python repro.py --save_tti`:

WARNING | pybuda.ttdevice:compile_to_image:1536 - CPU fallback devices are not supported when compiling to TTI image. Only TTDevice will be saved to TTI image. Loading the image will probably end in an error.
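For completeness, roughly what the `--save_tti` path looks like, assuming the standard `TTDevice.compile_to_image` flow (the model source, image path, and input shape are illustrative placeholders, not what repro.py necessarily uses):

```python
import torch
import pybuda
from pybuda import PyTorchModule

# Illustrative sketch only. The resulting .tti contains just the on-device
# portion of the graph; the CPU fallback tail is NOT included, which is what
# the warning above is pointing out.
torch_model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
tt0 = pybuda.TTDevice("tt0")
tt0.place_module(PyTorchModule("yolov5", torch_model))
tt0.compile_to_image(
    img_path="device_images/yolov5.tti",
    training=False,
    sample_inputs=(torch.rand(1, 3, 640, 640),),
)
```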
Thank you, @LPanosTT. I understand now what was going on during compilation, and yes, I totally missed the warning message saying that CPU fallback is not supported.
From the user's standpoint, throwing an exception or error in this case would be more user-friendly, since it would indicate that the output may not have the expected shape. A TTI generated this way is hard for users to work with when it has an unintended output shape, and it needs manual pipework to add the missing operations.
@LPanosTT How much effort is it to add / throw an error in this case?
@LPanosTT how do we test these models in CI? My understanding was that we first compile/save models to a TTI and then test them. Is this what we do for the YOLO models?
@milank94 Unless something has changed, no. By default the models compile the TTDevice portion and the CPU fallback portion and then execute them one after the other, without generating a TTI.
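For context, a minimal sketch of that default (non-TTI) flow, assuming the usual `pybuda.run_inference` entry point; the model source and input shape are placeholders:

```python
import torch
import pybuda

# Default flow: pybuda splits the graph automatically. The supported portion
# runs on the TTDevice and the unsupported TM tail falls back to the host CPU,
# all within a single run -- no TTI image is produced.
torch_model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
output_q = pybuda.run_inference(
    pybuda.PyTorchModule("yolov5", torch_model),
    inputs=[torch.rand(1, 3, 640, 640)],
)
outputs = output_q.get()  # final outputs, including the CPU fallback portion
```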
@yasuhiroitoTT I'm hesitant to throw an error because the "manual pipework" you're speaking of is actually already done. You will see in the `generated_modules` folder in your project root that there are files named something like `yolo_tt0.py` and `yolo_cpu1.py`. The CPU file contains the part of the model which must be run on CPU afterward. To be able to load and use it, compile with `compiler_cfg.retain_tvm_python_files = True`, then create an instance of the CPU fallback model and run `model.process_framework_parameters(...)`, passing the `*.pt` file that you'll also see in the `generated_modules` folder.
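A rough sketch of how those pieces could be wired together when loading the TTI later; the generated file/class names, the `.pt` filename, and the exact argument expected by `process_framework_parameters` are assumptions, so check the files actually emitted into `generated_modules` for your run:

```python
import torch
import pybuda
from pybuda import TTDevice, TTDeviceImage

# Load the on-device portion from the saved TTI image.
img = TTDeviceImage.load_from_disk("device_images/yolov5.tti")
tt0 = TTDevice.load_image(img=img)

# Import the generated CPU fallback tail. Names are illustrative; look for the
# *_cpu_*.py file that pybuda wrote into generated_modules/ for your model.
from generated_modules.yolo_cpu1 import YoloCpu1

cpu_tail = YoloCpu1("yolo_cpu1")
# Load the parameters dumped alongside it. Whether this call expects the path
# or the pre-loaded object should be verified against the generated code.
cpu_tail.process_framework_parameters(torch.load("generated_modules/yolo_cpu1.pt"))

# Run the device portion, then feed its outputs through the CPU tail to get the
# final detection tensor (expected shape (1, 25200, 85) for YOLOv5).
tt0.push_to_inputs((torch.rand(1, 3, 640, 640),))
output_q = pybuda.run_inference()
device_outputs = output_q.get()
final_output = cpu_tail.forward(*[o.value() for o in device_outputs])
```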
I suppose the warning is a bit misleading... It's not that CPU fallback devices are not supported when compiling to TTI; it's just that the entire model will not end up in the TTI, since not all of it can run on device.
@LPanosTT I looked into the `generated_modules` folder and found the `*_cpu_1.py` and `*_tt_0.py` files. I could get the expected tensor shape by running the model in `*_cpu_1.py` on the output of the model loaded from the TTI file.
Thank you for providing details about what is going on during model compilation. Could you please confirm whether this workaround is covered in the official documentation? If so, I will get back to my customer and point them to it.
By the way, this YOLOv5 issue is Grayskull-specific. On a WH_B0 card, all ops fit on the device, so no ops are left behind.
@yasuhiroitoTT CPU fallback functionality is documented here. Unfortunately, it does not specifically go over what to do if you want to save/load a TTI for a model that requires CPU fallback. It just explains how fallback is handled automatically in a regular, non-compile-to-TTI run.
Let's close this issue; if we need a separate effort to update the docs, please open a separate issue.
Describe the bug
The YOLOv5 model returns a different output tensor when the same model is loaded from a TTI image.
To Reproduce
Steps to reproduce the behavior:
repro 2.py.zip
Expected behavior
The output tensor from the TTI should have the same shape as the original CPU model, in this case (1, 25200, 85). Without going through the TTI image, when we load the model directly from the PyTorch model instance, the output shape is exactly the same as the original.