So this is actually working correctly. The issue you're facing here is that YOLO models have the last part of the model run on the host CPU, because that last section uses many TMs (tensor manipulation ops) that can't be run on device. In fact, you can see in repro.py that CPU fallback for TMs is explicitly enabled: `compiler_cfg.enable_tm_cpu_fallback = True`. This is required for the model to compile.
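For reference, a minimal sketch of how that flag is typically set (the global-config accessor follows common pybuda examples; repro.py may obtain the config object differently):

```python
import pybuda

# Grab the global compiler configuration and allow TMs that cannot be executed
# on device to fall back to the host CPU, as repro.py does for YOLOv5.
compiler_cfg = pybuda.config._get_global_compiler_config()
compiler_cfg.enable_tm_cpu_fallback = True
```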
The TTI that gets generated is thus only the part of the model that executes on device, so its outputs will not be the final result you expect. You should have seen this warning when running `python repro.py --save_tti`:

WARNING | pybuda.ttdevice:compile_to_image:1536 - CPU fallback devices are not supported when compiling to TTI image. Only TTDevice will be saved to TTI image. Loading the image will probably end in an error.
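For completeness, roughly what the `--save_tti` path looks like, assuming the standard `TTDevice.compile_to_image` flow (the model source, image path, and input shape are illustrative placeholders, not what repro.py necessarily uses):

```python
import torch
import pybuda
from pybuda import PyTorchModule

# Illustrative sketch only. The resulting .tti contains just the on-device
# portion of the graph; the CPU fallback tail is NOT included, which is what
# the warning above is pointing out.
torch_model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
tt0 = pybuda.TTDevice("tt0")
tt0.place_module(PyTorchModule("yolov5", torch_model))
tt0.compile_to_image(
    img_path="device_images/yolov5.tti",
    training=False,
    sample_inputs=(torch.rand(1, 3, 640, 640),),
)
```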
Thank you, @LPanosTT. I understand now what was going on during compilation, and yes, I totally missed the warning message saying that CPU fallback is not supported.
From the user's standpoint, throwing an exception or error in this case would be more user-friendly, since it would indicate that the output may not have the expected shape. A TTI generated this way is hard for users to work with when it has an unintended output shape, and it needs manual pipework to add the missing operations.
@LPanosTT How much effort is it to add / throw an error in this case?
@LPanosTT how do we test these models in CI? My understanding was that we first compile/save models to a TTI and then test them. Is this what we do for the YOLO models?
@milank94 Unless something has changed, no. By default the models compile the TTDevice portion and the CPU fallback portion and then execute them one after the other, without generating a TTI.
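For context, a minimal sketch of that default (non-TTI) flow, assuming the usual `pybuda.run_inference` entry point; the model source and input shape are placeholders:

```python
import torch
import pybuda

# Default flow: pybuda splits the graph automatically. The supported portion
# runs on the TTDevice and the unsupported TM tail falls back to the host CPU,
# all within a single run -- no TTI image is produced.
torch_model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
output_q = pybuda.run_inference(
    pybuda.PyTorchModule("yolov5", torch_model),
    inputs=[torch.rand(1, 3, 640, 640)],
)
outputs = output_q.get()  # final outputs, including the CPU fallback portion
```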
@yasuhiroitoTT I'm hesitant to throw an error because the "manual pipework" you're speaking of is actually already done. You will see in the `generated_modules` folder in your project root that there are files named something like `yolo_tt0.py` and `yolo_cpu1.py`. The CPU file contains the part of the model which must be run on CPU afterward. To be able to load and use it, compile with `compiler_cfg.retain_tvm_python_files = True`, then create an instance of the CPU fallback model and run `model.process_framework_parameters(...)`, passing the `*.pt` file that you'll also see in the `generated_modules` folder.
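A rough sketch of how those pieces could be wired together when loading the TTI later; the generated file/class names, the `.pt` filename, and the exact argument expected by `process_framework_parameters` are assumptions, so check the files actually emitted into `generated_modules` for your run:

```python
import torch
import pybuda
from pybuda import TTDevice, TTDeviceImage

# Load the on-device portion from the saved TTI image.
img = TTDeviceImage.load_from_disk("device_images/yolov5.tti")
tt0 = TTDevice.load_image(img=img)

# Import the generated CPU fallback tail. Names are illustrative; look for the
# *_cpu_*.py file that pybuda wrote into generated_modules/ for your model.
from generated_modules.yolo_cpu1 import YoloCpu1

cpu_tail = YoloCpu1("yolo_cpu1")
# Load the parameters dumped alongside it. Whether this call expects the path
# or the pre-loaded object should be verified against the generated code.
cpu_tail.process_framework_parameters(torch.load("generated_modules/yolo_cpu1.pt"))

# Run the device portion, then feed its outputs through the CPU tail to get the
# final detection tensor (expected shape (1, 25200, 85) for YOLOv5).
tt0.push_to_inputs((torch.rand(1, 3, 640, 640),))
output_q = pybuda.run_inference()
device_outputs = output_q.get()
final_output = cpu_tail.forward(*[o.value() for o in device_outputs])
```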
I suppose the warning is a bit misleading... It's not that CPU fallback devices are not supported when compiling to TTI; it's just that the entire model will not end up in the TTI, since not all of it can run on device.
@LPanosTT I looked into the `generated_modules` folder and found the `*_cpu_1.py` and `*_tt_0.py` files. I could get the expected tensor shape by running the model in `*_cpu_1.py` on the output of the model loaded from the TTI file.
Thank you for providing details about what is going on during model compilation. Could you please confirm whether this workaround is covered in the official documentation? If so, I will get back to my customer and point them to it.
By the way, this YOLOv5 issue is Grayskull-specific. On a WH_B0 card, all ops fit on the device, so no ops are left behind.
@yasuhiroitoTT CPU fallback functionality is documented here. Unfortunately, it does not specifically go over what to do if you want to save/load a TTI for a model that requires CPU fallback. It just explains how fallback is handled automatically in a regular, non-compile-to-TTI run.
Let's close this issue; if we need a separate effort to update the docs, please open a separate issue.
Describe the bug
The YOLOv5 model returns a different output tensor when the same model is loaded from a TTI image.
To Reproduce
Steps to reproduce the behavior:
repro 2.py.zip
Expected behavior
The output tensor from the TTI should have the same shape as the original CPU model, in this case (1, 25200, 85). Without going through the TTI image, when we load the model directly from the PyTorch model instance, the output shape is exactly the same as the original.