Compile reid models for Hailo8L?

bveldhoen commented 1 month ago

Hello,

I have a few questions about getting the reid example running on the Hailo8L. Any help would be greatly appreciated; thanks in advance.

I'm trying to get the reid example (https://github.com/hailo-ai/Hailo-Application-Code-Examples/blob/main/runtime/cpp/re_id/README.md) working on the Hailo8L (using a Raspberry Pi 5 + AI Kit). This example uses two models:

yolov5s_personface.hef
repvgg_a0_person_reid_2048.hef

These models seem to be available only for the Hailo8, and not (yet?) for the Hailo8L.

I've tried to compile these models for Hailo8L by following the training guides, while exporting directly from the provided pretrained .pt and .pth models (and skipping the custom training).

Question 1

From this link, it appears that the repvgg_a0_person_reid_512.onnx file is available (next to the provided .pth model):

https://github.com/hailo-ai/hailo_model_zoo/blob/master/docs/public_models/HAILO8L/HAILO8l_person_re_id.rst

a) However, the .onnx files for repvgg_a0_person_reid_2048 and yolov5s_personface don't seem to be available in the public_models. Is this correct, or are they available at another link?
b) Should the .pth or the .onnx models be used as the starting point for export and compilation?

Question 2

Export yolov5s_personface (in docker container https://github.com/hailo-ai/hailo_model_zoo/blob/master/hailo_models/personface_detection/Dockerfile):

# python models/export.py --weights /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.pt --img-size 640 --batch-size 1
Namespace(batch_size=1, img_size=[640, 640], weights='/local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.pt')

Starting TorchScript export with torch 1.7.1...
/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py:934: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
  module._c._create_method_from_trace(
TorchScript export success, saved as /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.torchscript.pt

Starting ONNX export with onnx 1.13.0...
Fusing layers... Model Summary: 140 layers, 7.24922e+06 parameters, 6.61683e+06 gradients
graph torch-jit-export (
  %images[FLOAT, 1x3x640x640]
) initializers (
  %435[FLOAT, 4]
  ...
  %model.0.conv.conv.bias[FLOAT, 32]
  ...
) {
  %167 = Constant[value = <Tensor>]()
  ...
  %431 = Unsqueeze[axes = [0]](%424)
  %432 = Concat[axis = 0](%427, %441, %442, %430, %431)
  %433 = Reshape(%415, %432)
  %434 = Transpose[perm = [0, 1, 3, 4, 2]](%433)
  return %output, %414, %434
}
ONNX export success, saved as /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.onnx

Starting CoreML export with coremltools 6.1...
Model is not in eval mode. Consider calling '.eval()' on your model prior to conversion
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100%|
...
███████████████████████████████████████████████████████▏| 730/732 [00:00<00:00, 7487.83 ops/s]
Running MIL Common passes:   0%|          | 0/39 [00:00<?, ? passes/s]
/opt/conda/lib/python3.8/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1241', of the source model, has been renamed to 'var_1241' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
/opt/conda/lib/python3.8/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1256', of the source model, has been renamed to 'var_1256' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
/opt/conda/lib/python3.8/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1271', of the source model, has been renamed to 'var_1271' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|
...
██████████| 39/39 [00:00<00:00, 179.85 passes/s]
Running MIL Clean up passes: 100%|
...
████████| 11/11 [00:00<00:00, 251.67 passes/s]
Translating MIL ==> NeuralNetwork Ops: 100%|
...
██████████████████████████████████████████████████████████| 728/728 [00:00<00:00, 1870.86 ops/s]
CoreML export success, saved as /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.mlmodel

Export complete. Visualize with https://github.com/lutzroeder/netron.

# polygraphy inspect model /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.onnx
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Loading model: /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.onnx
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 12

    ---- 1 Graph Input(s) ----
    {images [dtype=float32, shape=(1, 3, 640, 640)]}

    ---- 3 Graph Output(s) ----
    {output [dtype=float32, shape=(1, 3, 80, 80, 7)],
     414 [dtype=float32, shape=(1, 3, 40, 40, 7)],
     434 [dtype=float32, shape=(1, 3, 20, 20, 7)]}

    ---- 164 Initializer(s) ----

    ---- 250 Node(s) ----
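
For reference, the last dimension of each ONNX output above (7) is consistent with the usual YOLOv5 head layout, assuming 4 box coordinates, 1 objectness score, and one score per class (2 classes here, presumably person and face). The Python sketch below only checks that arithmetic against the shapes printed by polygraphy; the constant and variable names are illustrative and do not come from the Hailo tooling.

```python
# Output shapes reported by `polygraphy inspect model` above.
onnx_outputs = {
    "output": (1, 3, 80, 80, 7),
    "414": (1, 3, 40, 40, 7),
    "434": (1, 3, 20, 20, 7),
}

INPUT_SIZE = 640   # from the 1x3x640x640 graph input
NUM_CLASSES = 2    # assumption: person and face
BOX_COORDS = 4     # x, y, w, h
OBJECTNESS = 1
values_per_anchor = BOX_COORDS + OBJECTNESS + NUM_CLASSES

# name -> (stride, channels of the raw conv feature map)
summary = {}
for name, (batch, anchors, grid_h, grid_w, values) in onnx_outputs.items():
    # Each anchor should carry box + objectness + class scores.
    assert values == values_per_anchor
    stride = INPUT_SIZE // grid_h
    # Before the reshape/transpose, anchors and values share the channel axis.
    summary[name] = (stride, anchors * values)

for name, (stride, channels) in summary.items():
    print(f"{name}: stride {stride}, {channels} channels per grid cell")
```

Under these assumptions each scale carries 3 x 7 = 21 channels, which matches the FCR(...x21) outputs of the precompiled Hailo8 HEF shown further down.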

Compile yolov5s_personface in the Hailo AI SW Suite Docker container:

# hailomz compile --hw-arch hailo8l --ckpt /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/yolov5s_personface.onnx --calib-path /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/Market-1501-v15.09.15 --yaml /local/hailo.ai/hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov5s_personface.yaml
<Hailo Model Zoo INFO> Start run for network yolov5s_personface ...
<Hailo Model Zoo INFO> Initializing the hailo8l runner...
<Hailo Model Zoo WARNING> Hailo8L support is currently at Preview on Hailo Model Zoo
[info] Translation started on ONNX model yolov5s_personface
[info] Restored ONNX model yolov5s_personface (completion time: 00:00:00.07)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.36)
[info] NMS structure of yolov5 (or equivalent architecture) was detected. Default values of NMS anchors were loaded to NMS config json
[info] Start nodes mapped from original model: 'images': 'yolov5s_personface/input_layer1'.
[info] End nodes mapped from original model: 'Conv_234', 'Conv_218', 'Conv_202'.
[info] Translation completed on ONNX model yolov5s_personface (completion time: 00:00:00.76)
[info] Saved HAR to: /local/workspace/yolov5s_personface.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to yolov5s_personface from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov5s_personface.alls
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.03)
[info] create_layer_norm skipped
[info] Starting Stats Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|
...
█████████████████████████| 64/64 [00:15<00:00,  4.08entries/s]
[info] Stats Collector is done (completion time is 00:00:16.59)
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Fine Tune
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 4000 entries for finetune
Epoch 1/4
500/500 [==============================] - 233s 389ms/step - total_distill_loss: 0.1006 - _distill_loss_yolov5s_personface/conv70: 0.0323 - _distill_loss_yolov5s_personface/conv63: 0.0358 - _distill_loss_yolov5s_personface/conv55: 0.0325
...
Epoch 4/4
500/500 [==============================] - 196s 391ms/step - total_distill_loss: 0.0853 - _distill_loss_yolov5s_personface/conv70: 0.0259 - _distill_loss_yolov5s_personface/conv63: 0.0300 - _distill_loss_yolov5s_personface/conv55: 0.0294
[info] Fine Tune is done (completion time is 00:13:41.32)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|
...
████████████████| 2/2 [00:50<00:00, 25.19s/iterations]
[info] Layer Noise Analysis is done (completion time is 00:00:52.28)
[info] Model Optimization is done
[info] Saved HAR to: /local/workspace/yolov5s_personface.har
[info] Loading model script commands to yolov5s_personface from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/yolov5s_personface.alls
[info] Adding an output layer after conv55
[info] Adding an output layer after conv63
[info] Adding an output layer after conv70
[info] Loading network parameters
[warning] Output order different size
[info] Starting Hailo allocation and compilation flow
[info] Finding the best partition to contexts...
Performance / Iteration
...
   230   231   232   233   234   235   236   237   238   239   240   241   242   243   244   245   246   247   248   249    
Iteration #253 - Contexts: 7 
[info] Using Multi-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=60%, max_compute_utilization=60%, max_compute_16bit_utilization=60%, max_memory_utilization (weights)=60%, max_input_aligner_utilization=60%, max_apu_utilization=60%

Validating context_0 layer by layer (100%)
...
● Finished                                       

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/3 Iteration 4: Trying parallel mapping...  
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  V          V          *          *          V          V          *          *          V       
 worker1  V          V          *          *          V          V          *          *          V       
 worker2  V          V          *          *          V          V          *          *          V       
 worker3  V          V          *          *          V          V          *          *          V       
Context:1/3 Iteration 4: Trying parallel mapping...  
...
  00:23
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0

[info] context_0 (context_0):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_1 (context_1):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_2 (context_2):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_3 (context_3):
Iterations: 4
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] context_0 utilization: 
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 75%                 | 82.8%               | 55.5%              |
[info] | cluster_1 | 68.8%               | 28.1%               | 31.3%              |
[info] | cluster_4 | 68.8%               | 93.8%               | 57%                |
[info] | cluster_5 | 18.8%               | 6.3%                | 8.6%               |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 57.8%               | 52.7%               | 38.1%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_1 utilization: 
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100%                | 57.8%               | 72.7%              |
[info] | cluster_1 | 25%                 | 10.9%               | 14.8%              |
[info] | cluster_4 | 25%                 | 20.3%               | 14.1%              |
[info] | cluster_5 | 93.8%               | 60.9%               | 47.7%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 60.9%               | 37.5%               | 37.3%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_2 utilization: 
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100%                | 79.7%               | 66.4%              |
[info] | cluster_4 | 100%                | 57.8%               | 48.4%              |
[info] | cluster_5 | 50%                 | 26.6%               | 32.8%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 62.5%               | 41%                 | 36.9%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] context_3 utilization: 
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 75%                 | 81.3%               | 95.3%              |
[info] | cluster_4 | 81.3%               | 82.8%               | 54.7%              |
[info] | cluster_5 | 62.5%               | 59.4%               | 43.8%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 54.7%               | 55.9%               | 48.4%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 7m 15s)
[info] Compiling context_0...
[info] Compiling context_1...
[info] Compiling context_2...
[info] Compiling context_3...
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 1.34583 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 45.3125 Mbps (for a single frame)
[info] Compiling context_0...
[info] Compiling context_1...
[info] Compiling context_2...
[info] Compiling context_3...
[info] Bandwidth of model inputs: 9.375 Mbps, outputs: 1.34583 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 45.3125 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 5s)
[info] Saved HAR to: /local/workspace/yolov5s_personface.har
<Hailo Model Zoo INFO> HEF file written to yolov5s_personface.hef

# hailortcli parse-hef yolov5s_personface.hef
Architecture HEF was compiled for: HAILO8L
Network group name: yolov5s_personface, Multi Context - Number of contexts: 4
    Network name: yolov5s_personface/yolov5s_personface
        VStream infos:
            Input  yolov5s_personface/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov5s_personface/yolov5_nms_postprocess FLOAT32, HAILO NMS(number of classes: 2, maximum bounding boxes per class: 80, maximum frame size: 3208)
            Operation:
                Op YOLOV5
                Name: YOLOv5-Post-Process
                Score threshold: 0.200
                IoU threshold: 0.60
                Classes: 2
                Cross classes: false
                Max bboxes per class: 80
                Image height: 640
                Image width: 640

When parsing the yolov5s_personface.hef file of the precompiled hailo8 (not hailo8l) model:

# hailortcli parse-hef yolov5s_personface.hef
Architecture HEF was compiled for: HAILO8
Network group name: yolov5s_personface, Single Context
    Network name: yolov5s_personface/yolov5s_personface
        VStream infos:
            Input  yolov5s_personface/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov5s_personface/conv70 UINT8, FCR(20x20x21)
            Output yolov5s_personface/conv63 UINT8, FCR(40x40x21)
            Output yolov5s_personface/conv55 UINT8, FCR(80x80x21)

Notable differences:

The hailo8l compiled model is multi-context (vs. single context for hailo8)
The hailo8l compiled model has only 1 output of type FLOAT32 (vs. 3 outputs of type UINT8 for hailo8)

These differences seem to be related to the use of NMS postprocessing, but I could not find more information about it (e.g. on how to disable it).
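
To make the comparison concrete, here is a minimal Python sketch that pulls the output vstreams out of the text printed by `hailortcli parse-hef`. It only parses the two dumps quoted above (abridged); it does not call HailoRT, and the helper name `output_vstreams` is hypothetical.

```python
import re

# Abridged from the two `hailortcli parse-hef` dumps above.
HAILO8L_DUMP = """\
Architecture HEF was compiled for: HAILO8L
            Input  yolov5s_personface/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov5s_personface/yolov5_nms_postprocess FLOAT32, HAILO NMS(number of classes: 2, maximum bounding boxes per class: 80, maximum frame size: 3208)
"""

HAILO8_DUMP = """\
Architecture HEF was compiled for: HAILO8
            Input  yolov5s_personface/input_layer1 UINT8, NHWC(640x640x3)
            Output yolov5s_personface/conv70 UINT8, FCR(20x20x21)
            Output yolov5s_personface/conv63 UINT8, FCR(40x40x21)
            Output yolov5s_personface/conv55 UINT8, FCR(80x80x21)
"""

# Matches lines such as "Output <name> <dtype>, <format>(<details>)".
OUTPUT_RE = re.compile(r"^\s*Output\s+(\S+)\s+(\w+),\s+(.+?)\((.*)\)\s*$")


def output_vstreams(dump):
    """Return a list of (name, dtype, format) for each output vstream line."""
    found = []
    for line in dump.splitlines():
        match = OUTPUT_RE.match(line)
        if match:
            found.append((match.group(1), match.group(2), match.group(3)))
    return found


for label, dump in (("hailo8l", HAILO8L_DUMP), ("hailo8", HAILO8_DUMP)):
    outputs = output_vstreams(dump)
    uses_nms = any(fmt == "HAILO NMS" for _, _, fmt in outputs)
    print(label, len(outputs), "output(s), NMS output:", uses_nms)
```

A check like this could be used to confirm, after recompiling, whether a HEF exposes the three raw conv outputs or a single NMS output.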

a) Is it possible to export and compile the yolov5s_personface.pt or .onnx model so that it meets the expectations of the reid example?
b) If so, how, and which command-line arguments should be used?
c) I've also tried compiling some of the other available personface models (such as yolov5s_personface_nv12_fhd), e.g.:

hailomz compile --hw-arch hailo8l --calib-path /local/shared_with_docker/Market-1501-v15.09.15 --resize 640 640 --input-conversion nv12_to_rgb yolov5s_personface_nv12_fhd

without any luck (although for the yolov5s_personface_nv12_fhd model, the output did seem to consist of 3 UINT8 layers).
d) Are there plans to make a precompiled Hailo8L model for yolov5s_personface available for download?

Question 3

Export repvgg_a0_person_reid_512 (in docker container https://github.com/hailo-ai/hailo_model_zoo/blob/master/hailo_models/reid/Dockerfile):

# python scripts/export.py --model_name repvgg_a0_512 --weights /local/shared_with_docker/repvgg_a0_person_reid_512.pth
RepVGG Block, identity =  None
RepVGG Block, identity =  None
RepVGG Block, identity =  BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
...
RepVGG Block, identity =  None
Downloading...
From: https://drive.google.com/uc?id=13Gn8rq1PztoMEgK7rCOPMUYHjGzk-w11
To: /workspace/deep-person-reid/models/RepVGG-A0-train.pth
100%|
...
██████████████████████████████████████| 36.6M/36.6M [00:02<00:00, 16.6MB/s]
Successfully loaded pretrained weights from "/local/shared_with_docker/repvgg_a0_person_reid_512.pth"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
torch.Size([1, 512])

# POLYGRAPHY_AUTOINSTALL_DEPS=1 polygraphy inspect model model.onnx
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[I] Loading model: /workspace/deep-person-reid/model.onnx
[I] Module: 'onnx' is required, but not installed. Attempting to install now.
[I] Running installation command: /opt/conda/bin/python -m pip install onnx>=1.8.1
[I] ==== ONNX Model ====
    Name: torch-jit-export | ONNX Opset: 9

    ---- 1 Graph Input(s) ----
    {test_input [dtype=float32, shape=(1, 3, 256, 128)]}

    ---- 1 Graph Output(s) ----
    {test_output [dtype=float32, shape=(1, 512)]}

    ---- 47 Initializer(s) ----

    ---- 52 Node(s) ----

(moved model.onnx to /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/repvgg_a0_person_reid_512.onnx)

Compile repvgg_a0_person_reid_512 in the Hailo AI SW Suite Docker container:

hailomz compile --hw-arch hailo8l --ckpt /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/repvgg_a0_person_reid_512.onnx --calib-path /local/hailo.ai/hailo_ai_sw_suite_2024-04_docker/shared_with_docker/Market-1501-v15.09.15 --yaml /local/hailo.ai/hailo_model_zoo/hailo_model_zoo/cfg/networks/repvgg_a0_person_reid_512.yaml
<Hailo Model Zoo INFO> Start run for network repvgg_a0_person_reid_512 ...
<Hailo Model Zoo INFO> Initializing the hailo8l runner...
<Hailo Model Zoo WARNING> Hailo8L support is currently at Preview on Hailo Model Zoo
[info] Translation started on ONNX model repvgg_a0_person_reid_512
[info] Restored ONNX model repvgg_a0_person_reid_512 (completion time: 00:00:00.08)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.30)
[info] Start nodes mapped from original model: 'test_input': 'repvgg_a0_person_reid_512/input_layer1'.
[info] End nodes mapped from original model: 'Gemm_51'.
[info] Translation completed on ONNX model repvgg_a0_person_reid_512 (completion time: 00:00:00.40)
[info] Saved HAR to: /local/workspace/repvgg_a0_person_reid_512.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to repvgg_a0_person_reid_512 from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/repvgg_a0_person_reid_512.alls
[info] Starting Model Optimization
[info] Using default optimization level of 2
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.01)
[info] create_layer_norm skipped
[info] Starting Stats Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|
...
█████████████████████████| 64/64 [00:04<00:00, 14.06entries/s]
[info] Stats Collector is done (completion time is 00:00:04.80)
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Fine Tune
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 8000 entries for finetune
Epoch 1/8
1000/1000 [==============================] - 41s 29ms/step - total_distill_loss: 0.1737 - _distill_loss_repvgg_a0_person_reid_512/fc1: 0.1737
...
Epoch 8/8
1000/1000 [==============================] - 29s 29ms/step - total_distill_loss: 0.1046 - _distill_loss_repvgg_a0_person_reid_512/fc1: 0.1046
[info] Fine Tune is done (completion time is 00:04:05.98)
[info] Starting Layer Noise Analysis
Full Quant Analysis: 100%|
...
████████████████| 2/2 [00:12<00:00,  6.15s/iterations]
[info] Layer Noise Analysis is done (completion time is 00:00:12.83)
[info] Output layers signal-to-noise ratio (SNR): measures the quantization noise (higher is better)
[info]  repvgg_a0_person_reid_512/output_layer1 SNR:    1.813 dB
[info] Model Optimization is done
[info] Saved HAR to: /local/workspace/repvgg_a0_person_reid_512.har
[info] Loading model script commands to repvgg_a0_person_reid_512 from /local/workspace/hailo_model_zoo/hailo_model_zoo/cfg/alls/generic/repvgg_a0_person_reid_512.alls
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%

Validating context_0 layer by layer (100%)
...
● Finished                                           

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 4: Trying parallel mapping...  
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  V          V          *          *          V          V          *          *          V       
 worker1  V          V          *          *          V          V          *          *          V       
 worker2  V          V          *          *          V          V          *          *          V       
 worker3  V          V          *          *          V          V          *          *          V       

  00:23
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0

[info] Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 62.5%               | 71.9%               | 87.5%              |
[info] | cluster_1 | 87.5%               | 62.5%               | 39.1%              |
[info] | cluster_4 | 68.8%               | 70.3%               | 77.3%              |
[info] | cluster_5 | 56.3%               | 95.3%               | 50.8%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 68.8%               | 75%                 | 63.7%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 39s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 0.75 Mbps, outputs: 0.00390625 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 0.75 Mbps, outputs: 0.00390625 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 1s)
[info] Saved HAR to: /local/workspace/repvgg_a0_person_reid_512.har
<Hailo Model Zoo INFO> HEF file written to repvgg_a0_person_reid_512.hef

# hailortcli parse-hef repvgg_a0_person_reid_512.hef
Architecture HEF was compiled for: HAILO8L
Network group name: repvgg_a0_person_reid_512, Single Context
    Network name: repvgg_a0_person_reid_512/repvgg_a0_person_reid_512
        VStream infos:
            Input  repvgg_a0_person_reid_512/input_layer1 UINT8, NHWC(256x128x3)
            Output repvgg_a0_person_reid_512/fc1 UINT8, NC(512)

When parsing the repvgg_a0_person_reid_2048.hef file of the precompiled hailo8 (not hailo8l) model:

# hailortcli parse-hef repvgg_a0_person_reid_2048.hef
Architecture HEF was compiled for: HAILO8
Network group name: repvgg_a0_person_reid_2048, Single Context
    Network name: repvgg_a0_person_reid_2048/repvgg_a0_person_reid_2048
        VStream infos:
            Input  repvgg_a0_person_reid_2048/input_layer1 UINT8, NHWC(256x128x3)
            Output repvgg_a0_person_reid_2048/fc1 UINT8, NC(2048)

The export and compilation of repvgg_a0_person_reid_512 for hailo8l seems to have been successful.
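
One value worth a second look is the output SNR reported in the optimization log above (1.813 dB). The thread does not state how that figure is defined, so the Python sketch below assumes the common power-ratio convention, SNR_dB = 10 * log10(P_signal / P_noise), and simply converts the logged value back to a linear ratio. The function names are illustrative.

```python
import math


def db_to_power_ratio(snr_db):
    """Convert decibels to a linear power ratio (assumes 10*log10 convention)."""
    return 10 ** (snr_db / 10)


def power_ratio_to_db(ratio):
    """Inverse of db_to_power_ratio."""
    return 10 * math.log10(ratio)


logged_snr_db = 1.813  # from the "output_layer1 SNR" line in the log above
ratio = db_to_power_ratio(logged_snr_db)
print(f"{logged_snr_db} dB corresponds to a signal/noise power ratio of about {ratio:.2f}")
```

If that convention applies, the signal power at the output is only about 1.5 times the quantization noise power, so it may be worth comparing embeddings from the compiled model against the original ONNX before using it for re-identification.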

a) I tried the same for repvgg_a0_person_reid_2048, but it failed because hailo_model_zoo/cfg/networks/repvgg_a0_person_reid_2048.yaml does not exist. Is this correct, or can it be found somewhere else?

omerwer commented 1 month ago

Hi @bveldhoen, To answer your questions:

a) Yes, you are correct. The repvgg_a0_person_reid_2048 and yolov5s_personface models have not been formally tested for Hailo8L and therefore don't appear on the public_models page. That said, they may work on Hailo8L regardless; we just can't guarantee the behavior or success of these models.
b) To start the Dataflow Compiler's pipeline, you should have a pretrained (with Batch Normalization) ONNX or TFLite model.
The example is quite old, and the models it uses are not aligned with the newest compiled models. This is why you see a different number of outputs: the newer version uses the Hailo-NMS (which performs bbox decoding + NMS filtering for detection models in the HailoRT SW), so it has one output, while the older version doesn't and therefore has the same number of outputs as the end nodes defined in the parsing step. The Hailo8L has fewer resources than the Hailo8, so it's very likely that the same ONNX will compile to multi-context on the Hailo8L and to a single context on the Hailo8. As for compiling for the re-id example: it's possible to compile the model without the Hailo-NMS (don't apply the recommendation in the parsing process, and delete the relevant command from the model's alls), but the overall performance of that model might not be as good. In any case, we plan to update all the older examples in the repo to support the new versions of the models in the near future.
Again, for the same reason, the models differ because the one from the Hailo Application Code Examples repo is an older version. repvgg_a0_person_reid_2048 no longer exists in the Hailo Model Zoo; it has been replaced by repvgg_a0_person_reid_512. So in the current MZ tag version, you won't find repvgg_a0_person_reid_2048.
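
The suggestion above about deleting the Hailo-NMS command from the model's alls could be tried along the lines of the Python sketch below. It assumes the NMS step appears in the .alls model script as a line starting with `nms_postprocess(`; check the actual yolov5s_personface.alls before relying on this. The sample script contents are illustrative only and are not copied from the Model Zoo, and the function name `strip_nms_commands` is hypothetical.

```python
def strip_nms_commands(alls_text):
    """Return the model script without lines that invoke nms_postprocess.

    Assumption: the Hailo-NMS step is a single line that starts with
    "nms_postprocess(" in the .alls file.
    """
    kept = [
        line
        for line in alls_text.splitlines()
        if not line.strip().startswith("nms_postprocess(")
    ]
    return "\n".join(kept) + "\n"


# Illustrative script contents only (assumption, not the real file).
sample_alls = (
    "normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])\n"
    'nms_postprocess("path/to/nms_config.json", yolov5, engine=cpu)\n'
    "model_optimization_config(calibration, batch_size=4, calibset_size=64)\n"
)

print(strip_nms_commands(sample_alls))
```

The filtered text would be saved as a copy of the model script and used for the compilation instead of the stock one; how that copy is passed to the compiler is not covered in this thread.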

Regards,

bveldhoen commented 1 month ago

Hi @omerwer,

Thanks for your reply! We will evaluate the models on the Hailo8 (instead of trying to build them for Hailo8L).

hailo-ai / Hailo-Application-Code-Examples

Compile reid models for Hailo8L? #280