hailo-ai / hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
MIT License

Error compile yolov8 to hef #111

Open Will-UEA opened 3 months ago

Will-UEA commented 3 months ago

I'm trying to compile a custom model trained with YOLOv8s so I can use it on the Raspberry Pi 5. But when it gets to "Starting Layer Noise Analysis," it throws an error. Any idea what could be wrong? I've tried searching but couldn't find anything specific.

hailo_model_optimization.acceleras.utils.acceleras_exceptions.SubprocessTracebackFailure: Subprocess failed with traceback

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/tools/subprocess_wrapper.py", line 73, in child_wrapper
    func(self, *args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/flows/optimization_flow.py", line 347, in step3
    self.finalize_optimization()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(*args, *kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/flows/optimization_flow.py", line 405, in finalize_optimization
    self._noise_analysis()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/tools/orchestator.py", line 250, in wrapped
    result = method(args, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/flows/optimization_flow.py", line 585, in _noise_analysis
    algo.run()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/algorithms/optimization_algorithm.py", line 50, in run
    return super().run()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/algorithms/algorithm_base.py", line 151, in run
    self._run_int()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 83, in _run_int
    self.analyze_full_quant_net()
  File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/algorithms/hailo_layer_noise_analysis.py", line 197, in analyze_full_quant_net
    lat_model.predict_on_batch(inputs)
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2603, in predict_on_batch
    outputs = self.predict_function(iterator)
  File "/usr/local/lib/python3.10/dist-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/autograph_generated_filez7ousod7.py", line 15, in tfpredictfunction
    retval = ag.converted_call(ag__.ld(step_function), (ag.ld(self), ag.ld(iterator)), None, fscope)
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2155, in step_function
    outputs = model.distribute_strategy.run(run_step, args=(data,))
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2143, in run_step
    outputs = model.predict_step(data)
  File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2111, in predict_step
    return self(x, training=False)
  File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/tmp/autograph_generated_file6wtrfjh0.py", line 188, in tfcall
    ag.for_stmt(ag.converted_call(ag.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
  File "/tmp/autograph_generated_file6wtrfjh0.py", line 167, in loop_body_5
    ag__.if_stmt(ag.not_(continue1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
  File "/tmp/autograph_generated_file6wtrfjh0.py", line 94, in if_body_3
    n_ancestors = ag.converted_call(ag.ld(self)._native_model.flow.ancestors, (ag.ld(lname),), None, fscope)
  File "/tmp/autograph_generated_fileh91llgie.py", line 12, in tfancestors
    retval_ = ag.converted_call(ag.ld(nx).ancestors, (ag.ld(self), ag.ld(source)), None, fscope)
TypeError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2169, in predict_function *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2155, in step_function
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2143, in run_step
        outputs = model.predict_step(data)
    File "/usr/local/lib/python3.10/dist-packages/keras/engine/training.py", line 2111, in predict_step
        return self(x, training=False)
    File "/usr/local/lib/python3.10/dist-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "/tmp/autograph_generated_file6wtrfjh0.py", line 188, in tfcall
        ag.for_stmt(ag.converted_call(ag.ld(self)._model.flow.toposort, (), None, fscope), None, loop_body_5, get_state_9, set_state_9, (), {'iterate_names': 'lname'})
    File "/tmp/autograph_generated_file6wtrfjh0.py", line 167, in loop_body_5
        ag__.if_stmt(ag.not_(continue1), if_body_3, else_body_3, get_state_8, set_state_8, (), 0)
    File "/tmp/autograph_generated_file6wtrfjh0.py", line 94, in if_body_3
        n_ancestors = ag.converted_call(ag.ld(self)._native_model.flow.ancestors, (ag.ld(lname),), None, fscope)
    File "/tmp/autograph_generated_fileh91llgie.py", line 12, in tfancestors
        retval_ = ag.converted_call(ag.ld(nx).ancestors, (ag.ld(self), ag.ld(source)), None, fscope)

    TypeError: Exception encountered when calling layer 'lat_model' (type LATModel).

    in user code:

        File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/algorithms/lat_utils/lat_model.py", line 340, in call
            n_ancestors = self._native_model.flow.ancestors(lname)
        File "/usr/local/lib/python3.10/dist-packages/hailo_model_optimization/acceleras/model/hailo_model/model_flow.py", line 31, in ancestors
            return nx.ancestors(self, source)

        TypeError: outer_factory..inner_factory..tffunc() missing 1 required keyword-only argument: '__wrapper'

    Call arguments received by layer 'lat_model' (type LATModel):
      • inputs=tf.Tensor(shape=(8, 640, 640, 3), dtype=float32)

omerwer commented 3 months ago

Hi @Will-UEA, it's difficult to know what the issue is without examining the model and the command you ran. First, try running the optimization process with an optimization level of 0 (you can disable the GPU by adding CUDA_VISIBLE_DEVICES=999 before the command). In either case, if you can, please open a ticket in our ticketing system on the Hailo website with the relevant info and files (the ONNX you used, for example), or contact me via email at omerw@hailo.ai with the relevant info.

Regards,
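The GPU part of the suggestion above can also be applied from a Python launcher instead of a shell prefix. This is only a minimal sketch: the helper names `cpu_only_env` and `run_cpu_only` are made up for illustration, and the thread does not say how the optimization level itself is set, so that part is not shown.

```python
import os
import subprocess


def cpu_only_env(base_env=None):
    """Return a copy of the environment with all GPUs hidden from CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # Value taken from the comment above; an index with no matching device
    # leaves CUDA with nothing to use, so the process falls back to the CPU.
    env["CUDA_VISIBLE_DEVICES"] = "999"
    return env


def run_cpu_only(command):
    """Run `command` (a list of arguments) with GPUs hidden; return the exit code."""
    return subprocess.run(command, env=cpu_only_env()).returncode
```

For example, `run_cpu_only(["hailomz", "compile", ...])` with your own arguments in place of the `...` should behave like prefixing the shell command with `CUDA_VISIBLE_DEVICES=999`.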

nadaved1 commented 3 months ago

Please consult on the community forum

On Thursday, 18 July 2024, 08:42, omerwer wrote:

Hi @Will-UEA, it's difficult to know what the issue is without examining the model and the command you ran. First, try running the optimization process with an optimization level of 0 (you can disable the GPU by adding CUDA_VISIBLE_DEVICES=999 before the command). In either case, if you can, please open a ticket in our ticketing system on the Hailo website with the relevant info and files (the ONNX you used, for example), or contact me via email at @.*** with the relevant info.

Regards,

Will-UEA commented 3 months ago

I'll try to do that as soon as I get back from work. Once I have, I'll report back here.
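When collecting info for a ticket or forum post, it may help to include the versions of the packages that appear in the traceback at the top of this thread (the failing call is `nx.ancestors`, reached through TensorFlow/Keras). A small sketch using only the standard library follows; the package list is an assumption about what is worth reporting, not an official checklist, and the Hailo package names can be added once you know their installed distribution names.

```python
from importlib import metadata

# Third-party packages named in the traceback; extend as needed.
PACKAGES = ["networkx", "tensorflow", "keras"]


def installed_versions(packages=PACKAGES):
    """Return {package name: installed version or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running this inside the same environment (or container) in which the compile command fails gives the versions that actually matter for the report.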

Will-UEA commented 3 months ago

nadaved1 wrote: "Please consult on the community forum"

Should I start a thread there?

Will-UEA commented 3 months ago

The CLI command I was using was:

hailomz compile yolov8s --ckpt yolov8s.onnx --hw-arch hailo8l --calib-path /home/hailo_model_zoo/Retreino/images --classes 3 --performance

This command gave me the error at the stage I mentioned earlier. I then tried the Python code available from the DFC and got past the optimization step without hitting the same error. However, when compiling with the Python code, I got an error (I forgot to save the output); I will try again when I get home.
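Before rerunning, one cheap sanity check is to confirm that the directory passed to `--calib-path` really contains the images you expect. The sketch below is a hypothetical helper, not part of `hailomz`; the set of file suffixes is an assumption and may need adjusting for your dataset.

```python
from pathlib import Path

# Assumed image suffixes; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_calibration_images(calib_dir):
    """Return the image files found directly inside calib_dir, sorted by path."""
    root = Path(calib_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{calib_dir} is not a directory")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

For example, `len(list_calibration_images("/home/hailo_model_zoo/Retreino/images"))` reports how many calibration images were found at the path used in the command above.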

Armtronix2021 commented 3 months ago

For the past two or three days I have been trying to compile a custom model to HEF, and I am facing a similar issue. I followed the procedure described in the links below:
https://github.com/hailo-ai/hailo_model_zoo/tree/833ae6175c06dbd6c3fc8faeb23659c9efaa2dbe/training/yolov8
https://github.com/hailo-ai/hailo-rpi5-examples/blob/main/doc/retraining-example.md#using-yolov8-retraining-docker

I used Docker to train on my dataset. These are the commands I ran inside the container.

Training:

yolo detect train data=/home/abc/Image_Processing_Code/Image_Processing_MF_Form_Hailo/MF-object-detection-4/data.yaml model=yolov8s.pt name=MF_yolov8s_n epochs=300 batch=16

Export to ONNX:

yolo export model=/workspace/ultralytics/runs/detect/MF_yolov8s_n/weights/best.pt imgsz=640 format=onnx opset=11

Copying the model from the container to the regular system:

cp runs/detect/MF_yolov8s_n/weights/best.onnx /home/abc/Image_Processing_Code/Image_Processing_MF_Form_Hailo/yolov8s.onnx

Please note that I renamed the file to yolov8s.onnx, as I read about this somewhere on GitHub (https://github.com/hailo-ai/hailo_model_zoo/issues/85 and https://github.com/hailo-ai/hailo_model_zoo/issues/94).

I exited the container and then ran the following command:

hailomz compile yolov8s --ckpt /home/abc/Image_Processing_Code/Image_Processing_MF_Form_Hailo/yolov8s.onnx --calib-path /home/abc/Image_Processing_Code/Image_Processing_MF_Form_Hailo/MF-object-detection-4/test/images --hw-arch hailo8l --classes 2 --performance

When I do this, I get the following error:

hailo_model_optimization.acceleras.utils.acceleras_exceptions.NegativeSlopeExponentNonFixable: Quantization failed in layer yolov8s/conv42 due to unsupported required slope. Desired shift is 14.0, but op has only 8 data bits. This error raises when the data or weight range are not balanced. Mostly happens when using random calibration-set/weights, the calibration-set is not normalized properly or batch-normalization was not used during training.

I have tried using the model directly in .pt format on my system (not on the Pi 5), and it works without any issue. I am just a beginner, so I am not sure what I am doing incorrectly. If anyone can point me in the right direction, it would be a great help. I am attaching a copy of the full error: Complete error.txt (https://github.com/user-attachments/files/16318989/Complete.error.txt)
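The error message above attributes the failure to data or weight ranges that "are not balanced". As a rough illustration of what an unbalanced weight range looks like, the sketch below compares the largest and smallest per-channel peak magnitudes of a layer and expresses the ratio in bits. This is an illustrative metric only: it is not the calculation the Hailo optimizer performs, and `channel_imbalance_bits` is a hypothetical helper name.

```python
import math


def channel_imbalance_bits(channels):
    """Spread between per-channel peak magnitudes, in bits (log2 of the ratio).

    `channels` is a list of weight lists, one inner list per output channel.
    Returns infinity when some channel is entirely zero.
    """
    peaks = [max(abs(w) for w in channel) for channel in channels]
    smallest, largest = min(peaks), max(peaks)
    if smallest == 0.0:
        return math.inf
    return math.log2(largest / smallest)


if __name__ == "__main__":
    # Two channels whose peaks differ by a factor of 500.
    weights = [[0.5, -1.0], [0.001, 0.002]]
    print(f"imbalance: {channel_imbalance_bits(weights):.2f} bits")
```

A spread of many bits between channels means a single 8-bit scale has trouble representing all of them, which is the general kind of situation the error text describes.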

nadaved1 commented 3 months ago

There's a solution in the forum that might help: https://community.hailo.ai/t/problem-with-model-optimization/1648/31?u=nadav

On Saturday, 20 July 2024, 08:56, Armtronix2021 <@.***> wrote:

For the past two or three days I have been trying to compile a custom model to HEF, and I am facing a similar issue [...]

Will-UEA commented 3 months ago

Good morning, nadaved

I did the optimization process by adding that parameter you mentioned. Here are the results I got:

yoloteste/output_layer2 SNR: 4.574 dB

yoloteste/output_layer1 SNR: -37.9 dB

Is it correct?
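For reading these numbers: under the standard definition, SNR in dB is 10·log10(signal power / noise power), so a negative value means the noise power exceeds the signal power. The sketch below uses that textbook definition; how the layer noise analysis computes its SNR internally is not stated in this thread, so treat this as an interpretation aid rather than a reproduction of the tool's output.

```python
import math


def db_to_power_ratio(snr_db):
    """Convert an SNR in dB to a plain signal-power / noise-power ratio."""
    return 10.0 ** (snr_db / 10.0)


def snr_db(reference, noisy):
    """Textbook SNR in dB between a reference output and a noisy (e.g. quantized) one."""
    signal = sum(r * r for r in reference)
    noise = sum((r - n) ** 2 for r, n in zip(reference, noisy))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


if __name__ == "__main__":
    for value in (4.574, -37.9):
        print(f"{value} dB -> power ratio {db_to_power_ratio(value):.3g}")
```

By this definition, 4.574 dB means the signal power is only about three times the noise power, and -37.9 dB means the noise power is several thousand times larger than the signal power, so neither output looks healthy.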