Open tcpipchip opened 1 week ago
👋 Hello @tcpipchip, thank you for your interest in YOLOv5 🚀! It looks like you're working with the MILK-V 256, a RISC-V processor, and encountering a segmentation fault when running your exported model. No worries, we're here to help! 😊
For 🐛 Bug Reports like this, a minimum reproducible example is crucial, and you've done a great job in providing detailed steps and descriptions! This helps us understand the issue you're facing better. An Ultralytics engineer will review your report and assist you soon.
In the meantime, please verify you've set up your environment correctly:
Ensure you have Python>=3.8.0 installed with all the relevant libraries from the requirements.txt
and importantly, make sure you are using PyTorch>=1.8. To ensure everything is set up correctly, you might want to recreate your environment from scratch:
YOLOv5 runs smoothly in various environments like notebooks (Google Colab, Kaggle, etc.), cloud environments (Google Cloud, Amazon Web Services), or using Docker images with all dependencies pre-installed. Ensure your environment is up-to-date and configured correctly, including CUDA, cuDNN, Python, and PyTorch installations, particularly if you are leveraging GPU resources.
yolov5n.pt
as closely as possible. Stay tuned, and thank you for providing a comprehensive report! 📝 If there's anything else you can share about the exact error message or log outputs, feel free to add that information here. Our team is eager to assist you further! 🚀
Yes, the requirements are OK, both Python and PyTorch. Could you please train on my images and send me your .pt version?
@tcpipchip I'm sorry, but we can't provide private training services. However, you can follow our Train Custom Data guide to train your model. If you encounter issues, feel free to ask for help here.
But do you have any tips for my problem?
It seems like the issue might be related to the conversion process of your custom model. Ensure your model's architecture matches the pre-trained model you successfully converted, and double-check the conversion steps for any discrepancies.
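One concrete architecture difference worth checking is the width of the detection head, which depends on the number of classes. The sketch below assumes a default YOLOv5 layout (3 anchors per scale, strides 8/16/32); the helper names are hypothetical. For 80 classes it reproduces the (1, 255, 80, 80), (1, 255, 40, 40) and (1, 255, 20, 20) tensors that appear in the yolov5n.pt log in this issue. Whether the on-device sample code assumes 255 channels is not confirmed anywhere in this thread, so treat this as something to verify rather than a diagnosis.

```python
# Defaults of a stock YOLOv5 detection head; assumptions for a customised model.
STRIDES = (8, 16, 32)
ANCHORS_PER_SCALE = 3


def head_channels(num_classes, anchors_per_scale=ANCHORS_PER_SCALE):
    """Channels per output map: each anchor predicts 4 box values,
    1 objectness score and one score per class."""
    return anchors_per_scale * (num_classes + 5)


def head_shapes(num_classes, img_size=640, batch=1):
    """Expected NCHW shapes of the three raw head outputs."""
    channels = head_channels(num_classes)
    return [(batch, channels, img_size // s, img_size // s) for s in STRIDES]


if __name__ == "__main__":
    print("80 classes:", head_shapes(80))
    print(" 1 class  :", head_shapes(1))
```

If a custom best.pt was trained with a different class count, its head shapes will differ from the pre-trained model, and any post-processing written for the 80-class layout would need to be adjusted to match.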
I am now investigating whether the image size is the cause, and testing with other third-party pre-trained .pt files.
Testing with different image sizes and pre-trained models is a good approach. Ensure that the input dimensions match those expected by the model, and verify compatibility with the latest YOLOv5 version. If issues persist, consider checking the model's architecture and conversion process for inconsistencies.
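To make the input-dimension check concrete, here is a minimal sketch of letterbox geometry for the preprocess settings reported in the log in this issue (resize_dims [640, 640], keep_aspect_ratio True, letterbox mode, centered padding). The function name and the rounding choices are assumptions for illustration; the converter's own rounding may differ by a pixel.

```python
def letterbox_geometry(src_w, src_h, dst_w=640, dst_h=640):
    """Return the scale, resized size and centered padding needed to fit a
    src_w x src_h image into dst_w x dst_h while keeping the aspect ratio."""
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_left = (dst_w - new_w) // 2
    pad_top = (dst_h - new_h) // 2
    return {
        "scale": scale,
        "resized": (new_w, new_h),
        "pad_left": pad_left,
        "pad_right": dst_w - new_w - pad_left,
        "pad_top": pad_top,
        "pad_bottom": dst_h - new_h - pad_top,
    }


if __name__ == "__main__":
    # A 1280x720 frame is halved and padded equally above and below.
    print(letterbox_geometry(1280, 720))
```

Comparing these numbers for your training images against the shape used when tracing the model (1, 3, 640, 640 in the log) is a quick way to rule out a size mismatch.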
Got it working after 100 hours of trying: https://milk-v.blogspot.com/2024/10/milk-v-yolov5-criando-dataset.html
@tcpipchip glad to hear you resolved your issue with YOLOv5 on the MILK-V TPU! For others who might encounter similar challenges with custom model deployment on TPU devices, I recommend checking our model export guide to understand the correct conversion steps and requirements for various hardware targets.
Thanks. will add your link on the blog.
Milk-V uses an export .py script to convert to ONNX; it looks like the same code as your company's.
Thank you for sharing your blog post! While we appreciate the mention, please note that YOLOv5's ONNX export functionality is open-source under the AGPL-3.0 license, as documented in our model export guide. We're glad you found the TPU deployment process helpful.
Search before asking
YOLOv5 Component
No response
Bug
Hi Sir, I recently got the MILK-V 256, a RISC-V processor. I followed these instructions to recognize objects: https://milkv.io/docs/duo/application-development/tpu/tpu-introduction https://milkv.io/docs/duo/application-development/tpu/tpu-docker https://milkv.io/docs/duo/application-development/tpu/tpu-yolov5 best.zip
It works very, very nicely when using YOLOv5 with the pre-trained https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt. But when I create my own .pt on Colab (best.pt) and convert it to run on the MILK-V, I always get a SEGMENT FAULT. Attached is train_data.zip, my training data from Colab. On Colab it works and I can run inference. best.pt is attached too.
Environment
YOLOv5, Docker, all requirements OK for yolov5 master
Minimal Reproducible Example
SEGMENT FAULT
It looks like my problem is in my best.pt, because the pre-trained yolov5n.pt works fine!
Additional
Sequence using yolov5n.pt
Everything works fine
For more help on how to use Docker, head to https://docs.docker.com/go/guides/
ubuntu@DESKTOP-UHGFA4M:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ubuntu@DESKTOP-UHGFA4M:~$ docker run --privileged --name duotpu -v /workspace -it sophgo/tpuc_dev:v3.1
docker: Error response from daemon: Conflict. The container name "/duotpu" is already in use by container "2a46fc75400fa362ed00811b4ec34bba2612506d3938b0e72f8fabab41350246". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
ubuntu@DESKTOP-UHGFA4M:~$ docker run --privileged --name duotpu -v /workspace -it sophgo/tpuc_dev:v3.1
docker: Error response from daemon: Conflict. The container name "/duotpu" is already in use by container "2a46fc75400fa362ed00811b4ec34bba2612506d3938b0e72f8fabab41350246". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.
ubuntu@DESKTOP-UHGFA4M:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2a46fc75400f sophgo/tpuc_dev:v3.1 "/bin/bash" 2 days ago Up 12 seconds duotpu
ubuntu@DESKTOP-UHGFA4M:~$ docker exec -it 2a46fc75400f /bin/bash
root@2a46fc75400f:/workspace# pytorch
bash: pytorch: command not found
root@2a46fc75400f:/workspace# ls
best.pt master tpu-mlir tpu-sdk yolov5-master yolov5n_torch
root@2a46fc75400f:/workspace# cd yolov5n_torch/
root@2a46fc75400f:/workspace/yolov5n_torch# ls
_weight_map.csv     yolov5n_cv181x_int8_sym_final.mlir         yolov5n_jit.pt
best.pt             yolov5n_cv181x_int8_sym_model_outputs.npz  yolov5n_origin.mlir
cat.jpg             yolov5n_cv181x_int8_sym_tpu.mlir           yolov5n_top_f32_all_origin_weight.npz
train_data          yolov5n_cv181x_int8_sym_tpu_outputs.npz    yolov5n_top_f32_all_weight.npz
train_data.zip      yolov5n_in_f32.npz                         yolov5n_top_outputs.npz
work                yolov5n_in_ori.npz                         yolov5n_tpu_addressed_cv181x_int8_sym_weight.npz
yolov5n.mlir        yolov5n_int8_fuse.cvimodel                 yolov5n_tpu_addressed_cv181x_int8_sym_weight_fix.npz
yolov5n_cali_table  yolov5n_int8_fuse_tensor_info.txt          yolov5n_tpu_lowered_cv181x_int8_sym_weight.npz
root@2a46fc75400f:/workspace/yolov5n_torch# ls r
ls: cannot access 'r': No such file or directory
root@2a46fc75400f:/workspace/yolov5n_torch# cd ..
root@2a46fc75400f:/workspace# ls
best.pt master tpu-mlir tpu-sdk yolov5-master yolov5n_torch
root@2a46fc75400f:/workspace# cd yolov5-master/
root@2a46fc75400f:/workspace/yolov5-master# dir
CITATION.cff     README.zh-CN.md  data        main.py           segment         val.py
CONTRIBUTING.md  benchmarks.py    detect.py   models            train.py        yolov5n_jit.pt
LICENSE          best.pt          export.py   pyproject.toml    tutorial.ipynb
README.md        classify         hubconf.py  requirements.txt  utils
root@2a46fc75400f:/workspace/yolov5-master# cat requirements.txt
# YOLOv5 requirements
# Usage: pip install -r requirements.txt

# Base ------------------------------------------------------------------------
gitpython>=3.1.30
matplotlib>=3.3
numpy>=1.23.5
opencv-python>=4.1.1
pillow>=10.3.0
psutil # system resources
PyYAML>=5.3.1
requests>=2.32.2
scipy>=1.4.1
thop>=0.1.1 # FLOPs computation
torch>=1.8.0 # see https://pytorch.org/get-started/locally (recommended)
torchvision>=0.9.0
tqdm>=4.66.3
ultralytics>=8.2.34 # https://ultralytics.com
# protobuf<=3.20.1 # https://github.com/ultralytics/yolov5/issues/8012

# Logging ---------------------------------------------------------------------
# tensorboard>=2.4.1
# clearml>=1.2.0
# comet

# Plotting --------------------------------------------------------------------
pandas>=1.1.4
seaborn>=0.11.0

# Export ----------------------------------------------------------------------
# coremltools>=6.0 # CoreML export
# onnx>=1.10.0 # ONNX export
# onnx-simplifier>=0.4.1 # ONNX simplifier
# nvidia-pyindex # TensorRT export
# nvidia-tensorrt # TensorRT export
# scikit-learn<=1.1.2 # CoreML quantization
# tensorflow>=2.4.0,<=2.13.1 # TF exports (-cpu, -aarch64, -macos)
# tensorflowjs>=3.9.0 # TF.js export
# openvino-dev>=2023.0 # OpenVINO export

# Deploy ----------------------------------------------------------------------
setuptools>=70.0.0 # Snyk vulnerability fix
# tritonclient[all]~=2.24.0

# Extras ----------------------------------------------------------------------
# ipython # interactive notebook
# mss # screenshots
# albumentations>=1.0.3
# pycocotools>=2.0.6 # COCO mAP
root@2a46fc75400f:/workspace/yolov5-master# nano requirements.txt root@2a46fc75400f:/workspace/yolov5-master# pip install -r requirements.txt Requirement already satisfied: gitpython>=3.1.30 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 5)) (3.1.32) Requirement already satisfied: matplotlib>=3.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 6)) (3.7.2) Requirement already satisfied: numpy>=1.23.5 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 7)) (1.24.3) Requirement already satisfied: opencv-python>=4.1.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 8)) (4.8.0.74) Requirement already satisfied: pillow>=10.3.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 9)) (11.0.0) Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 10)) (5.9.5) Requirement already satisfied: PyYAML>=5.3.1 in /usr/lib/python3/dist-packages (from -r requirements.txt (line 11)) (5.4.1) Requirement already satisfied: requests>=2.32.2 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 12)) (2.32.3) Requirement already satisfied: scipy>=1.4.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 13)) (1.11.1) Requirement already satisfied: thop>=0.1.1 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 14)) (0.1.1.post2209072238) Requirement already satisfied: torch>=1.8.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 15)) (2.0.1+cpu) Requirement already satisfied: torchvision>=0.9.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 16)) (0.15.2+cpu) Requirement already satisfied: tqdm>=4.66.3 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 17)) (4.67.0) Requirement already satisfied: ultralytics>=8.2.34 in /usr/local/lib/python3.10/dist-packages (from 
-r requirements.txt (line 18)) (8.3.28) Requirement already satisfied: pandas>=1.1.4 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 27)) (2.0.3) Requirement already satisfied: seaborn>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 28)) (0.13.2) Requirement already satisfied: setuptools>=70.0.0 in /usr/local/lib/python3.10/dist-packages (from -r requirements.txt (line 42)) (75.3.0) Requirement already satisfied: gitdb<5,>=4.0.1 in /usr/local/lib/python3.10/dist-packages (from gitpython>=3.1.30->-r requirements.txt (line 5)) (4.0.10) Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.1.0) Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (2.8.2) Requirement already satisfied: pyparsing<3.1,>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (3.0.9) Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (4.42.1) Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (0.11.0) Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (1.4.5) Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib>=3.3->-r requirements.txt (line 6)) (23.1) Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (3.4) Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (3.2.0) Requirement 
already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (1.26.16) Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.32.2->-r requirements.txt (line 12)) (2023.7.22) Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (1.12) Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (4.5.0) Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.1.2) Requirement already satisfied: filelock in /usr/lib/python3/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.6.0) Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.8.0->-r requirements.txt (line 15)) (3.1) Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.10/dist-packages (from ultralytics>=8.2.34->-r requirements.txt (line 18)) (9.0.0) Requirement already satisfied: ultralytics-thop>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from ultralytics>=8.2.34->-r requirements.txt (line 18)) (2.0.11) Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3) Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas>=1.1.4->-r requirements.txt (line 27)) (2023.3) Requirement already satisfied: smmap<6,>=3.0.1 in /usr/local/lib/python3.10/dist-packages (from gitdb<5,>=4.0.1->gitpython>=3.1.30->-r requirements.txt (line 5)) (5.0.0) Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.7->matplotlib>=3.3->-r requirements.txt (line 6)) (1.16.0) Requirement already satisfied: MarkupSafe>=2.0 in 
/usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.8.0->-r requirements.txt (line 15)) (2.1.3) Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.8.0->-r requirements.txt (line 15)) (1.3.0) WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv root@2a46fc75400f:/workspace/yolov5-master# ls CITATION.cff README.zh-CN.md data main.py segment val.py CONTRIBUTING.md benchmarks.py detect.py models train.py yolov5n_jit.pt LICENSE best.pt export.py pyproject.toml tutorial.ipynb README.md classify hubconf.py requirements.txt utils root@2a46fc75400f:/workspace/yolov5-master# nano main.py root@2a46fc75400f:/workspace/yolov5-master# root@2a46fc75400f:/workspace/yolov5-master# root@2a46fc75400f:/workspace/yolov5-master# root@2a46fc75400f:/workspace/yolov5-master# wget https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt --2024-11-11 19:18:11-- https://github.com/ultralytics/yolov5/releases/download/v6.2/yolov5n.pt Resolving github.com (github.com)... 20.201.28.151 Connecting to github.com (github.com)|20.201.28.151|:443... connected. HTTP request sent, awaiting response... 
302 Found Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/3444cd1f-277c-414f-bdc9-3ac8ed6062df?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241111%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241111T111811Z&X-Amz-Expires=300&X-Amz-Signature=b7761184e059f5a596b94e432bf731d13dc16857dab233d44d18080fc0f23350&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5n.pt&response-content-type=application%2Foctet-stream [following] --2024-11-11 19:18:11-- https://objects.githubusercontent.com/github-production-release-asset-2e65be/264818686/3444cd1f-277c-414f-bdc9-3ac8ed6062df?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=releaseassetproduction%2F20241111%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20241111T111811Z&X-Amz-Expires=300&X-Amz-Signature=b7761184e059f5a596b94e432bf731d13dc16857dab233d44d18080fc0f23350&X-Amz-SignedHeaders=host&response-content-disposition=attachment%3B%20filename%3Dyolov5n.pt&response-content-type=application%2Foctet-stream Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ... Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 4062133 (3.9M) [application/octet-stream] Saving to: ‘yolov5n.pt’
yolov5n.pt 100%[================================================>] 3.87M 8.31MB/s in 0.5s
2024-11-11 19:18:12 (8.31 MB/s) - ‘yolov5n.pt’ saved [4062133/4062133]
root@2a46fc75400f:/workspace/yolov5-master# cat main.py
import torch
from models.experimental import attempt_download
model = torch.load(attempt_download("./yolov5n.pt"), map_location=torch.device('cpu'))['model'].float()
model.eval()
model.model[-1].export = True
torch.jit.trace(model, torch.rand(1, 3, 640, 640), strict=False).save('./yolov5n_jit.pt')
root@2a46fc75400f:/workspace/yolov5-master# python main.py
root@2a46fc75400f:/workspace/yolov5-master# cp yolov5n_jit.pt /workspace/yolov5-master/^C
root@2a46fc75400f:/workspace/yolov5-master# cd ..
root@2a46fc75400f:/workspace# cd yolov5n_torch
root@2a46fc75400f:/workspace/yolov5n_torch# cp /workspace/yolov5-master/yolov5n_jit.pt .
root@2a46fc75400f:/workspace/yolov5n_torch# source ./tpu-mlir/envsetup.sh
bash: ./tpu-mlir/envsetup.sh: No such file or directory
root@2a46fc75400f:/workspace/yolov5n_torch# cd ..
root@2a46fc75400f:/workspace# source ./tpu-mlir/envsetup.sh
root@2a46fc75400f:/workspace# cd yolov5n_torch/
root@2a46fc75400f:/workspace/yolov5n_torch# cp -rf ${TPUC_ROOT}/regression/dataset/COCO2017 .
root@2a46fc75400f:/workspace/yolov5n_torch# cp -rf ${TPUC_ROOT}/regression/image .
root@2a46fc75400f:/workspace/yolov5n_torch# model_transform.py \
Traceback (most recent call last):
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 272, in <module>
    tool = get_model_transform(args)
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 232, in get_model_transform
    tool = TorchTransformer(args.model_name, args.model_def, args.input_shapes,
  File "/workspace/tpu-mlir/python/tools/model_transform.py", line 204, in __init__
    self.converter = TorchConverter(self.model_name, self.model_def, input_shapes, input_types,
  File "/workspace/tpu-mlir/python/transform/TorchConverter.py", line 55, in __init__
    self.load_torch_model(torch_file, input_shapes, input_types, output_names)
  File "/workspace/tpu-mlir/python/transform/TorchConverter.py", line 251, in load_torch_model
    self.model = torch.jit.load(torch_file, map_location=torch.device('cpu'))
  File "/usr/local/lib/python3.10/dist-packages/torch/jit/_serialization.py", line 152, in load
    raise ValueError("The provided filename {} does not exist".format(f)) # type: ignore[str-bytes-safe]
ValueError: The provided filename ../yolov5n_jit.pt does not exist
root@2a46fc75400f:/workspace/yolov5n_torch# model_transform.py \
Save mlir file: yolov5n_origin.mlir [Running]: tpuc-opt yolov5n_origin.mlir --shape-infer --canonicalize --extra-optimize -o yolov5n.mlir [Success]: tpuc-opt yolov5n_origin.mlir --shape-infer --canonicalize --extra-optimize -o yolov5n.mlir Mlir file generated:yolov5n.mlir 2024/11/11 19:23:10 - INFO : load_config Preprocess args : resize_dims : [640, 640] keep_aspect_ratio : True keep_ratio_mode : letterbox pad_value : 0 pad_type : center input_dims : [640, 640]
[CMD]: model_runner.py --input yolov5n_in_f32.npz --model ./yolov5n_jit.pt --output yolov5n_ref_outputs.npz 80: 100%|████████████████████████████████████████████████████████████████████████| 1230/1230 [00:01<00:00, 1134.76it/s] Saving yolov5n_ref_outputs.npz [CMD]: model_runner.py --input yolov5n_in_f32.npz --model yolov5n.mlir --output yolov5n_top_outputs.npz [##################################################] 100% Saving yolov5n_top_outputs.npz [Running]: npz_tool.py compare yolov5n_top_outputs.npz yolov5n_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv compare 1249: 100%|█████████████████████████████████████████████████████████████████▋| 199/200 [00:06<00:00, 37.17it/s][x.1 ] EQUAL [PASSED] (1, 3, 640, 640) float32 [input.62 ] SIMILAR [PASSED] (1, 16, 320, 320) float32 cosine_similarity = 1.000000 euclidean_similarity = 0.999999 sqnr_similarity = 123.899231 [input.26 ] SIMILAR [PASSED] (1, 16, 320, 320) float32 cosine_similarity = 1.000000 euclidean_similarity = 1.000000 sqnr_similarity = 127.972746 [103 ] SIMILAR [PASSED] (1, 16, 320, 320) float32 cosine_similarity = 1.000000 euclidean_similarity = 1.000000 sqnr_similarity = 122.331476 [input.60 ] SIMILAR [PASSED] ... (1, 255, 80, 80) float32 cosine_similarity = 1.000000 euclidean_similarity = 1.000000 sqnr_similarity = 119.547615 [1234 ] SIMILAR [PASSED] (1, 255, 40, 40) float32 cosine_similarity = 1.000000 euclidean_similarity = 1.000000 sqnr_similarity = 118.485079 [1249 ] SIMILAR [PASSED] (1, 255, 20, 20) float32 cosine_similarity = 1.000000 euclidean_similarity = 1.000000 sqnr_similarity = 118.825769 200 compared 200 passed 1 equal, 3 close, 196 similar 0 failed 0 not equal, 0 not similar min_similiarity = (0.9999997615814209, 0.999998192529101, 114.64153289794922) Target yolov5n_top_outputs.npz Reference yolov5n_ref_outputs.npz npz compare PASSED. 
compare 1249: 100%|██████████████████████████████████████████████████████████████████| 200/200 [00:08<00:00, 24.98it/s] [Success]: npz_tool.py compare yolov5n_top_outputs.npz yolov5n_ref_outputs.npz --tolerance 0.99,0.99 --except - -vv root@2a46fc75400f:/workspace/yolov5n_torch# run_calibration.py yolov5n.mlir \
last input data (idx=100) not valid, droped input_num = 100, ref = 100 real input_num = 100 activation_collect_and_calc_th for op: 1249: 100%|███████████████████████████████████| 200/200 [04:25<00:00, 1.33s/it] [2048] threshold: 1249: 100%|███████████████████████████████████████████████████████| 200/200 [00:00<00:00, 235.10it/s] GmemAllocator use OpSizeOrderAssign reused mem is 3276800, all mem is 43767600 GmemAllocator use OpSizeOrderAssign reused mem is 3276800, all mem is 43767600 prepare data from 100 tune op: 1249: 100%|█████████████████████████████████████████████████████████████████| 200/200 [07:13<00:00, 2.17s/it] auto tune end, run time:433.61561346054077 root@2a46fc75400f:/workspace/yolov5n_torch# model_deploy.py \ \ --qu> --mlir yolov5n.mlir \
Add preprocess, set the following params: 2024/11/11 19:37:39 - INFO :
config Preprocess args : resize_dims : [640, 640] keep_aspect_ratio : True keep_ratio_mode : letterbox pad_value : 0 pad_type : center
[Running]: tpuc-opt yolov5n.mlir --chip-assign="chip=cv181x" --import-calibration-table="file=./yolov5n_cali_table asymmetric=False" --chip-top-optimize --fuse-preprocess="mode=INT8 customization_format=RGB_PLANAR align=False" --convert-top-to-tpu="mode=INT8 asymmetric=False linear_quant_mode=NORMAL doWinograd=False ignore_f16_overflow=False" --canonicalize -o yolov5n_cv181x_int8_sym_tpu.mlir Entering FusePreprocessPass. Inserting ScalelutOp. [Success]: tpuc-opt yolov5n.mlir --chip-assign="chip=cv181x" --import-calibration-table="file=./yolov5n_cali_table asymmetric=False" --chip-top-optimize --fuse-preprocess="mode=INT8 customization_format=RGB_PLANAR align=False" --convert-top-to-tpu="mode=INT8 asymmetric=False linear_quant_mode=NORMAL doWinograd=False ignore_f16_overflow=False" --canonicalize -o yolov5n_cv181x_int8_sym_tpu.mlir [CMD]: model_runner.py --input yolov5n_in_ori.npz --model yolov5n_cv181x_int8_sym_tpu.mlir --output yolov5n_cv181x_int8_sym_tpu_outputs.npz [##################################################] 100% [Running]: npz_tool.py compare yolov5n_cv181x_int8_sym_tpu_outputs.npz yolov5n_top_outputs.npz --tolerance 0.96,0.72 --except - -vv compare 1249: 99%|█████████████████████████████████████████████████████████████████▌| 141/142 [00:05<00:00, 21.14it/s][input.26 ] SIMILAR [PASSED] (1, 16, 320, 320) float32 cosine_similarity = 0.999769 euclidean_similarity = 0.978254 sqnr_similarity = 32.948797 [103 ] SIMILAR [PASSED] (1, 16, 320, 320) float32 cosine_similarity = 0.999255 euclidean_similarity = 0.961272 sqnr_similarity = 24.556572 ... 
(1, 255, 40, 40) float32 cosine_similarity = 0.999221 euclidean_similarity = 0.959803 sqnr_similarity = 18.724862 [1249 ] SIMILAR [PASSED] (1, 255, 20, 20) float32 cosine_similarity = 0.999214 euclidean_similarity = 0.960290 sqnr_similarity = 18.388116 142 compared 142 passed 0 equal, 0 close, 142 similar 0 failed 0 not equal, 0 not similar min_similiarity = (0.9679524302482605, 0.7443984113616068, 11.602303981781006) Target yolov5n_cv181x_int8_sym_tpu_outputs.npz Reference yolov5n_top_outputs.npz npz compare PASSED. compare 1249: 100%|██████████████████████████████████████████████████████████████████| 142/142 [00:06<00:00, 22.79it/s] [Success]: npz_tool.py compare yolov5n_cv181x_int8_sym_tpu_outputs.npz yolov5n_top_outputs.npz --tolerance 0.96,0.72 --except - -vv [Running]: tpuc-opt yolov5n_cv181x_int8_sym_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_input=False quant_output=False" --chip-tpu-optimize --distribute='num_device=1' --weight-reorder --subnet-divide="dynamic=False" --op-reorder --layer-group="opt=2" --parallel='num_core=1' --address-assign -o yolov5n_cv181x_int8_sym_final.mlir ==---------------------------== Run LayerGroupSearchPass : Searching the optimal layer groups ==---------------------------==
======================================================= * Dynamic Programming layer group with cluster
total num of base_group is 7 clusters idx(size): 0(1), 1(2), 3(2), 5(2), 7(2), 9(2), 11(2), 13(1), 14(1), 15(2), 17(2), 19(2), 21(2), 23(2), 25(2), 27(2), 29(2), 31(2), 33(1), 34(2), 36(2), 38(2), 40(2), 42(2), 44(2), 46(2), 48(2), 50(2), 52(2), 54(2), 56(1), 57(1), 58(2), 60(1), 61(2), 63(2), 65(2), 67(2), 69(2), 71(2), 73(2), 75(2), 77(2), 79(2), 81(2), 83(2), 85(2), 87(2), 89(2), 91(2), 93(1), 94(1), 95(2), 97(2), 99(2), 101(2), 103(2), 105(2), 107(2), 109(1), 110(2), 112(2), 114(2), 116(2), 118(2), 120(2), 122(1), 123(1), 124(2), 126(2), 128(2), 130(2), 132(2), 134(2), 136(1), 137(2), 139(2), process base group 0, layer_num=141, cluster_num=77 Searching best group slices... [#################################################] 100% clusters idx(size): 0(1), process base group 1, layer_num=1, cluster_num=1 clusters idx(size): 0(1), process base group 2, layer_num=1, cluster_num=1 clusters idx(size): 0(1), process base group 3, layer_num=1, cluster_num=1 clusters idx(size): 0(1), process base group 4, layer_num=1, cluster_num=1 clusters idx(size): 0(1), process base group 5, layer_num=1, cluster_num=1 clusters idx(size): 0(1), process base group 6, layer_num=1, cluster_num=1
Consider redundant computation and gdma cost
The final cost of the two group is 1182594 //// Group cost 1182594, optimal cut idx 139 The final cost of the two group is 1116710 //// Group cost 1116710, optimal cut idx 138 The final cost of the two group is 1315164 The final cost of the two group is 970894 //// Group cost 970894, optimal cut idx 137 The final cost of the two group is 866493 //// Group cost 866493, optimal cut idx 136 The final cost of the two group is 877481 The final cost of the two group is 941308 The final cost of the two group is 892746 The pre cost of the two group is 898167 The final cost of the two group is 901710 //// Group cost 901710, optimal cut idx 132 The final cost of the two group is 832079 .... The final cost of the two group is 4092392 //// Group cost 4092392, optimal cut idx 0
Merge cut idx to reduce gdma cost
==---------------------------== Run GroupPostTransformPass : Some transform after layer groups is determined ==---------------------------== ==---------------------------== Run TimeStepAssignmentPass : Assign timestep task for each group. ==---------------------------== ==---------------------------== Run LocalMemoryAllocationPass : Allocate local memory for all layer groups ==---------------------------== ==---------------------------== Run TimeStepCombinePass : Combine time step for better parallel balance ==---------------------------== ==---------------------------== Run GroupDataMoveOverlapPass : Overlap data move between two layer group ==---------------------------== GmemAllocator use OpSizeOrderAssign [Success]: tpuc-opt yolov5n_cv181x_int8_sym_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_input=False quant_output=False" --chip-tpu-optimize --distribute='num_device=1' --weight-reorder --subnet-divide="dynamic=False" --op-reorder --layer-group="opt=2" --parallel='num_core=1' --address-assign -o yolov5n_cv181x_int8_sym_final.mlir [Running]: tpuc-opt yolov5n_cv181x_int8_sym_final.mlir --codegen="model_file=yolov5n_int8_fuse.cvimodel embed_debug_info=true model_version=latest" -o /dev/null [oc_pos=32] cur_oc 8, stepSize 1024, compressedSize 1040, SKIP [Success]: tpuc-opt yolov5n_cv181x_int8_sym_final.mlir --codegen="model_file=yolov5n_int8_fuse.cvimodel embed_debug_info=true model_version=latest" -o /dev/null [CMD]: model_runner.py --input yolov5n_in_ori.npz --model yolov5n_int8_fuse.cvimodel --output yolov5n_cv181x_int8_sym_model_outputs.npz setenv:cv181x Start TPU Simulator for cv181x device[0] opened, 4294967296 version: 1.4.0 yolov5n Build at 2024-11-11 19:37:51 For platform cv181x Cmodel: bm_load_cmdbuf Max SharedMem size:2457600 Cmodel: bm_run_cmdbuf device[0] closed [Running]: npz_tool.py compare yolov5n_cv181x_int8_sym_model_outputs.npz yolov5n_cv181x_int8_sym_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv compare 1249_f32: 
88%|█████████████████████████████████████████████████████████▊ | 7/8 [00:00<00:00, 69.16it/s][964 ] EQUAL [PASSED] (1, 64, 80, 80) float32 [1081 ] EQUAL [PASSED] (1, 128, 40, 40) float32 [input.1 ] EQUAL [PASSED] (1, 256, 20, 20) float32 [1198 ] EQUAL [PASSED] (1, 256, 20, 20) float32 [1219_f32 ] EQUAL [PASSED] (1, 255, 80, 80) float32 [1234_f32 ] EQUAL [PASSED] (1, 255, 40, 40) float32 [1249 ] EQUAL [PASSED] (1, 255, 20, 20) float32 [1249_f32 ] EQUAL [PASSED] (1, 255, 20, 20) float32 8 compared 8 passed 8 equal, 0 close, 0 similar 0 failed 0 not equal, 0 not similar min_similiarity = (1.0, 1.0, inf) Target yolov5n_cv181x_int8_sym_model_outputs.npz Reference yolov5n_cv181x_int8_sym_tpu_outputs.npz npz compare PASSED. compare 1249_f32: 100%|██████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 27.38it/s] [Success]: npz_tool.py compare yolov5n_cv181x_int8_sym_model_outputs.npz yolov5n_cv181x_int8_sym_tpu_outputs.npz --tolerance 0.99,0.90 --except - -vv root@2a46fc75400f:/workspace/yolov5n_torch# scp -r /workspace/tpu-sdk root@192.168.42.1:/mnt/tpu/ root@192.168.42.1's password: OpenCVModules-release.cmake 100% 2053 402.1KB/s 00:00 haarcascade_eye.xml 100% 333KB 2.7MB/s 00:00 haarcascade_smile.xml 100% 184KB 2.7MB/s 00:00 .... 
libcvimath-static.a 100% 172KB 2.6MB/s 00:00 libcviruntime.so 100% 574KB 2.9MB/s 00:00 root@2a46fc75400f:/workspace/yolov5n_torch# scp /workspace/yolov5n_torch/yolov5n_int8_fuse.cvimodel root@192.168.42.1:/ mnt/tpu/tpu-sdk/ root@192.168.42.1's password: yolov5n_int8_fuse.cvimodel 100% 2158KB 2.9MB/s 00:00 root@2a46fc75400f:/workspace/yolov5n_torch# ls -l total 389176 drwxr-xr-x 2 root root 4096 Nov 11 19:21 COCO2017 -rw-r--r-- 1 root root 12398 Nov 11 19:37 _weight_map.csv -rwxr-xr-x 1 root root 14447400 Nov 9 09:00 best.pt -rwxr-xr-x 1 root root 40717 Oct 29 07:42 cat.jpg drwxr-xr-x 2 root root 4096 Nov 11 19:21 image drwxr-xr-x 5 root root 4096 Nov 7 14:12 train_data -rwxr-xr-x 1 root root 2524205 Nov 8 01:36 train_data.zip drwxr-xr-x 2 root root 4096 Nov 9 10:02 work -rw-r--r-- 1 root root 64711 Nov 11 19:23 yolov5n.mlir -rw-r--r-- 1 root root 8011 Nov 11 19:35 yolov5n_cali_table -rw-r--r-- 1 root root 2210112 Nov 11 19:37 yolov5n_int8_fuse.cvimodel
root@2a46fc75400f:/workspace/yolov5n_torch#
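For readers following the log above: each conversion stage is checked by comparing tensors against a reference, and the reported cosine_similarity values are compared with a threshold (the first number passed to --tolerance appears to be the cosine threshold, judging by the reported minimums). The sketch below shows the standard cosine-similarity definition in plain Python. It is not the implementation used by npz_tool.py, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Standard cosine similarity between two equally long flat sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero tensors are treated as identical; otherwise as unrelated.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def passes(a, b, tolerance=0.99):
    """True when the similarity reaches the given threshold."""
    return cosine_similarity(a, b) >= tolerance


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]
    quantized = [1.0, 2.0, 3.1]
    print(cosine_similarity(reference, quantized), passes(reference, quantized))
```

Running the same kind of comparison on the outputs of a converted best.pt would show at which stage, if any, the custom model starts to diverge from its reference.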
now using the best.pt
model_deploy.py \
--mlir yolov5n.mlir \
--quantize INT8 \
--calibration_table ./yolov5n_cali_table \
--chip cv181x \
--test_input ./cat.jpg \
--test_reference yolov5n_top_outputs.npz \
--compare_all \
--fuse_preprocess \
--debug \
--model yolov5n_int8_fuse.cvimodel
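Note that the command shown for best.pt still uses the yolov5n file names. The log does not show whether those files were regenerated from best.pt, so this may be harmless, but deriving every file name from one model name makes it harder to mix artifacts from two models. A minimal sketch, using only the flags that appear in the command above; the helper name, the "best" prefix and the naming pattern are assumptions for illustration.

```python
import shlex


def build_deploy_command(name, chip="cv181x", test_input="./cat.jpg"):
    """Build the model_deploy.py argument list with every artifact named
    after a single model prefix (hypothetical naming pattern)."""
    return [
        "model_deploy.py",
        "--mlir", f"{name}.mlir",
        "--quantize", "INT8",
        "--calibration_table", f"./{name}_cali_table",
        "--chip", chip,
        "--test_input", test_input,
        "--test_reference", f"{name}_top_outputs.npz",
        "--compare_all",
        "--fuse_preprocess",
        "--debug",
        "--model", f"{name}_int8_fuse.cvimodel",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for a model whose files use the "best" prefix.
    print(shlex.join(build_deploy_command("best")))
```

The earlier model_transform.py and run_calibration.py steps would need the same prefix so that the .mlir file, calibration table and reference outputs all come from the same traced model.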
Thank you!
Are you willing to submit a PR?