visual-layer / fastdup

fastdup is a powerful, free tool designed to rapidly generate valuable insights from image and video datasets. It helps enhance the quality of both images and labels, while significantly reducing data operation costs, all with unmatched scalability.

Canceled future for execute_request + *** buffer overflow detected *** message while executing in Jupyter Notebook/Python script #94

Closed mrdbourke closed 1 year ago

mrdbourke commented 1 year ago

Hi there,

Thank you for the incredible library!

I've used it previously over one of my datasets and it worked really well.

Now I'm trying to use it again (a similar dataset but quite a few more images - 117k+).

However, I keep running into the following error:

Screenshot 2023-03-07 at 7 18 50 am

I've tried lowering the number of threads with the following code:

import fastdup
print(fastdup.__version__)  # prints: 0.213

# output_dir and images_dir are path strings defined elsewhere (not shown)
fd = fastdup.create(work_dir=output_dir,
                    input_dir=images_dir)
fd.run(num_threads=4,
       threshold=0.96,
       compute="cpu",
       verbose=True)

Output:

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-07 07:15:55 [INFO] Version 0.213 Release compiled on Mar  5 2023 20:05:42
2023-03-07 07:15:55 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-07 07:15:55 [DEBUG] out_dims[0] = -1
2023-03-07 07:15:55 [DEBUG] out_dims[1] = 576
2023-03-07 07:15:55 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-07 07:15:55 [INFO] Going to loop over dir artifacts/food_vision_199_classes_images:v15
2023-03-07 07:15:56 [DEBUG] find -L artifacts/food_vision_199_classes_images:v15 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-07 07:15:56 [DEBUG] Read a total of 117574 lines from duplicates/tmp/files0.txt
2023-03-07 07:15:56 [DEBUG] Total images read so far 117574
2023-03-07 07:15:56 [INFO] Found total 117574 images to run on
2023-03-07 07:15:56 [DEBUG] Going to init pool
2023-03-07 07:15:56 [DEBUG] Starting to run with 4 threads
2023-03-07 07:15:56 [DEBUG] Going to init quad array of size 117574
2023-03-07 07:15:56 [DEBUG] Going to init jobs
2023-03-07 07:15:56 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v15/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg 0 batch size 1
2023-03-07 07:15:56 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v15/0003c8a1-7f64-4540-9256-3252f0981035.jpeg 1 batch size 1
2023-03-07 07:15:56 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-07 07:15:56 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-07 07:15:56 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v15/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg 2 batch size 1
2023-03-07 07:15:56 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-07 07:15:56 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v15/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg 3 batch size 1
2023-03-07 07:15:56 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-07 07:15:56 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v15/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg
2023-03-07 07:15:56 [DEBUG] Read image took 0

original  120x120:
[[252, 250, 255], [252, 251, 255], [254, 251, 255]]
[[252, 251, 255], [252, 252, 255], [254, 253, 255]]
[[252, 255, 253], [252, 255, 253], [254, 255, 251]]

resized 224:
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 251, 255], [252, 251, 255], [252, 252, 255]]

RGB:
[[255, 250, 2023-03-07 07:15:56 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v15/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg
252], [255, 250, 252], [255, 2512023-03-07 07:15:56 [DEBUG] Read image took 0
, 
original  352x220:
[[253, 253, 253], [253252]]
[[255, 250, 252], [255, 250, 252], 253, 253], [253, 253, 253]], [255, 251, 252]
[[253]
[[255, , 253, 253], [253, 253251, 252], [255, 251, 252], [255, 252, 252]], 253], [

253, 253, 253]]
[[253, 253, 253]2023-03-07 07:15:56 [DEBUG] Image load and resize took 1 from artifacts/food_vision_199_classes_images:v15/0003c8a1-7f64-4540-9256-3252f0981035.jpeg
2023-03-07 07:15:56 [DEBUG] Read image took 1

original  220x220:
[[14, 18, 17, [253, 253, 253], [253, 253, 253]]

], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]
[[16, 21, 20], [14, 20, 19], [15, 23, 22]]

resized 224:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

resized 224:
[[14, 18, 17], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]
[[16, 21, 20], [14, 20, 19], [15, 23, 22]]

RGB:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253
RGB:
], [253, 253, 253], [253, 253, 253]]

[[17, 18, 14], [19, 20, 14], [24, 25, 17]]
[[18, 19, 15], [19, 20, 14], [23, 25, 16]]
[[20, 21, 16], [19, 20, 14], [22, 23, 15]]

2023-03-07 07:15:56 [DEBUG] Computed stats 1443.857178 141.410995 71.452217
2023-03-07 07:15:56 [DEBUG] Image stats vec 0x7f7ece05acd0 0x7f7ece05acd0
0 :[255.0000, 250.0000, 252.0000, 255.0000, 250.0000, 252.0000, 255.0000, 251.0000, 252.0000, 255.0000]
2023-03-07 07:15:56 [DEBUG] Computed stats 506.211670 214.963531 67.210236
2023-03-07 07:15:56 [DEBUG] Image stats vec 0x7f7ece05acd0 0x7f7ece05acd0
0 :[253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000]
2023-03-07 07:15:56 [DEBUG] Computed stats 899.482483 104.077675 64.384438
2023-03-07 07:15:56 [DEBUG] Image stats vec 0x7f7ece05acd0 0x7f7ece05acd0
0 :[17.0000, 18.0000, 14.0000, 19.0000, 20.0000, 14.0000, 24.0000, 25.0000, 17.0000, 29.0000]
2023-03-07 07:15:56 [DEBUG] Image load and resize took 3 from artifacts/food_vision_199_classes_images:v15/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg
2023-03-07 07:15:56 [DEBUG] Read image took 3

original  600x450:
[[113, 122, 132], [114, 123, 133], [98, 104, 115]]
[[119, 128, 138], [123, 132, 142], [107, 113, 124]]
[[117, 126, 136], [128, 137, 147], [115, 121, 132]]

resized 224:
[[123, 132, 142], [127, 131, 142], [142, 143, 157]]
[[132, 141, 151], [136, 140, 151], [155, 156, 170]]
[[133, 141, 154], [130, 131, 145], [157, 157, 171]]

RGB:
[[142, 132, 123], [142, 131, 127], [157, 143, 142]]
[[151, 141, 132], [151, 140, 136], [170, 156, 155]]
[[154, 141, 133], [145, 131, 130], [171, 157, 157]]

2023-03-07 07:15:56 [DEBUG] Computed stats 3771.233154 98.068268 57.031754
2023-03-07 07:15:56 [DEBUG] Image stats vec 0x7f7ece05acd0 0x7f7ece05acd0
0 :[142.0000, 132.0000, 123.0000, 142.0000, 131.0000, 127.0000, 157.0000, 143.0000, 142.0000, 161.0000]
2023-03-07 07:15:56 [DEBUG] Inner inference took 5 (test? 0)
output_tensor3 :[-0.0509, 0.8222, 0.3277, -0.1013, -0.1441, 0.2930, 1.7275, 1.3833, 0.5214, 0.1942]
output_tensor_end3 :[-0.1208, -0.0856, 1.3555, 0.6580]
2023-03-07 07:15:56 [DEBUG] Quad array 0x7f7e52d70010 3 start_offset 0 
features3 :[-0.0509, 0.8222, 0.3277, -0.1013]
2023-03-07 07:15:56 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-07 07:15:56 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v15/00065302-c0e7-4634-ab63-5ddd16bfdeb8.jpeg 4 batch size 1
2023-03-07 07:15:56 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-07 07:15:56 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v15/00065302-c0e7-4634-ab63-5ddd16bfdeb8.jpeg
2023-03-07 07:15:56 [DEBUG] Read image took 0

original  340x220:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

2023-03-07 07:15:56 [DEBUG] Inner inference took 6 (test? 0)
output_tensor0 :[0.1056, -0.0408, -0.0785, 1.6688, -0.1036, -0.0105, 2.3451, 2.6652, 0.1104, 0.0462]
output_tensor_end0 :[0.6959, -0.0590, 0.8360, 0.0015]
2023-03-07 07:15:56 [DEBUG] Quad array 0x7f7e52d70010 0 start_offset 0 
features0 :[0.1056, -0.0408, -0.0785, 1.6688]
2023-03-07 07:15:56 [DEBUG] Finished inference fine 0 (test 0)!!

resized 224:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

RGB:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

2023-03-07 07:15:56 [DEBUG] Computed stats 754.797974 148.124588 65.905952
2023-03-07 07:15:56 [DEBUG] Image stats vec 0x7f7ece05acd0 0x7f7ece05acd0
0 :[232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000]
2023-03-07 07:15:56 [DEBUG] Inner inference took 6 (test? 0)
output_tensor1 :[0.2499, -0.0228, 0.1952, 0.3738, 0.0307, 0.1350, 1.8753, 1.3722, 0.0462, 0.0384]
output_tensor_end1 :[0.4413, -0.0264, 0.8682, 0.0000]
2023-03-07 07:15:56 [DEBUG] Quad array 0x7f7e52d70010 1 start_offset 0 
features1 :[0.2499, -0.0228, 0.1952, 0.3738]
2023-03-07 07:15:56 [DEBUG] Finished inference fine 1 (test 0)!!
2023-03-07 07:15:56 [DEBUG] Inner inference took 6 (test? 0)
output_tensor2 :[0.6552, 1.6231, 0.3641, 2.6880, -0.0520, 0.7673, 3.4906, 0.5311, -0.0925, -0.0460]
output_tensor_end2 :[0.0801, 0.3410, 0.1593, 0.1368]
2023-03-07 07:15:56 [DEBUG] Quad array 0x7f7e52d70010 2 start_offset 0 
features2 :[0.6552, 1.6231, 0.3641, 2.6880]
2023-03-07 07:15:56 [DEBUG] Finished inference fine 2 (test 0)!!
2023-03-07 07:15:56 [DEBUG] Inner inference took 5 (test? 0)
output_tensor4 :[2.7336, 0.0287, 0.7808, -0.0400, -0.0215, 0.7799, 0.2430, 1.2999, 0.7080, 0.1600]
output_tensor_end4 :[-0.0077, -0.0580, 0.6903, 0.5516]
2023-03-07 07:15:56 [DEBUG] Quad array 0x7f7e52d70010 4 start_offset 0 
features4 :[2.7336, 0.0287, 0.7808, -0.0400]
2023-03-07 07:15:56 [DEBUG] Finished inference fine 4 (test 0)!!
[■                                                 ] 2% Estimated: 6 Minutes 0 Features

The error message in the Jupyter Notebook from VS Code directs me to this link: https://github.com/microsoft/vscode-jupyter/wiki/Kernel-crashes

Does fastdup require a specific version of NumPy? Potentially that's causing my issue.

Environment:

Thank you again

mrdbourke commented 1 year ago

Update:

Tried running this as a Python script (same env).

Similar error occurred:

(/home/daniel/code/pytorch/env) daniel@daniel-Z490-UD:~/code/nutrify/foodvision$ python find_duplicates.py 
[INFO] Finding duplicate images in: ./notebooks/artifacts/food_vision_199_classes_images:v15
[INFO] Number of images: 117574
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-07 09:34:32 [INFO] Version 0.214 Release compiled on Mar  6 2023 07:23:12
2023-03-07 09:34:32 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-07 09:34:32 [DEBUG] out_dims[0] = -1
2023-03-07 09:34:32 [DEBUG] out_dims[1] = 576
2023-03-07 09:34:32 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-07 09:34:32 [INFO] Going to loop over dir notebooks/artifacts/food_vision_199_classes_images:v15
2023-03-07 09:34:33 [DEBUG] find -L notebooks/artifacts/food_vision_199_classes_images:v15 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-07 09:34:34 [DEBUG] Read a total of 117574 lines from duplicates/tmp/files0.txt
2023-03-07 09:34:34 [DEBUG] Total images read so far 117574
2023-03-07 09:34:34 [INFO] Found total 117574 images to run on
2023-03-07 09:34:34 [DEBUG] Going to init pool
2023-03-07 09:34:34 [DEBUG] Starting to run with 4 threads
2023-03-07 09:34:34 [DEBUG] Going to init quad array of size 117574
2023-03-07 09:34:34 [DEBUG] Going to init jobs
2023-03-07 09:34:34 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v15/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg 0 batch size 1
2023-03-07 09:34:34 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-07 09:34:34 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v15/0003c8a1-7f64-4540-9256-3252f0981035.jpeg 1 batch size 1
2023-03-07 09:34:34 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-07 09:34:34 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v15/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg 2 batch size 1
2023-03-07 09:34:34 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-07 09:34:34 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v15/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg 3 batch size 1
2023-03-07 09:34:34 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-07 09:34:34 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v15/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg
2023-03-07 09:34:34 [DEBUG] Read image took 0

original  120x120:
[[252, 250, 255], [252, 251, 255], [254, 251, 255]]
[[252, 251, 255], [252, 252, 255], [254, 253, 255]]
[[252, 255, 253], [252, 255, 253], [254, 255, 251]]

resized 224:
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 251, 255], [252, 251, 255], [252, 252, 255]]

RGB:
[[255, 250, 252], [255, 250, 252], [255, 251, 252]]
[[255, 250, 252], [255, 250, 252], [255, 251, 252]]
[[255, 251, 252], [255, 251, 252], [255, 252, 252]]

2023-03-07 09:34:34 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v15/0003c8a1-7f64-4540-9256-3252f0981035.jpeg
2023-03-07 09:34:34 [DEBUG] Read image took 0

original  220x220:
[[14, 18, 17], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]
2023-03-07 09:34:34 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v15/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg
[[16, 21, 20], [14, 20, 19], 2023-03-07 09:34:34 [DEBUG] Read image took 0
[15, 
original  352x22023, :22
[[253, 253, 253], [253, 253, 253], [253, ]253]

, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

resized 224:
[[14, 18, 17], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]

resized 224:
[[[[16, 21, 20], [14, 20, 19], [15, 23, 22]]

253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

RGB:
[[17, 18, 14], [19, 20, 14], [24, 25, 17]]
[[18, 19, 15], [19, 20, 14], [23, 25, 16]]
[[20, 21, 16], [19, 20, 14], [22, 23, 15]]

RGB:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

2023-03-07 09:34:34 [DEBUG] Computed stats 1443.857178 141.410995 71.452217
2023-03-07 09:34:34 [DEBUG] Image stats vec 0x7fc45cb9fcd0 0x7fc45cb9fcd0
0 :[255.0000, 250.0000, 252.0000, 255.0000, 250.0000, 252.0000, 255.0000, 251.0000, 252.0000, 255.0000]
2023-03-07 09:34:34 [DEBUG] Computed stats 899.482483 104.077675 64.384438
2023-03-07 09:34:34 [DEBUG] Image stats vec 0x7fc45cb9fcd0 0x7fc45cb9fcd0
0 :[17.0000, 18.0000, 14.0000, 19.0000, 20.0000, 14.0000, 24.0000, 25.0000, 17.0000, 29.0000]
2023-03-07 09:34:34 [DEBUG] Computed stats 506.211670 214.963531 67.210236
2023-03-07 09:34:34 [DEBUG] Image stats vec 0x7fc45cb9fcd0 0x7fc45cb9fcd0
0 :[253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000]
2023-03-07 09:34:34 [DEBUG] Image load and resize took 2 from notebooks/artifacts/food_vision_199_classes_images:v15/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg
2023-03-07 09:34:34 [DEBUG] Read image took 2

original  600x450:
[[113, 122, 132], [114, 123, 133], [98, 104, 115]]
[[119, 128, 138], [123, 132, 142], [107, 113, 124]]
[[117, 126, 136], [128, 137, 147], [115, 121, 132]]

resized 224:
[[123, 132, 142], [127, 131, 142], [142, 143, 157]]
[[132, 141, 151], [136, 140, 151], [155, 156, 170]]
[[133, 141, 154], [130, 131, 145], [157, 157, 171]]

RGB:
[[142, 132, 123], [142, 131, 127], [157, 143, 142]]
[[151, 141, 132], [151, 140, 136], [170, 156, 155]]
[[154, 141, 133], [145, 131, 130], [171, 157, 157]]

2023-03-07 09:34:34 [DEBUG] Computed stats 3771.233154 98.068268 57.031754
2023-03-07 09:34:34 [DEBUG] Image stats vec 0x7fc45cb9fcd0 0x7fc45cb9fcd0
0 :[142.0000, 132.0000, 123.0000, 142.0000, 131.0000, 127.0000, 157.0000, 143.0000, 142.0000, 161.0000]
2023-03-07 09:34:34 [DEBUG] Inner inference took 5 (test? 0)
output_tensor0 :[0.1056, -0.0408, -0.0785, 1.6688, -0.1036, -0.0105, 2.3451, 2.6652, 0.1104, 0.0462]
output_tensor_end0 :[0.6959, -0.0590, 0.8360, 0.0015]
2023-03-07 09:34:34 [DEBUG] Quad array 0x7fc3ebda8010 0 start_offset 0 
features0 :[0.1056, -0.0408, -0.0785, 1.6688]
2023-03-07 09:34:34 [DEBUG] Finished inference fine 0 (test 0)!!
2023-03-07 09:34:34 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v15/00065302-c0e7-4634-ab63-5ddd16bfdeb8.jpeg 4 batch size 1
2023-03-07 09:34:34 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-07 09:34:34 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v15/00065302-c0e7-4634-ab63-5ddd16bfdeb8.jpeg
2023-03-07 09:34:34 [DEBUG] Read image took 0

original  340x220:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

resized 224:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

RGB:
[[232, 232, 232], [232, 232, 232], [232, 232, 232]]
[[233, 233, 233], [233, 233, 233], [233, 233, 233]]
[[234, 234, 234], [234, 234, 234], [234, 234, 234]]

2023-03-07 09:34:34 [DEBUG] Computed stats 754.797974 148.124588 65.905952
2023-03-07 09:34:34 [DEBUG] Image stats vec 0x7fc45cb9fcd0 0x7fc45cb9fcd0
0 :[232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000, 232.0000]
2023-03-07 09:34:34 [DEBUG] Inner inference took 6 (test? 0)
output_tensor1 :[0.2499, -0.0228, 0.1952, 0.3738, 0.0307, 0.1350, 1.8753, 1.3722, 0.0462, 0.0384]
output_tensor_end1 :[0.4413, -0.0264, 0.8682, 0.0000]
2023-03-07 09:34:34 [DEBUG] Quad array 0x7fc3ebda8010 1 start_offset 0 
features1 :[0.2499, -0.0228, 0.1952, 0.3738]
2023-03-07 09:34:34 [DEBUG] Finished inference fine 1 (test 0)!!
2023-03-07 09:34:34 [DEBUG] Inner inference took 6 (test? 0)
output_tensor3 :[-0.0509, 0.8222, 0.3277, -0.1013, -0.1441, 0.2930, 1.7275, 1.3833, 0.5214, 0.1942]
output_tensor_end3 :[-0.1208, -0.0856, 1.3555, 0.6580]
2023-03-07 09:34:34 [DEBUG] Quad array 0x7fc3ebda8010 3 start_offset 0 
features3 :[-0.0509, 0.8222, 0.3277, -0.1013]
2023-03-07 09:34:34 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-07 09:34:34 [DEBUG] Inner inference took 6 (test? 0)
output_tensor2 :[0.6552, 1.6231, 0.3641, 2.6880, -0.0520, 0.7673, 3.4906, 0.5311, -0.0925, -0.0460]
output_tensor_end2 :[0.0801, 0.3410, 0.1593, 0.1368]
2023-03-07 09:34:34 [DEBUG] Quad array 0x7fc3ebda8010 2 start_offset 0 
features2 :[0.6552, 1.6231, 0.3641, 2.6880]
2023-03-07 09:34:34 [DEBUG] Finished inference fine 2 (test 0)!!
2023-03-07 09:34:34 [DEBUG] Inner inference took 5 (test? 0)
output_tensor4 :[2.7336, 0.0287, 0.7808, -0.0400, -0.0215, 0.7799, 0.2430, 1.2999, 0.7080, 0.1600]
output_tensor_end4 :[-0.0077, -0.0580, 0.6903, 0.5516]
2023-03-07 09:34:34 [DEBUG] Quad array 0x7fc3ebda8010 4 start_offset 0 
features4 :[2.7336, 0.0287, 0.7808, -0.0400]
2023-03-07 09:34:34 [DEBUG] Finished inference fine 4 (test 0)!!
libpng warning: iCCP: known incorrect sRGB profile ] 2% Estimated: 6 Minutes 0 Features
*** buffer overflow detected ***: terminated       ] 5% Estimated: 6 Minutes 0 Features
Aborted (core dumped)                              ] 5% Estimated: 6 Minutes 0 Features

mrdbourke commented 1 year ago

Update:

Seems to happen within a Python script + a fresh Python env.

python3 -m venv env
source env/bin/activate
pip install -U pip
pip install fastdup

Same error as above.

Potentially there's something wrong with my C installation? (I'm guessing here, not too familiar with what fastdup runs on the backend)

I'm running on Ubuntu 20.04

dbickson commented 1 year ago

Hi @mrdbourke, thanks for reaching out and for your kind words; we would love to help. This error is new to us. It looks like it originates on the C++ side of our code. Can you run with num_threads=1, run_stats=0 and let us know whether the error always happens at 5% progress? It may be related to a corrupted image that somehow crashes the code. We recently added support for 16-bit grayscale images, so a rogue image may have sneaked in. How many cores and how much RAM do you have? Is your dataset shareable? If we can get it, we can reproduce the problem on our side.

mrdbourke commented 1 year ago

Hi @dbickson,

Thank you for the quick help!

I just tried running with the parameters you said and the issue keeps happening.

Error output:

[INFO] Finding duplicate images in: ./notebooks/artifacts/food_vision_199_classes_images:v16
[INFO] Number of images: 117568
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-07 18:46:43 [INFO] Version 0.214 Release compiled on Mar  6 2023 07:23:12
2023-03-07 18:46:43 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-07 18:46:43 [DEBUG] out_dims[0] = -1
2023-03-07 18:46:43 [DEBUG] out_dims[1] = 576
2023-03-07 18:46:43 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-07 18:46:43 [INFO] Going to loop over dir notebooks/artifacts/food_vision_199_classes_images:v16
2023-03-07 18:46:44 [DEBUG] find -L notebooks/artifacts/food_vision_199_classes_images:v16 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-07 18:46:45 [DEBUG] Read a total of 117568 lines from duplicates/tmp/files0.txt
2023-03-07 18:46:45 [DEBUG] Total images read so far 117568
2023-03-07 18:46:45 [INFO] Found total 117568 images to run on
2023-03-07 18:46:45 [DEBUG] Going to init pool
2023-03-07 18:46:45 [DEBUG] Starting to run with 1 threads
2023-03-07 18:46:45 [DEBUG] Going to init quad array of size 117568
2023-03-07 18:46:45 [DEBUG] Going to init jobs
2023-03-07 18:46:45 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v16/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg 0 batch size 1
2023-03-07 18:46:45 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-07 18:46:45 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v16/000226a7-5332-4f45-b0e9-6760e9bd6d3e.jpeg
2023-03-07 18:46:45 [DEBUG] Read image took 0

original  120x120:
[[252, 250, 255], [252, 251, 255], [254, 251, 255]]
[[252, 251, 255], [252, 252, 255], [254, 253, 255]]
[[252, 255, 253], [252, 255, 253], [254, 255, 251]]

resized 224:
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 250, 255], [252, 250, 255], [252, 251, 255]]
[[252, 251, 255], [252, 251, 255], [252, 252, 255]]

RGB:
[[255, 250, 252], [255, 250, 252], [255, 251, 252]]
[[255, 250, 252], [255, 250, 252], [255, 251, 252]]
[[255, 251, 252], [255, 251, 252], [255, 252, 252]]

0 :[255.0000, 250.0000, 252.0000, 255.0000, 250.0000, 252.0000, 255.0000, 251.0000, 252.0000, 255.0000]
2023-03-07 18:46:45 [DEBUG] Inner inference took 6 (test? 0)
output_tensor0 :[0.1056, -0.0408, -0.0785, 1.6688, -0.1036, -0.0105, 2.3451, 2.6652, 0.1104, 0.0462]
output_tensor_end0 :[0.6959, -0.0590, 0.8360, 0.0015]
2023-03-07 18:46:45 [DEBUG] Quad array 0x7fda33dab010 0 start_offset 0 
features0 :[0.1056, -0.0408, -0.0785, 1.6688]
2023-03-07 18:46:45 [DEBUG] Finished inference fine 0 (test 0)!!
2023-03-07 18:46:45 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v16/0003a069-3b76-4cae-9414-80ccaa081e80.jpeg 1 batch size 1
2023-03-07 18:46:45 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-07 18:46:45 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v16/0003a069-3b76-4cae-9414-80ccaa081e80.jpeg
2023-03-07 18:46:45 [DEBUG] Read image took 0

original  288x175:
[[126, 152, 192], [126, 152, 192], [128, 155, 192]]
[[119, 145, 185], [120, 146, 186], [121, 148, 185]]
[[112, 138, 178], [112, 138, 178], [114, 141, 178]]

resized 224:
[[126, 152, 192], [126, 152, 192], [130, 157, 194]]
[[119, 145, 185], [120, 146, 186], [123, 150, 187]]
[[119, 145, 185], [120, 146, 186], [123, 150, 187]]

RGB:
[[192, 152, 126], [192, 152, 126], [194, 157, 130]]
[[185, 145, 119], [186, 146, 120], [187, 150, 123]]
[[185, 145, 119], [186, 146, 120], [187, 150, 123]]

0 :[192.0000, 152.0000, 126.0000, 192.0000, 152.0000, 126.0000, 194.0000, 157.0000, 130.0000, 195.0000]
2023-03-07 18:46:45 [DEBUG] Inner inference took 6 (test? 0)
output_tensor1 :[0.1467, 0.1084, 0.0199, 2.3778, -0.0718, 0.0027, 0.1834, 1.3729, 0.4672, 0.0633]
output_tensor_end1 :[0.5904, 0.2958, 0.7220, 0.0407]
2023-03-07 18:46:45 [DEBUG] Quad array 0x7fda33dab010 1 start_offset 0 
features1 :[0.1467, 0.1084, 0.0199, 2.3778]
2023-03-07 18:46:45 [DEBUG] Finished inference fine 1 (test 0)!!
2023-03-07 18:46:45 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v16/0003c8a1-7f64-4540-9256-3252f0981035.jpeg 2 batch size 1
2023-03-07 18:46:45 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-07 18:46:45 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v16/0003c8a1-7f64-4540-9256-3252f0981035.jpeg
2023-03-07 18:46:45 [DEBUG] Read image took 0

original  220x220:
[[14, 18, 17], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]
[[16, 21, 20], [14, 20, 19], [15, 23, 22]]

resized 224:
[[14, 18, 17], [14, 20, 19], [17, 25, 24]]
[[15, 19, 18], [14, 20, 19], [16, 25, 23]]
[[16, 21, 20], [14, 20, 19], [15, 23, 22]]

RGB:
[[17, 18, 14], [19, 20, 14], [24, 25, 17]]
[[18, 19, 15], [19, 20, 14], [23, 25, 16]]
[[20, 21, 16], [19, 20, 14], [22, 23, 15]]

0 :[17.0000, 18.0000, 14.0000, 19.0000, 20.0000, 14.0000, 24.0000, 25.0000, 17.0000, 29.0000]
2023-03-07 18:46:45 [DEBUG] Inner inference took 6 (test? 0)
output_tensor2 :[0.2499, -0.0228, 0.1952, 0.3738, 0.0307, 0.1350, 1.8753, 1.3722, 0.0462, 0.0384]
output_tensor_end2 :[0.4413, -0.0264, 0.8682, 0.0000]
2023-03-07 18:46:45 [DEBUG] Quad array 0x7fda33dab010 2 start_offset 0 
features2 :[0.2499, -0.0228, 0.1952, 0.3738]
2023-03-07 18:46:45 [DEBUG] Finished inference fine 2 (test 0)!!
2023-03-07 18:46:45 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v16/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg 3 batch size 1
2023-03-07 18:46:45 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-07 18:46:45 [DEBUG] Image load and resize took 2 from notebooks/artifacts/food_vision_199_classes_images:v16/00045a69-b09f-4293-8c2e-a7ba27964fb6.jpg
2023-03-07 18:46:45 [DEBUG] Read image took 2

original  600x450:
[[113, 122, 132], [114, 123, 133], [98, 104, 115]]
[[119, 128, 138], [123, 132, 142], [107, 113, 124]]
[[117, 126, 136], [128, 137, 147], [115, 121, 132]]

resized 224:
[[123, 132, 142], [127, 131, 142], [142, 143, 157]]
[[132, 141, 151], [136, 140, 151], [155, 156, 170]]
[[133, 141, 154], [130, 131, 145], [157, 157, 171]]

RGB:
[[142, 132, 123], [142, 131, 127], [157, 143, 142]]
[[151, 141, 132], [151, 140, 136], [170, 156, 155]]
[[154, 141, 133], [145, 131, 130], [171, 157, 157]]

0 :[142.0000, 132.0000, 123.0000, 142.0000, 131.0000, 127.0000, 157.0000, 143.0000, 142.0000, 161.0000]
2023-03-07 18:46:45 [DEBUG] Inner inference took 6 (test? 0)
output_tensor3 :[0.6552, 1.6231, 0.3641, 2.6880, -0.0520, 0.7673, 3.4906, 0.5311, -0.0925, -0.0460]
output_tensor_end3 :[0.0801, 0.3410, 0.1593, 0.1368]
2023-03-07 18:46:45 [DEBUG] Quad array 0x7fda33dab010 3 start_offset 0 
features3 :[0.6552, 1.6231, 0.3641, 2.6880]
2023-03-07 18:46:45 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-07 18:46:45 [DEBUG] Run inference notebooks/artifacts/food_vision_199_classes_images:v16/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg 4 batch size 1
2023-03-07 18:46:45 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-07 18:46:45 [DEBUG] Image load and resize took 0 from notebooks/artifacts/food_vision_199_classes_images:v16/0004a23a-88b3-4aae-a1b0-c9ebe77d31b8.jpeg
2023-03-07 18:46:45 [DEBUG] Read image took 0

original  352x220:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

resized 224:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

RGB:
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]
[[253, 253, 253], [253, 253, 253], [253, 253, 253]]

0 :[253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000, 253.0000]
2023-03-07 18:46:45 [DEBUG] Inner inference took 5 (test? 0)
output_tensor4 :[-0.0509, 0.8222, 0.3277, -0.1013, -0.1441, 0.2930, 1.7275, 1.3833, 0.5214, 0.1942]
output_tensor_end4 :[-0.1208, -0.0856, 1.3555, 0.6580]
2023-03-07 18:46:45 [DEBUG] Quad array 0x7fda33dab010 4 start_offset 0 
features4 :[-0.0509, 0.8222, 0.3277, -0.1013]
2023-03-07 18:46:45 [DEBUG] Finished inference fine 4 (test 0)!!
libpng warning: iCCP: known incorrect sRGB profile ] 2% Estimated: 24 Minutes 0 Features
*** buffer overflow detected ***: terminated       ] 5% Estimated: 23 Minutes 0 Features

I can confirm it almost always happens at 5% (so your idea of the rogue image may be correct).

That being said I just cleaned up the images with:

from PIL import Image
import os
from tqdm.auto import tqdm

folder_path = "target_dir"

file_list = os.listdir(folder_path)
print(f"[INFO] Number of files: {len(file_list)}")

corrupt_files = []
small_files = []

for filename in tqdm(file_list):
    file_path = os.path.join(folder_path, filename)
    try:
        with Image.open(file_path) as img:
            img.verify()
            # img.load()
    except Exception as e:
        print(f"{file_path} is corrupt")
        corrupt_files.append(file_path)

    # If the file is less than 1kb, it's probably corrupt
    if os.path.getsize(file_path) < 1000:
        print(f"{file_path} is too small, adding to small files")
        small_files.append(file_path)

This removed ~6 images that were less than 1kb.

Perhaps I should increase the threshold.

Could a small image be causing it?

My computer stats:

As for the dataset, I'm unable to share it.

Perhaps there are other checks I could run?
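
One further check that could be run, sketched here with the standard library only: the log prints a libpng warning shortly before the crash, so PNG files are being decoded around that point, and 16-bit images were mentioned as a recent addition. The sketch compares each file's leading bytes with its extension and reads the bit depth from the PNG header. The helper names (`sniff_format`, `png_header_info`, `find_suspect_files`) are made up for illustration, and nothing here confirms that a mislabelled or 16-bit file is the actual cause.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_format(header: bytes) -> str:
    """Guess the image format from the first bytes of a file."""
    if header.startswith(PNG_SIGNATURE):
        return "png"
    if header.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"


def png_header_info(header: bytes):
    """Return (bit_depth, color_type) from a PNG IHDR chunk, or None."""
    if len(header) < 26 or not header.startswith(PNG_SIGNATURE):
        return None
    if header[12:16] != b"IHDR":
        return None
    # IHDR data: width (4 bytes), height (4 bytes), bit depth (1), color type (1)
    bit_depth, color_type = struct.unpack("BB", header[24:26])
    return bit_depth, color_type


def find_suspect_files(folder: str):
    """Yield (path, detected_format, png_info) for files worth a closer look.

    A file is reported when it is named .jpg/.jpeg/.png but its content says
    otherwise, or when it is a PNG with more than 8 bits per sample.
    """
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            header = handle.read(32)
        ext = os.path.splitext(name)[1].lower().lstrip(".")
        expected = "jpeg" if ext in ("jpg", "jpeg") else ext
        detected = sniff_format(header)
        mismatch = expected in ("png", "jpeg") and detected != expected
        info = png_header_info(header)
        deep = info is not None and info[0] > 8
        if mismatch or deep:
            yield path, detected, info
```

Usage would look like `for path, fmt, info in find_suspect_files("target_dir"): print(path, fmt, info)`. Any files it reports are only candidates to inspect or temporarily move out of the dataset.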

mrdbourke commented 1 year ago

Update:

I just removed ~1000 small files (under 4kb) and the issue still happens at 5%.

This is strange 🤔

I'm starting to think it may be an issue with my C/C++ installation (I'm not sure when/how I've done this).

Because I can train a full PyTorch model on these images with no issues.

dbickson commented 1 year ago

Hi @mrdbourke, thanks for looking into this. You can use num_images=xxx with num_threads=1 to pinpoint the problematic image. Assuming the dataset size is around 100K, 5% is around 5000, so try running with num_images=5000, num_threads=1 and add or subtract a little until we find the bad image. The file work_dir/atrain_features.dat.csv contains the image filenames and indexes. If you can share the bad image with us, we will test it on Ubuntu on our side. Apologies again for this issue. Your help is critical to making our tool better!
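
The search described above can be sketched as a bisection over num_images. This is a rough sketch under assumptions, not fastdup tooling: `first_crashing_count` and `crashes_with` are hypothetical helper names, `fd.run()` is assumed to accept `num_images` and `num_threads` as mentioned in this thread, and the approach only works if a run over the first n files crashes whenever the bad image is among them. Each trial runs in a child process because the buffer overflow aborts the whole interpreter, and each trial gets a fresh temporary work_dir so earlier results cannot interfere.

```python
import subprocess
import sys
import tempfile
from typing import Callable


def first_crashing_count(total: int, crashes: Callable[[int], bool]) -> int:
    """Return the smallest n in [1, total] for which crashes(n) is True.

    Assumes crashes(n) is monotonic: once the bad image is included,
    every larger n also crashes. Returns -1 if the full run succeeds.
    """
    if not crashes(total):
        return -1
    lo, hi = 1, total  # invariant: crashes(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def crashes_with(n: int, input_dir: str) -> bool:
    """Run fastdup on the first n images in a child process (hypothetical helper)."""
    work_dir = tempfile.mkdtemp(prefix="fastdup_bisect_")
    script = (
        "import fastdup\n"
        f"fd = fastdup.create(work_dir={work_dir!r}, input_dir={input_dir!r})\n"
        f"fd.run(num_images={n}, num_threads=1)\n"
    )
    result = subprocess.run([sys.executable, "-c", script])
    # A non-zero exit code (for example after an abort) counts as a crash.
    return result.returncode != 0
```

A call such as `first_crashing_count(117568, lambda n: crashes_with(n, "images_dir"))` would need roughly 18 trial runs. If it returns n, the suspect would be the n-th file in the sorted file list, assuming fastdup processes files in that order.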

dbickson commented 1 year ago

P.S. You are opening the images with PIL.Image, but we use OpenCV (cv2). Try running cv2.imread() on those images to make sure nothing is wrong on the cv2 side.
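A minimal sketch of such a cv2 check follows. It relies on the fact that `cv2.imread()` returns `None` rather than raising when it cannot decode a file. The function name and the injectable `reader` parameter are illustrative choices, not part of fastdup or OpenCV.

```python
import os
from typing import Callable, List, Optional


def find_unreadable(folder: str,
                    reader: Optional[Callable[[str], object]] = None) -> List[str]:
    """Return paths of files in `folder` that the reader fails to decode.

    By default the reader is cv2.imread, which returns None on failure.
    """
    if reader is None:
        import cv2  # imported lazily so the helper can be exercised without OpenCV
        reader = cv2.imread
    bad = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and reader(path) is None:
            bad.append(path)
    return bad


# Example usage ("target_dir" matches the folder name used in the script above):
# print(find_unreadable("target_dir"))
```

A file that passes this check can still trigger a crash elsewhere in the pipeline, so an empty result narrows the search rather than ruling images out.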

mrdbourke commented 1 year ago

Hi @dbickson,

I'm running some more experiments to try and track down the issue here.

I verified all images work with CV2:

Screenshot 2023-03-08 at 9 30 43 am

I also tried to run a random subset of images (10,000/116,000) and the error seemed to occur again (although at a different percentage):

Screenshot 2023-03-08 at 9 32 55 am
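For reference, a random-subset run like this can be reproduced with a fixed seed so the same files are selected each time. The sketch below is one possible approach, not the code used above: it symlinks a sample into a fresh directory, which should work as an input directory because the fastdup debug log shows the file listing is built with `find -L`, which follows symlinks. The function and directory names are hypothetical.

```python
import os
import random
from typing import List


def sample_into_dir(src_dir: str, dst_dir: str, k: int, seed: int = 0) -> List[str]:
    """Symlink k randomly chosen files from src_dir into dst_dir; return their names.

    dst_dir should be new or empty, since os.symlink raises if a link already exists.
    """
    names = sorted(f for f in os.listdir(src_dir)
                   if os.path.isfile(os.path.join(src_dir, f)))
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    chosen = rng.sample(names, min(k, len(names)))
    os.makedirs(dst_dir, exist_ok=True)
    for name in chosen:
        os.symlink(os.path.abspath(os.path.join(src_dir, name)),
                   os.path.join(dst_dir, name))
    return chosen


# Example usage:
# sample_into_dir("target_dir", "subset_10k", k=10_000, seed=42)
```

Re-running with different seeds shows whether the crash follows particular files or appears on any sufficiently large subset.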

Now I've created the ultimate hack:

This is quite slow compared to going over every image in a single folder, but it's a way to search every image and try to find the broken one.

Let's see what happens.

mrdbourke commented 1 year ago

Ok, I found a target image which may be the cause: cd6bcc10-0084-41bf-9063-8606453f222f.jpeg

I deleted it...

But now something even stranger is happening.

Whenever I run fastdup straight out of the box it now errors almost immediately:

Screenshot 2023-03-08 at 10 34 52 am

This happens on Python venv too (not just Conda):

Screenshot 2023-03-08 at 10 38 20 am

Maybe I should reinstall C/C++?

🤔

I'm a bit stumped here haha

I think the images should be ok, they are able to be used 100% in training a PyTorch model with no issues.

Perhaps I restart my computer and see what happens.

mrdbourke commented 1 year ago

Update:

Seems to crash immediately every time now (it won't even compute on the first few images).

dbickson commented 1 year ago

Hi @mrdbourke, I love your systematic approach to looking into this problem, and we are committed to helping solve it. Regarding the broken image that fails the fastdup run, can you please send us the image so we can debug and add a fix? Can you please also run with verbose=True and send us the full output? Does the crash also happen when you run on only a couple of images? You can run with num_images=10 and see if it works.

mrdbourke commented 1 year ago

Hi @dbickson,

I'm keen to figure this out! I've used fastdup before, but I'm not sure what happened with my new(er) dataset.

The "broken" image from before (however I don't think it is this image in particular, I think it may be a deeper fault on my system): https://www.dropbox.com/s/ox5fz1pxja9pc6l/cd6bcc10-0084-41bf-9063-8606453f222f.jpeg

I've been exploring Python troubleshooting methods.

Running this code from: https://stackoverflow.com/a/60414546/7900723

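import faulthandler

import fastdup

# Dump the Python traceback if the process hits a fatal signal (e.g. SIGSEGV, SIGABRT)
faulthandler.enable()

### Code that will error ###
if __name__ == "__main__":
    # output_dir and images_dir are assumed to be defined earlier in the script
    fd = fastdup.create(work_dir=output_dir,
                        input_dir=images_dir)

    fd.run(num_threads=1,
           threshold=0.96, # make this arg
           compute="cpu",
           verbose=True,
           run_stats=0,
           num_images=10)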

Output from above:

Screenshot 2023-03-09 at 8 31 32 am

File "find_duplicates.py", line 79 in <module> is where I call fd.run().

I can confirm that the issue is happening regardless of the number of images I'm running on.

I just installed a new NVIDIA RTX 4080 GPU + reinstalled CUDA drivers, but if fastdup runs on CPU, surely this isn't the issue?

dbickson commented 1 year ago

Hi @mrdbourke, I verified that the melon image you sent works on our side. I believe the NVIDIA install changed some of the system dependencies in a way that crashes fastdup. Do you have access to a fresh Ubuntu install (for example on EC2) or a Docker container where you could try running it?

mrdbourke commented 1 year ago

Hi @dbickson,

Thank you for that.

Yes, I didn't think that image in particular would be the issue.

I haven't given up!

I played around with some CUDA installs/packages/cudnn versions etc and have gotten further than before:

[INFO] Finding duplicate images in: ./artifacts/food_vision_199_classes_images:v19
[INFO] Number of images: 139522
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-14 09:10:42 [INFO] Version 0.901 Release compiled on Mar  8 2023 20:18:05
2023-03-14 09:10:42 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-14 09:10:42 [DEBUG] out_dims[0] = -1
2023-03-14 09:10:42 [DEBUG] out_dims[1] = 576
2023-03-14 09:10:42 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-14 09:10:42 [INFO] Going to loop over dir artifacts/food_vision_199_classes_images:v19
2023-03-14 09:10:43 [DEBUG] find -L artifacts/food_vision_199_classes_images:v19 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-14 09:10:43 [DEBUG] Read a total of 139522 lines from duplicates/tmp/files0.txt
2023-03-14 09:10:43 [DEBUG] Total images read so far 139522
2023-03-14 09:10:43 [INFO] Found total 139522 images to run on
2023-03-14 09:10:43 [DEBUG] Going to init pool
2023-03-14 09:10:43 [DEBUG] Starting to run with 4 threads
2023-03-14 09:10:43 [DEBUG] Going to init quad array of size 139522
2023-03-14 09:10:43 [DEBUG] Going to init jobs
2023-03-14 09:10:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000000.jpg 0 batch size 1
2023-03-14 09:10:43 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-14 09:10:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000003.jpg 1 batch size 1
2023-03-14 09:10:43 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-14 09:10:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000005.jpg 2 batch size 1
2023-03-14 09:10:43 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-14 09:10:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000006.jpg 3 batch size 1
2023-03-14 09:10:43 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-14 09:10:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000003.jpg
2023-03-14 09:10:43 [DEBUG] Read image took 0

original  300x197:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [185, 205, 223], [184, 204, 222]]
[[188, 208, 226], [187, 207, 225], [186, 206, 224]]

2023-03-14 09:10:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000005.jpg
2023-03-14 09:10:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000000.jpg
2023-03-14 09:10:43 [DEBUG] Read image took 0
2023-03-14 09:10:43 [DEBUG] Read image took 0

original  300x200:
[[59, 43, original  300x20037:]
[[171, 199, 234], [172, 200, 235], [172, 200, 235]]
[, [[17255, , 200, 23539], , [33172, 200], 234], , [51[, 172, 35200, , 29235]]]]
[[173, 200, 234], [173, 201, 232], [173, 200, 234]]

[[59, 43, 37], [60, 44, 38], [62, 46, 40]]
[[55, 39, 33], [57, 41, 35], [61, 45, 39]]

2023-03-14 09:10:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000006.jpg
2023-03-14 09:10:43 [DEBUG] Read image took 0

original  300x200:
[[108, 155, 183], [119, 166, 194], [94, 141, 169]]
[[80, 127, 155], [96, 143, 171], [85, 132, 160]]
[[53, 100, 128], [68, 115, 143], [70, 114, 143]]

resized 224:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [184, 204, 222], [184, 204, 222]]
[[188, 208, 226], [186, 206, 224], [185, 205, 223]]

RGB:
[[221, 203, 183], [221, 203, 183], [221, 203, 183]]
[[224, 206, 186], [222, 204, 184], [222, 204, 184]]
[[226, 208, 188], [224, 206, 186], [223, 205, 185]]

resized 224:
[[59, 43, 37], [51, 35, 29], [49, 33, 27]]
[[59, 43, 37], [62, 46, 40], [63, 47, 41]]
[[55, 39, 33], [61, 45, 39], [63, 47, 41]]

resized 224:
[[171, 199, 234], [172, 200, 235], [172, 200, 235]]
[[172, 200, 235], [172, 200, 235], [172, 200, 234]]
[[173, 200, 234], [173, 200, 234], [173, 201, 232]]

resized 224:
[[108, 155, 183], [94, 141, 169], [103, 150, 178]]
[[80, 127, 155], [85, 132, 160], [98, 145, 173]]
[[53, 100, 128], [70, 114, 143], [75, 119, 148]]

RGB:
[[37, 43, 59], [29, 35, 51], [27, 33, 49]]
[[37, 43, 59], [40, 46, 62], [41, 47, 63]]
[[33, 39RGB, :55]
[[234, 199, 171], [235, 200, 172], [235, 200, 172]]
[[235, 200, 172], , [235, 200, 172[]39, , [45, 23461, ]200, [, 41, 172]]
[[234, 200, 173], [234, 200, 173], [232, 201, 173]]

47, 63]]

RGB:
[[183, 155, 108], [169, 141, 94], [178, 150, 103]]
[[155, 127, 80], [160, 132, 85], [173, 145, 98]]
[[128, 100, 53], [143, 114, 70], [148, 119, 75]]

0 :[221.0000, 203.0000, 183.0000, 221.0000, 203.0000, 183.0000, 221.0000, 203.0000, 183.0000, 221.0000]
0 :[0 :[37.0000, 43.0000, 059.0000,  :[29.0000, 35.0000, 51.0000, 183.0000, 27.0000, 33.0000, 155.0000, 49.0000, 108.0000, 28.0000]
234.0000, 199.0000, 171.0000, 235.0000, 200.0000, 172.0000, 235.0000, 200.0000, 172.0000, 235.0000]
169.0000, 141.0000, 94.0000, 178.0000, 150.0000, 103.0000, 146.0000]
2023-03-14 09:10:43 [DEBUG] Inner inference took 7 (test? 0)
2023-03-14 09:10:43 [DEBUG] Inner inference took 7 (test? 0)
output_tensor1 :[output_tensor3 :[0.68570.2965, , 0.9247, 0.2789, 0.0655, 1.2972, 1.6540, 0.1491, -0.0550, -0.0963, 0.2710, 0.2643, -0.0586, 1.9708, 4.6356, 3.7952, 0.2730, 0.0843, 0.7395]0.8390]
output_tensor_end3 :[0.1068, 
output_tensor_end1 :[-0.0142, -0.0835, 2.6579, 0.2692]
0.1100, 3.3414, 0.0263]
2023-03-14 09:10:43 [DEBUG] Quad array 0x7f0dd8d6e010 1 start_offset 0 
features1 :[0.2965, 0.2789, 1.2972, 0.1491]
2023-03-14 09:10:43 [DEBUG] Quad array 0x7f0dd8d6e010 3 start_offset 0 
features3 :[0.6857, 0.9247, 0.0655, 1.6540]2023-03-14 09:10:43 [DEBUG] Finished inference fine 1 (test 0)!!

2023-03-14 09:10:43 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-14 09:10:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000008.jpg 4 batch size 1
2023-03-14 09:10:43 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-14 09:10:43 [DEBUG] Inner inference took 7 (test? 0)
output_tensor0 :[0.0374, 1.3565, 0.2863, -0.0133, 0.4412, 1.2249, 0.5738, 1.2589, 0.1180, 0.1143]
output_tensor_end0 :[-0.0039, 0.9520, 3.3100, 0.2284]
2023-03-14 09:10:43 [DEBUG] Quad array 0x7f0dd8d6e010 0 start_offset 0 
features0 :[0.0374, 1.3565, 0.2863, -0.0133]
2023-03-14 09:10:43 [DEBUG] Finished inference fine 0 (test 0)!!
2023-03-14 09:10:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000008.jpg
2023-03-14 09:10:43 [DEBUG] Read image took 0

original  300x200:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

RGB:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
2023-03-14 09:10:43 [DEBUG] Inner inference took 8 (test? 0)
output_tensor2 :[0.3915, 0.1209, 0.4060, 0.3373, 0.0298, -0.0035, 1.4374, 1.9704, 0.0859, 1.0056]
output_tensor_end2 :[0.0380, 2.3033, 0.6074, 1.0391]
2023-03-14 09:10:43 [DEBUG] Quad array 0x7f0dd8d6e010 2 start_offset 0 
features2 :[0.3915, 0.1209, 0.4060, 0.3373]
2023-03-14 09:10:43 [DEBUG] Finished inference fine 2 (test 0)!!
2023-03-14 09:10:43 [DEBUG] Inner inference took 6 (test? 0)
output_tensor4 :[0.0501, 0.8703, -0.1014, -0.1601, -0.1482, 0.0323, 3.1138, 2.8368, 0.5471, 0.0000]
output_tensor_end4 :[-0.0677, -0.1523, 0.8487, -0.0043]
2023-03-14 09:10:43 [DEBUG] Quad array 0x7f0dd8d6e010 4 start_offset 0 
features4 :[0.0501, 0.8703, -0.1014, -0.1601]
2023-03-14 09:10:43 [DEBUG] Finished inference fine 4 (test 0)!!
libpng warning: iCCP: known incorrect sRGB profile ] 5% Estimated: 5 Minutes 0 Features
*** buffer overflow detected ***: terminated       ] 8% Estimated: 5 Minutes 0 Features
Aborted (core dumped)

It now makes it to ~8% on ~140k images before core dumping, across various combinations of cores.

Going to run more tests this week + try to get it running on a completely new host to see what happens.

Will keep posting updates until it's fixed, I've used fastdup before and I'd love to keep using it.

mrdbourke commented 1 year ago

Update:

I can confirm it works on my machine with 10,000 images. This is a big improvement from before.

Input:

if __name__ == "__main__":

    fd = fastdup.create(work_dir=output_dir,
                        input_dir=images_dir)

    fd.run(num_threads=4,
           threshold=0.96, # make this arg
           compute="cpu",
           verbose=True,
           run_stats=0,
           num_images=10000)

Output:

[INFO] Finding duplicate images in: ./artifacts/food_vision_199_classes_images:v19
[INFO] Number of images: 139522
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-14 09:16:09 [INFO] Version 0.901 Release compiled on Mar  8 2023 20:18:05
2023-03-14 09:16:09 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-14 09:16:09 [DEBUG] out_dims[0] = -1
2023-03-14 09:16:09 [DEBUG] out_dims[1] = 576
2023-03-14 09:16:09 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-14 09:16:09 [INFO] Going to loop over dir artifacts/food_vision_199_classes_images:v19
2023-03-14 09:16:10 [DEBUG] find -L artifacts/food_vision_199_classes_images:v19 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-14 09:16:10 [DEBUG] Read a total of 139522 lines from duplicates/tmp/files0.txt
2023-03-14 09:16:10 [DEBUG] Total images read so far 139522
2023-03-14 09:16:10 [INFO] Found total 10000 images to run on
2023-03-14 09:16:10 [DEBUG] Going to init pool
2023-03-14 09:16:10 [DEBUG] Starting to run with 4 threads
2023-03-14 09:16:10 [DEBUG] Going to init quad array of size 10000
2023-03-14 09:16:10 [DEBUG] Going to init jobs
2023-03-14 09:16:10 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000000.jpg 0 batch size 1
2023-03-14 09:16:10 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-14 09:16:10 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000003.jpg 1 batch size 1
2023-03-14 09:16:10 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-14 09:16:10 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000005.jpg 2 batch size 1
2023-03-14 09:16:10 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-14 09:16:10 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000006.jpg 3 batch size 1
2023-03-14 09:16:10 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-14 09:16:10 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000003.jpg
2023-03-14 09:16:10 [DEBUG] Read image took 0

original  300x197:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [185, 205, 223], [184, 204, 222]]
[[188, 208, 226], [187, 207, 225], [186, 206, 224]]

2023-03-14 09:16:10 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000005.jpg
2023-03-14 09:16:10 [DEBUG] Read image took 0

original  300x200:
[[171, 199, 234], [172, 200, 235], [172, 200, 235]]
[[172, 200, 235], [172, 200, 234], [172, 200, 235]]
[[173, 200, 234], [173, 201, 232], [173, 200, 234]]

2023-03-14 09:16:10 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000000.jpg
2023-03-14 09:16:10 [DEBUG] Read image took 0

original  300x200:
[[59, 43, 37], [55, 39, 33], [51, 35, 29]]
[[59, 43, 37], [60, 44, 38], [62, 46, 40]]
[[55, 39, 33], [57, 41, 35], [61, 45, 39]]

2023-03-14 09:16:10 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000006.jpg
2023-03-14 09:16:10 [DEBUG] Read image took 0

original  300x200:
[[108, 155, 183], [119, 166, 194], [94, 141, 169]]
[[80, 127, 155], [96, 143, 171], [85, 132, 160]]
[[53, 100, 128], [68, 115, 143], [70, 114, 143]]

resized 224:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [184, 204, 222], [184, 204, 222]]

resized 224:
[[171, 199, 234], [172, 200, 235], [172, 200, 235][[188, 208, 226], [186, 206, 224], [185, 205, 223]]

]
[[172, 200, 235], [172, 200, 235], [172, 200, 234]]
[[173, 200, 234], [173, 200, 234], [173, 201, 232]]

resized 224:
[[59, 43, 37], [51, 35, 29], [49, 33, 27]]
[[59, 43, 37], [62, 46, 40], [63, 47, 41]]
[[55, 39, 33], [61, 45, 39], [63, 47, 41]]

resized 224:
[[108, 155, 183], [94, 141, 169], [103, 150, 178]]
[[80, 127, 155], [85, 132, 160], [98, 145, 173]]
[[53, 100, 128], [70, 114, 143], [75, 119, 148]]

RGB:
[[234, 199, 171], [235, 200, 172], [235, 200, 172]]
[[235, 200, 172], [235, 200, 172], [234, 200, 172]]
[[234, 200, 173], [234, 200, 173], [232, 201, 173]]

RGB:
[[221, 203, 
RGB:
[[37, 43, 59], [29, 35, 51], [18327, ]33, , 49]][
[[37, 43, 59], [40, 46, 62], [41, 47, 63]]
[[33, 39, 55], [39, 45, 61], [41, 47, 63]]

221, 203, 183], [221, 203, 183]]
[[224, 206, 186], [222, 204, 184], [222, 204, 184]]
[[226, 208, 188], [224, 206, 186], [223, 205, 185]]

RGB:
[[183, 155, 108], [169, 141, 94], [178, 150, 103]]
[[155, 127, 80], [160, 132, 85], [173, 145, 98]]
[[128, 100, 53], [143, 114, 70], [148, 119, 75]]

0 :[0 :[234.000037.0000, , 199.000043.0000, , 171.0000, 59.0000, 235.000029.0000, , 35.0000, 200.0000, 51.0000, 172.0000, 27.0000, 235.0000, 33.0000, 200.0000, 49.0000, 172.0000, 28.0000]
235.0000]
0 :[183.0000, 155.0000, 108.0000, 169.0000, 141.0000, 94.0000, 178.0000, 150.0000, 103.0000, 146.0000]
0 :[221.0000, 203.0000, 183.0000, 221.0000, 203.0000, 183.0000, 221.0000, 203.0000, 183.0000, 221.0000]
2023-03-14 09:16:10 [DEBUG] Inner inference took 5 (test? 0)
output_tensor1 :[0.2965, 0.2789, 1.2972, 0.1491, -0.0963, 0.2643, 1.9708, 3.7952, 0.0843, 0.8390]
output_tensor_end1 :[-0.0142, -0.0835, 2.6579, 0.2692]
2023-03-14 09:16:10 [DEBUG] Quad array 0x7fd6a2a06010 1 start_offset 0 
features1 :[0.2965, 0.2789, 1.2972, 0.1491]
2023-03-14 09:16:10 [DEBUG] Finished inference fine 1 (test 0)!!
2023-03-14 09:16:10 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000008.jpg 4 batch size 1
2023-03-14 09:16:10 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-14 09:16:10 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000008.jpg
2023-03-14 09:16:10 [DEBUG] Read image took 0

original  300x200:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

RGB:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
2023-03-14 09:16:10 [DEBUG] Inner inference took 6 (test? 0)
output_tensor0 :[0.0374, 1.3565, 0.2863, -0.0133, 0.4412, 1.2249, 0.5738, 1.2589, 0.1180, 0.1143]
output_tensor_end0 :[-0.0039, 0.9520, 3.3100, 0.2284]
2023-03-14 09:16:10 [DEBUG] Quad array 0x7fd6a2a06010 0 start_offset 0 
features0 :[0.0374, 1.3565, 0.2863, -0.0133]
2023-03-14 09:16:10 [DEBUG] Finished inference fine 0 (test 0)!!
2023-03-14 09:16:10 [DEBUG] Inner inference took 7 (test? 0)
output_tensor2 :[0.3915, 0.1209, 0.4060, 0.3373, 0.0298, -0.0035, 1.4374, 1.9704, 0.0859, 1.0056]
output_tensor_end2 :[0.0380, 2.3033, 0.6074, 1.0391]
2023-03-14 09:16:10 [DEBUG] Inner inference took 7 (test? 0)
output_tensor3 :[2023-03-14 09:16:10 [DEBUG] Quad array 0x7fd6a2a06010 2 start_offset 0 
features2 :[0.3915, 0.1209, 0.68570.4060, , 0.3373]
0.9247, 0.0655, 1.65402023-03-14 09:16:10 [DEBUG] Finished inference fine 2 (test 0)!!
, -0.0550, 0.2710, -0.0586, 4.6356, 0.2730, 0.7395]
output_tensor_end3 :[0.1068, 0.1100, 3.3414, 0.0263]
2023-03-14 09:16:10 [DEBUG] Quad array 0x7fd6a2a06010 3 start_offset 0 
features3 :[0.6857, 0.9247, 0.0655, 1.6540]
2023-03-14 09:16:10 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-14 09:16:10 [DEBUG] Inner inference took 5 (test? 0)
output_tensor4 :[0.0501, 0.8703, -0.1014, -0.1601, -0.1482, 0.0323, 3.1138, 2.8368, 0.5471, 0.0000]
output_tensor_end4 :[-0.0677, -0.1523, 0.8487, -0.0043]
2023-03-14 09:16:10 [DEBUG] Quad array 0x7fd6a2a06010 4 start_offset 0 
features4 :[0.0501, 0.8703, -0.1014, -0.1601]
2023-03-14 09:16:10 [DEBUG] Finished inference fine 4 (test 0)!!
libpng warning: iCCP: known incorrect sRGB profile ] 64% Estimated: 0 Minutes 0 Features
2023-03-14 09:16:37 [DEBUG] Going to store results■] 100% Estimated: 0 Minutes 0 Features
Quad array 0x7fd6a2a06010 i 0 FL 576
features0 :[0.0374, 1.3565, 0.2863, -0.0133]
features-end572 :[-0.0039, 0.9520, 3.3100, 0.2284]
Quad array 0x7fd6a2a06010 i 1 FL 576
features0 :[0.2965, 0.2789, 1.2972, 0.1491]
features-end572 :[-0.0142, -0.0835, 2.6579, 0.2692]
Quad array 0x7fd6a2a06010 i 2 FL 576
features0 :[0.3915, 0.1209, 0.4060, 0.3373]
features-end572 :[0.0380, 2.3033, 0.6074, 1.0391]
Quad array 0x7fd6a2a06010 i 3 FL 576
features0 :[0.6857, 0.9247, 0.0655, 1.6540]
features-end572 :[0.1068, 0.1100, 3.3414, 0.0263]
Quad array 0x7fd6a2a06010 i 4 FL 576
features0 :[0.0501, 0.8703, -0.1014, -0.1601]
features-end572 :[-0.0677, -0.1523, 0.8487, -0.0043]
2023-03-14 09:16:37 [DEBUG] Wrote total of 10000 features , found 0 bad images, total so far 10000, filename duplicates/atrain_features.dat.csv
2023-03-14 09:16:37 [DEBUG] Done store results
2023-03-14 09:16:37 [INFO] Found total 10000 images to run on
2023-03-14 09:16:37 [DEBUG] Going to init quad array of size 1000
2023-03-14 09:16:37 [DEBUG] Going to run 10 batches with reminder 0
2023-03-14 09:16:37 [DEBUG] Going to run single thread normalization of 1000 from offet 0
2023-03-14 09:16:37 [DEBUG] Going to run single thread normalization of 1000 from offet 576000
2023-03-14 09:16:37 [DEBUG] Going to run single thread normalization of 1000 from offet 1152000
2023-03-14 09:16:37 [DEBUG] Going to run single thread normalization of 1000 from offet 5184000
2023-03-14 09:16:37 [DEBUG] Finished normalization
after normalization10 :[0.0022, 0.0795, 0.0168, -0.0008]
2023-03-14 09:16:37 [DEBUG] 15) Going to train NN model. Train sample factor 1.000000 howmany 10000
2023-03-14 09:16:37 [DEBUG] 15) Finished train() NN model
2023-03-14 09:16:39 [DEBUG] 1742) Finished add() NN model
2023-03-14 09:16:39 [DEBUG] Total data points added= 10000
2023-03-14 09:16:39 [INFO] 1753) Finished write_index() NN model
2023-03-14 09:16:39 [INFO] Stored nn model index file duplicates/nnf.index
2023-03-14 09:16:40 [DEBUG] 2343) Finished search() NN model
2023-03-14 09:16:40 [DEBUG] KNN results
    0 : 1.00000   110 : 1.00000  9132 : 0.78123 
    1 : 1.00000   265 : 1.00000  7396 : 0.79710 
    2 : 1.00000  9324 : 0.79779    16 : 0.79501 
    3 : 1.00000  8320 : 0.80875  9326 : 0.79752 
    4 : 1.00000   174 : 1.00000  6874 : 0.88474 
    5 : 1.00000  6249 : 0.73183   117 : 0.72475 
    6 : 1.00000  6224 : 1.00000  8459 : 0.75410 
    7 : 1.00000   122 : 1.00000    90 : 0.88773 
    8 : 1.00000  5422 : 1.00000   206 : 0.72317 
    9 : 1.00000  7105 : 0.76412  2582 : 0.75466 
2023-03-14 09:16:40 [DEBUG] Found total results  20000
2023-03-14 09:16:40 [DEBUG] Replacing lower threshold 0.050000 with position 19000 top_k.size() 20000 loc pos: 0.711799 last pos: 0.439984 0.950000 18999.999985
2023-03-14 09:16:40 [DEBUG] Found from=to 7631
2023-03-14 09:16:40 [DEBUG] Found from=to 8793
2023-03-14 09:16:40 [DEBUG] Found from=to 2887
2023-03-14 09:16:40 [DEBUG] Found from=to 2855
2023-03-14 09:16:40 [DEBUG] Found from=to 2782
2023-03-14 09:16:40 [DEBUG] Found from=to 2605
2023-03-14 09:16:40 [DEBUG] Found from=to 2573
2023-03-14 09:16:40 [DEBUG] Found from=to 2562
2023-03-14 09:16:40 [DEBUG] Found from=to 2502
2023-03-14 09:16:40 [DEBUG] Found from=to 7380
2023-03-14 09:16:40 [DEBUG] Found from=to 2493
2023-03-14 09:16:40 [DEBUG] Found from=to 2462
2023-03-14 09:16:40 [DEBUG] Found from=to 2453
2023-03-14 09:16:40 [DEBUG] Found from=to 119
2023-03-14 09:16:40 [DEBUG] Found from=to 133
2023-03-14 09:16:40 [DEBUG] Found from=to 118
2023-03-14 09:16:40 [DEBUG] Found from=to 265
2023-03-14 09:16:40 [DEBUG] Found from=to 137
2023-03-14 09:16:40 [DEBUG] Found from=to 139
2023-03-14 09:16:40 [DEBUG] Found from=to 182
2023-03-14 09:16:40 [DEBUG] Found from=to 178
2023-03-14 09:16:40 [DEBUG] Found from=to 174
2023-03-14 09:16:40 [DEBUG] Found from=to 8901
2023-03-14 09:16:40 [DEBUG] Found from=to 9145
2023-03-14 09:16:40 [DEBUG] Found from=to 8299
2023-03-14 09:16:40 [DEBUG] Found from=to 103
2023-03-14 09:16:40 [DEBUG] Found from=to 108
2023-03-14 09:16:40 [DEBUG] Found from=to 109
2023-03-14 09:16:40 [DEBUG] Found from=to 111
2023-03-14 09:16:40 [DEBUG] Found from=to 116
2023-03-14 09:16:40 [DEBUG] Found from=to 9553
2023-03-14 09:16:40 [DEBUG] Found from=to 6617
2023-03-14 09:16:40 [DEBUG] Found from=to 4287
2023-03-14 09:16:40 [DEBUG] Found from=to 3168
2023-03-14 09:16:40 [DEBUG] Found from=to 4244
2023-03-14 09:16:40 [DEBUG] Found from=to 4209
2023-03-14 09:16:40 [DEBUG] Found from=to 5422
2023-03-14 09:16:40 [DEBUG] Found from=to 3951
2023-03-14 09:16:40 [DEBUG] Found from=to 6435
2023-03-14 09:16:40 [DEBUG] Found from=to 3292
2023-03-14 09:16:40 [DEBUG] Found from=to 3300
2023-03-14 09:16:40 [DEBUG] Found from=to 5836
2023-03-14 09:16:40 [DEBUG] Found from=to 3489
2023-03-14 09:16:40 [DEBUG] Found from=to 3314
2023-03-14 09:16:40 [DEBUG] Found from=to 3360
2023-03-14 09:16:40 [DEBUG] Found from=to 6684
2023-03-14 09:16:40 [DEBUG] Found from=to 2912
2023-03-14 09:16:40 [DEBUG] Found from=to 4389
2023-03-14 09:16:40 [DEBUG] Found from=to 4319
2023-03-14 09:16:40 [DEBUG] Found from=to 7711
2023-03-14 09:16:40 [DEBUG] Found from=to 2744
2023-03-14 09:16:40 [DEBUG] Found from=to 218
2023-03-14 09:16:40 [DEBUG] Found from=to 8445
2023-03-14 09:16:40 [DEBUG] Found from=to 127
2023-03-14 09:16:40 [DEBUG] Found from=to 179
2023-03-14 09:16:40 [DEBUG] Found from=to 2526
2023-03-14 09:16:40 [DEBUG] Found from=to 3134
2023-03-14 09:16:40 [DEBUG] Found from=to 181
2023-03-14 09:16:40 [DEBUG] Found from=to 193
2023-03-14 09:16:40 [DEBUG] Found from=to 237
2023-03-14 09:16:40 [DEBUG] Found from=to 212
2023-03-14 09:16:40 [DEBUG] Found from=to 3857
2023-03-14 09:16:40 [DEBUG] Found from=to 3994
2023-03-14 09:16:40 [DEBUG] Found from=to 292
2023-03-14 09:16:40 [DEBUG] Found from=to 125
2023-03-14 09:16:40 [DEBUG] Found from=to 6517
2023-03-14 09:16:40 [DEBUG] Found from=to 122
2023-03-14 09:16:40 [DEBUG] Found from=to 129
2023-03-14 09:16:40 [DEBUG] Found from=to 6224
2023-03-14 09:16:40 [DEBUG] Found from=to 308
2023-03-14 09:16:40 [DEBUG] Found from=to 199
2023-03-14 09:16:40 [DEBUG] Found from=to 2906
2023-03-14 09:16:40 [DEBUG] Found from=to 4188
2023-03-14 09:16:40 [DEBUG] Found from=to 2727
2023-03-14 09:16:40 [DEBUG] Found from=to 2514
2023-03-14 09:16:40 [DEBUG] Found from=to 2530
2023-03-14 09:16:40 [DEBUG] Found from=to 7205
2023-03-14 09:16:40 [DEBUG] Found from=to 6966
2023-03-14 09:16:40 [DEBUG] Found from=to 198
2023-03-14 09:16:40 [DEBUG] Found from=to 110
2023-03-14 09:16:40 [DEBUG] Found from=to 3813
2023-03-14 09:16:40 [DEBUG] Found from=to 6389
2023-03-14 09:16:40 [DEBUG] Found from=to 4083
2023-03-14 09:16:40 [DEBUG] Found from=to 2456
2023-03-14 09:16:40 [DEBUG] Found from=to 2465
2023-03-14 09:16:40 [DEBUG] Found from=to 7549
2023-03-14 09:16:40 [DEBUG] Found from=to 3888
2023-03-14 09:16:40 [DEBUG] Found from=to 214
2023-03-14 09:16:40 [DEBUG] Found from=to 143
2023-03-14 09:16:40 [DEBUG] Found from=to 3228
2023-03-14 09:16:40 [DEBUG] Found from=to 167
2023-03-14 09:16:40 [DEBUG] Found from=to 107
2023-03-14 09:16:40 [DEBUG] Found from=to 138
2023-03-14 09:16:40 [DEBUG] Found from=to 4925
2023-03-14 09:16:40 [DEBUG] Found from=to 4292
2023-03-14 09:16:40 [DEBUG] Found from=to 283
2023-03-14 09:16:40 [DEBUG] Found from=to 132
2023-03-14 09:16:40 [DEBUG] Found from=to 2952
2023-03-14 09:16:40 [DEBUG] Found from=to 2967
2023-03-14 09:16:40 [DEBUG] Found from=to 175
2023-03-14 09:16:40 [DEBUG] Found from=to 9715
2023-03-14 09:16:40 [DEBUG] Found from=to 302
2023-03-14 09:16:40 [DEBUG] Found from=to 4842
2023-03-14 09:16:40 [DEBUG] Found from=to 2858
2023-03-14 09:16:40 [DEBUG] Found from=to 9738
2023-03-14 09:16:40 [DEBUG] Found from=to 293
2023-03-14 09:16:40 [DEBUG] Found from=to 2540
2023-03-14 09:16:40 [DEBUG] Found from=to 197
2023-03-14 09:16:40 [INFO] Total time took 29412 ms
2023-03-14 09:16:40 [INFO] Found a total of 182 fully identical images (d>0.990), which are 0.61 %
2023-03-14 09:16:40 [INFO] Found a total of 90 nearly identical images(d>0.980), which are 0.30 %
2023-03-14 09:16:40 [INFO] Found a total of 413 above threshold images (d>0.960), which are 1.38 %
2023-03-14 09:16:40 [INFO] Found a total of 1000 outlier images         (d<0.050), which are 3.33 %
2023-03-14 09:16:40 [INFO] Min distance found 0.440 max distance 1.000
2023-03-14 09:16:40 [INFO] 

Example similar files
from,to,distance
76,133,1.000000
64,118,1.000000
1956,4389,1.000000
1923,4209,1.000000
2023-03-14 09:16:40 [INFO] Running connected components for ccthreshold 0.960000 
2023-03-14 09:16:40 [DEBUG] 20000 After removing edges removed 19479 edges h 0
.02023-03-14 09:16:40 [DEBUG] Last component id was 9765
2023-03-14 09:16:40 [DEBUG] Total component stats size is 9765 last component was 9765
2023-03-14 09:16:40 [DEBUG] Going to store components to file duplicates/connected_components.csv
Traceback (most recent call last):
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/sentry.py", line 114, in inner_function
    ret = func(*args, **kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 223, in img_stats
    assert df is not None and not df.empty, f'No stats file found in {self._work_dir}'
AssertionError: No stats file found in duplicates

 ########################################################################################

Dataset Analysis Summary: 

    Dataset contains 10000 images
    Valid images are 100.00% (10,000) of the data, invalid are 0.00% (0) of the data
    Similarity:  2.18% (218) belong to 6 similarity clusters (components).
    97.82% (9,782) images do not belong to any similarity cluster.
    Largest cluster has 10 (0.10%) images.
    For a detailed analysis, use `.connected_components()`
(similarity threshold used is 0.96, connected component threshold used is 0.96).

    Outliers: 6.47% (647) of images are possible outliers, and fall in the bottom 5.00% of similarity values.
    For a detailed list of outliers, use `.outliers()`.

mrdbourke commented 1 year ago

Update:

Ok, it fails on 25,000 images. This is definitely my system bugging out.

[INFO] Finding duplicate images in: ./artifacts/food_vision_199_classes_images:v19
[INFO] Number of images: 139522
Traceback (most recent call last):
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/sentry.py", line 114, in inner_function
    ret = func(*args, **kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 290, in run
    self._init_run(input_dir, annotations, subset, embeddings, data_type, overwrite, fastdup_kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 105, in _init_run
    self._verify_fastdup_run_args(input_dir, self._work_dir, df_annot, subset, self._dtype, embeddings)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 841, in _verify_fastdup_run_args
    assert not self._fastdup_applied, \
AssertionError: there is already an active fastup run on the working dir, change work_dir or run with overwrite=True
Traceback (most recent call last):
  File "find_duplicates.py", line 76, in <module>
    fd.run(num_threads=4,
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/engine.py", line 156, in run
    super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/sentry.py", line 120, in inner_function
    raise ex
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/sentry.py", line 114, in inner_function
    ret = func(*args, **kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 290, in run
    self._init_run(input_dir, annotations, subset, embeddings, data_type, overwrite, fastdup_kwargs)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 105, in _init_run
    self._verify_fastdup_run_args(input_dir, self._work_dir, df_annot, subset, self._dtype, embeddings)
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/fastdup/fastdup_controller.py", line 841, in _verify_fastdup_run_args
    assert not self._fastdup_applied, \
AssertionError: there is already an active fastup run on the working dir, change work_dir or run with overwrite=True
Fatal Python error: Segmentation fault

Current thread 0x00007fbc3f902700 (most recent call first):
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/eventlet/hubs/epolls.py", line 31 in do_poll
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/eventlet/hubs/poll.py", line 80 in wait
  File "/home/daniel/code/pytorch/env/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 365 in run

Thread 0x00007fbcaf319340 (most recent call first):
<no Python frame>
Segmentation fault (core dumped)
(/home/daniel/code/pytorch/env) daniel@daniel-Z490-UD:~/code/nutrify/foodvision$ python find_duplicates.py 
[INFO] Finding duplicate images in: ./artifacts/food_vision_199_classes_images:v19
[INFO] Number of images: 139522
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
2023-03-14 09:20:42 [INFO] Version 0.901 Release compiled on Mar  8 2023 20:18:05
2023-03-14 09:20:42 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-14 09:20:42 [DEBUG] out_dims[0] = -1
2023-03-14 09:20:42 [DEBUG] out_dims[1] = 576
2023-03-14 09:20:42 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-14 09:20:42 [INFO] Going to loop over dir artifacts/food_vision_199_classes_images:v19
2023-03-14 09:20:43 [DEBUG] find -L artifacts/food_vision_199_classes_images:v19 -type f | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.heif$|\.heic$'| sort > duplicates/tmp/files0.txt
2023-03-14 09:20:43 [DEBUG] Read a total of 139522 lines from duplicates/tmp/files0.txt
2023-03-14 09:20:43 [DEBUG] Total images read so far 139522
2023-03-14 09:20:43 [INFO] Found total 25000 images to run on
2023-03-14 09:20:43 [DEBUG] Going to init pool
2023-03-14 09:20:43 [DEBUG] Starting to run with 4 threads
2023-03-14 09:20:43 [DEBUG] Going to init quad array of size 25000
2023-03-14 09:20:43 [DEBUG] Going to init jobs
2023-03-14 09:20:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000000.jpg 0 batch size 1
2023-03-14 09:20:43 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2023-03-14 09:20:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000003.jpg 1 batch size 1
2023-03-14 09:20:43 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2023-03-14 09:20:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000005.jpg 2 batch size 1
2023-03-14 09:20:43 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2023-03-14 09:20:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000006.jpg 3 batch size 1
2023-03-14 09:20:43 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2023-03-14 09:20:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000003.jpg
2023-03-14 09:20:43 [DEBUG] Read image took 0

original  300x197:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [185, 205, 223], [184, 204, 222]]
[[188, 208, 226], [187, 207, 225], [186, 206, 224]]

2023-03-14 09:20:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000005.jpg
2023-03-14 09:20:43 [DEBUG] Read image took 0

original  300x200:
[[171, 199, 234], [172, 200, 235], [172, 200, 235]]
[[172, 200, 235], [172, 200, 234], [172, 200, 235]]
[[173, 200, 234], [173, 201, 232], [173, 200, 234]]

2023-03-14 09:20:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000000.jpg
2023-03-14 09:20:43 [DEBUG] Read image took 0

original  300x200:
[[59, 43, 37], [55, 39, 33], [51, 35, 29]]
[[59, 43, 37], [60, 44, 38], [62, 46, 40]]
[[55, 39, 33], [57, 41, 35], [61, 45, 39]]

2023-03-14 09:20:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000006.jpg
2023-03-14 09:20:43 [DEBUG] Read image took 0

original  300x200:
[[108, 155, 183], [119, 166, 194], [94, 141, 169]]
[[80, 127, 155], [96, 143, 171], [85, 132, 160]]
[[53, 100, 128], [68, 115, 143], [70, 114, 143]]

resized 224:
[[183, 203, 221], [183, 203, 221], [183, 203, 221]]
[[186, 206, 224], [184, 204, 222], [184, 204, 222]]
[[188, 208, 226], [186, 206, 224], [185, 205, 223]]

resized 224:
[[171, 199, 234], [172, 200, 235], [172, 200, 235]]
[[172, 200, 235], [172, 200, 235], [172, 200, 234]]
[[173, 200, 234], [173, 200, 234], [173, 201, 232]]

resized 224:
[[59, 43, 37], [51, 35, 29], [49, 33, 27]]
[[59, 43, 37], [62, 46, 40], [63, 47, 41]]
[[55, 39, 33], [61, 45, 39], [63, 47, 41]]

RGB:
[[221, 203, 183], [221, 203, 183], [221RGB:

resized 224:
[[108, 155, 183], [94, 141, 169], [[[234, 199, 171], [235, 200, 172], [235, 200, 172]]
[[235, 200, 172], [235, 200, 172], [234, 200, 172]]
[[234, 200, 173], [234, 200, 173], [232, 201, 173]]

, 203, 183]]
[[224, 206, 186], [222, 204, 184], [222, 204, 184]]
[[226, 208, 188], [224, 206, 186], [223, 205, 185]]

RGB:
[[37, 43, 59], [29, 35, 51], [27, 33, 49]]
[[37, 43, 59], [40, 46, 62], [41, 47, 63]]
[[33, 39, 55], [39, 45, 61], [41, 47, 63]]

103, 150, 178]]
[[80, 127, 155], [85, 132, 160], [98, 145, 173]]
[[53, 100, 128], [70, 114, 143], [75, 119, 148]]

RGB:
[[183, 155, 108], [169, 141, 94], [178, 150, 103]]
[[155, 127, 80], [160, 132, 85], [173, 145, 98]]
[[128, 100, 53], [143, 114, 70], [148, 119, 75]]

0 :[234.0000, 199.0000, 171.0000, 235.0000, 200.0000, 172.0000, 235.0000, 200.0000, 172.0000, 235.0000]
0 :[0 :[37.0000, 43.0000, 221.0000, 59.0000, 203.0000, 29.0000, 183.0000, 35.0000, 221.0000, 51.0000, 203.0000, 27.0000, 183.0000, 33.0000, 221.0000, 49.0000, 203.0000, 28.0000]
183.0000, 221.0000]
0 :[183.0000, 155.0000, 108.0000, 169.0000, 141.0000, 94.0000, 178.0000, 150.0000, 103.0000, 146.0000]
2023-03-14 09:20:43 [DEBUG] Inner inference took 5 (test? 0)
output_tensor3 :[0.6857, 0.9247, 0.0655, 1.6540, -0.0550, 0.2710, -0.0586, 4.6356, 0.2730, 0.7395]
output_tensor_end3 :[0.1068, 0.1100, 3.3414, 0.0263]
2023-03-14 09:20:43 [DEBUG] Quad array 0x7f5328911010 3 start_offset 0 
features3 :[0.6857, 0.9247, 0.0655, 1.6540]
2023-03-14 09:20:43 [DEBUG] Finished inference fine 3 (test 0)!!
2023-03-14 09:20:43 [DEBUG] Run inference artifacts/food_vision_199_classes_images:v19/000000008.jpg 4 batch size 1
2023-03-14 09:20:43 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2023-03-14 09:20:43 [DEBUG] Image load and resize took 0 from artifacts/food_vision_199_classes_images:v19/000000008.jpg
2023-03-14 09:20:43 [DEBUG] Read image took 0

original  300x200:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

resized 224:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

RGB:
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]
[[255, 255, 255], [255, 255, 255], [255, 255, 255]]

0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000]
2023-03-14 09:20:43 [DEBUG] Inner inference took 6 (test? 0)
output_tensor1 :[0.2965, 0.2789, 1.2972, 0.1491, 2023-03-14 09:20:43 [DEBUG] Inner inference took 6 (test? 0)
output_tensor0 :[-0.0963, 0.26430.0374, , 1.9708, 1.35653.7952, , 0.28630.0843, , 0.8390]
output_tensor_end1 :[-0.0142, -0.0835, 2.6579, 0.2692]
-0.0133, 0.4412, 1.2249, 0.5738, 1.2589, 0.1180, 0.1143]
output_tensor_end0 :[-0.0039, 0.9520, 3.3100, 2023-03-14 09:20:43 [DEBUG] Quad array 0x7f5328911010 1 start_offset 0 
features1 :[0.2965, 0.22840.2789, ]
1.2972, 0.1491]
2023-03-14 09:20:43 [DEBUG] Quad array 0x7f5328911010 0 start_offset 0 
features0 :[0.0374, 1.3565, 0.28632023-03-14 09:20:43 [DEBUG] Finished inference fine 1 (test 0)!!
, -0.0133]
2023-03-14 09:20:43 [DEBUG] Finished inference fine 0 (test 0)!!
2023-03-14 09:20:43 [DEBUG] Inner inference took 10 (test? 0)
output_tensor2 :[0.3915, 0.1209, 0.4060, 0.3373, 0.0298, -0.0035, 1.4374, 1.9704, 0.0859, 1.0056]
output_tensor_end2 :[0.0380, 2.3033, 0.6074, 1.0391]
2023-03-14 09:20:43 [DEBUG] Quad array 0x7f5328911010 2 start_offset 0 
features2 :[0.3915, 0.1209, 0.4060, 0.3373]
2023-03-14 09:20:43 [DEBUG] Finished inference fine 2 (test 0)!!
2023-03-14 09:20:43 [DEBUG] Inner inference took 5 (test? 0)
output_tensor4 :[0.0501, 0.8703, -0.1014, -0.1601, -0.1482, 0.0323, 3.1138, 2.8368, 0.5471, 0.0000]
output_tensor_end4 :[-0.0677, -0.1523, 0.8487, -0.0043]
2023-03-14 09:20:43 [DEBUG] Quad array 0x7f5328911010 4 start_offset 0 
features4 :[0.0501, 0.8703, -0.1014, -0.1601]
2023-03-14 09:20:43 [DEBUG] Finished inference fine 4 (test 0)!!
libpng warning: iCCP: known incorrect sRGB profile ] 26% Estimated: 0 Minutes 0 Features
*** buffer overflow detected ***: terminated       ] 44% Estimated: 0 Minutes 0 Features
Aborted (core dumped)

Will run a few more tests before trying on a new machine (sunken cost has sunk in hahaha)

dbickson commented 1 year ago

Hi, does it always fail at the same place?

*** buffer overflow detected ***: terminated       ] 44% Estimated: 0 Minutes 0 Features
Aborted (core dumped)

If so, I suggest running with num_threads=1, verbose=1 and trying to find the bad image. You can use num_images=XX to limit the number of images until you hit the bad image.
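
A rough sketch of how that search could be automated. Because the overflow aborts the whole Python process, each trial runs in a child process, and a binary search over num_images looks for the smallest count that still crashes. The helper names (`crashes`, `first_bad_count`) and the directory paths are hypothetical; `overwrite=True` is taken from the assertion message earlier in this thread, and passing `num_images` to `fd.run` is assumed to behave as described above. It also assumes the crash is deterministic and that num_images limits the run to the first N files.

```python
import subprocess
import sys

# Child script: one single-threaded fastdup run limited to N images.
# Paths are placeholders; num_images/overwrite usage is an assumption.
CHILD = """
import sys, fastdup
fd = fastdup.create(work_dir="duplicates_debug", input_dir="images/")
fd.run(num_threads=1, verbose=True, overwrite=True, num_images=int(sys.argv[1]))
"""

def crashes(n):
    """Return True if a run over n images ends with a nonzero exit code.

    Any nonzero exit (abort, segfault, or uncaught exception) counts as a crash.
    """
    proc = subprocess.run([sys.executable, "-c", CHILD, str(n)])
    return proc.returncode != 0

def first_bad_count(total, crashes_fn=crashes):
    """Binary-search the smallest n in [1, total] for which crashes_fn(n) is True.

    Assumes crashes_fn is monotonic: once the bad image is included,
    every larger n also crashes. Returns None if nothing crashes.
    """
    if not crashes_fn(total):
        return None
    lo, hi = 1, total  # invariant: crashes_fn(hi) is True
    while lo < hi:
        mid = (lo + hi) // 2
        if crashes_fn(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

If `first_bad_count(25000)` returns some n, the n-th entry of the sorted file list (the logs above show it written to duplicates/tmp/files0.txt) would be the first image to inspect, under the assumptions stated.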

tiancaipipi110 commented 1 year ago

I'm having the same problem, running on Windows 10 under WSL 2.

2023-03-15 07:24:02 [INFO] Found total 4 images to run on
*** buffer overflow detected ***: python3.8 terminated
*** buffer overflow detected ***: python3.8 terminated
[■■■■■■■■■■■■■                                     ] 25% Estimated: 0 Minutes 0 Fea[■■■■■■■■■■■■■■■■■■■■■■■■■■                        ] 50% Estimated: 0 Minutes 0 FeaExceptionHandler::GenerateDump waitpid failed:No child processes
Aborted (core dumped)

Here are the images.

dbickson commented 1 year ago

Hi @tiancaipipi110, we now have a native Windows version, so there is no need to install WSL. Can you try installing with Python 3.9 on native Windows 10 and let us know if it works? In addition, please rerun on WSL with verbose=1, num_threads=1 and send us the output.

tiancaipipi110 commented 1 year ago

@dbickson The native installation on Python 3.9 via pip install fastdup succeeded, but importing the module fails with "The system cannot find the path specified." Setting verbose=1, num_threads=1 on WSL still produces the same error.

dbickson commented 1 year ago

Hi, apologies: "The system cannot find the path specified" is a warning, not an error. After it is printed you can run as usual. Can you please try again?

tiancaipipi110 commented 1 year ago

@dbickson Here's the output. It would be nice if the error could be caught so that processing continues with the remaining directories instead of killing everything.

2023-03-15 10:28:22 [INFO] Version 0.904 Release compiled on Mar 15 2023 04:11:34
2023-03-15 10:28:23 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2023-03-15 10:28:23 [DEBUG] out_dims[0] = -1
2023-03-15 10:28:23 [DEBUG] out_dims[1] = 576
2023-03-15 10:28:23 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2023-03-15 10:28:23 [INFO] Going to loop over dir M:\RENDER\images
2023-03-15 10:28:23 [DEBUG] Total images read so far 4
2023-03-15 10:28:23 [INFO] Found total 4 images to run on
2023-03-15 10:28:23 [DEBUG] Going to init pool
2023-03-15 10:28:23 [DEBUG] Starting to run with 1 threads
2023-03-15 10:28:23 [DEBUG] Going to init quad array of size 4
2023-03-15 10:28:23 [DEBUG] Going to init jobs
2023-03-15 10:28:23 [DEBUG] Run inference M 0 batch size 1
2023-03-15 10:28:23 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
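
On the request to keep going after a failure: a native abort such as `*** buffer overflow detected ***` cannot be caught with try/except inside the same process, so one possible workaround is to launch each directory's run as its own child process. The sketch below only illustrates that pattern; `run_isolated`, `process_dirs`, and the script name in the usage comment are hypothetical, not part of fastdup.

```python
import subprocess
import sys

def run_isolated(cmd):
    """Run cmd in a child process and return its exit code.

    If the child aborts or segfaults, only the child dies; the caller
    just sees a nonzero return code.
    """
    return subprocess.run(cmd).returncode

def process_dirs(dirs, make_cmd):
    """Run one child per directory and return the directories that failed."""
    failed = []
    for d in dirs:
        if run_isolated(make_cmd(d)) != 0:
            failed.append(d)
    return failed

# Example usage (hypothetical script that runs fastdup on one directory):
# failed = process_dirs(
#     ["dir_a", "dir_b"],
#     lambda d: [sys.executable, "run_fastdup_on_dir.py", d],
# )
```

The directories returned by `process_dirs` can then be re-run with verbose=1, num_threads=1 to collect logs for the failing cases only.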

dbickson commented 1 year ago

Hi @mrdbourke, please reopen if this is still an issue for you.