visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset images & labels and reduce your data operations costs at an unparalleled scale.

[Bug]: Failed to execute fd.run() #340

Closed lycika-5mzw closed 2 months ago

lycika-5mzw commented 2 months ago

What happened?

Code:

def start(progressbar: Progress, begin: int = 0, limit: int = 0, batch_size: int = 0, deduplicate: bool = False):
    split_image(progressbar, begin, limit)
    if deduplicate:
        fd = fastdup.create(work_dir=FASTDUP_WORK_PATH, input_dir=CANDIDATE_SPLIT_PATH)
        fd.run(overwrite=True)
        fd.explore()

Error Message:

fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']

I have hundreds of pictures named in the <HASH>_<TIMESTAMP>_<XAXIS>,<YAXIS>.jpg format, for example "0a847185848d6e7fc6e967b96a5af457_1720439139294554000_0,0.jpg". They are all 100x87 pixels in size, but fastdup seems to fail to read these images.
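
For readers who want to check that their files follow the naming scheme described above, here is a minimal sketch of a parser. The regular expression, the `TileName` type, and the `parse_tile_name` helper are hypothetical names introduced for illustration. The sketch assumes the hash is lowercase hex and that the timestamp and both axes are plain integers, which matches the example filename but is not confirmed for every file.

```python
import re
from typing import NamedTuple

# Assumed layout, taken from the description above:
#   <HASH>_<TIMESTAMP>_<XAXIS>,<YAXIS>.jpg
# The hex/integer constraints are assumptions based on the single example.
TILE_NAME = re.compile(
    r"^(?P<digest>[0-9a-f]+)_(?P<timestamp>\d+)_(?P<x>\d+),(?P<y>\d+)\.jpg$"
)


class TileName(NamedTuple):
    """Hypothetical container for the parts of a tile filename."""
    digest: str
    timestamp: int
    x: int
    y: int


def parse_tile_name(name: str) -> TileName:
    """Split a tile filename into its parts, or raise ValueError."""
    match = TILE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected tile filename: {name!r}")
    return TileName(
        digest=match["digest"],
        timestamp=int(match["timestamp"]),
        x=int(match["x"]),
        y=int(match["y"]),
    )
```

A name that does not match raises `ValueError`, which makes it easy to spot files that were written with a different scheme.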

BTW, I cannot provide more information because the source code is obfuscated with PyArmor.

What did you expect to see?

Images Loaded, WebUI Started

What version of fastdup were you running on?

2.5

What version of Python were you running on?

Other

Operating System

macOS 14.3.1 M1

dbickson commented 2 months ago

Hi @lycika-5mzw, can you specify what is in CANDIDATE_SPLIT_PATH? Please send us the output of find <candidate_split_path_folder> -name '*.jpg'. We have checked your filenames and they seem to work fine. Please also call run() with verbose=1 and send us the full output.
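
As a cross-check on the `find` command requested above, the same inventory can be taken from Python. This is a minimal sketch, not part of fastdup: `list_supported_images` is a hypothetical helper, and the extension set is copied from the error message in this issue rather than from the fastdup source.

```python
from pathlib import Path

# Extensions copied from the ExplorationError message quoted in this issue.
SUPPORTED_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".giff", ".tif", ".tiff",
    ".heic", ".heif", ".bmp", ".webp", ".jfif",
}


def list_supported_images(root):
    """Recursively list files under root whose suffix is a supported image type.

    Hypothetical helper: it only inspects file names, it does not try to
    decode the images.
    """
    root_path = Path(root)
    return sorted(
        path
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

If `len(list_supported_images(CANDIDATE_SPLIT_PATH))` is well above 10 while fastdup still reports 0 valid images, the problem is unlikely to be the file names or extensions.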

lycika-5mzw commented 2 months ago

@dbickson Thanks for the reply, and sorry for the delay; I have been busy. Apologies also for not rendering the verbose=1 output as a code block: I don't know how to render a code block inside the <details> tag.

find command output

find ./assets/candidates_split/ -name '*.jpg'
./assets/candidates_split//0b9e8d01c653efebbb16dcf8dbd7e0e7_1720439636011954000_2,2.jpg
./assets/candidates_split//07f17c0fa416f38cf9c517cbb3ebc8d3_1720439078829525000_0,1.jpg
./assets/candidates_split//3cc7bba743a00bc0fcf41ebd7b8dc8b1_1720439161270461000_2,0.jpg
./assets/candidates_split//2627c50f79f43a1321bf8488acfb7390_1720439142586106000_0,0.jpg
./assets/candidates_split//9df02e74ea16c790510eac082cbdf7d9_1720440019576070000_1,0.jpg
./assets/candidates_split//7cf19dfc5e9c65db6c8ad153ecd1a6e4_1720439649441145000_2,2.jpg
./assets/candidates_split//cb8884398be34fe4734c5106aa93ff01_1720439819882526000_1,2.jpg
./assets/candidates_split//2483118f645a08d7c744dab796b79fad_1720439987313116000_1,2.jpg
./assets/candidates_split//2ba7dc2f6ee12681a674e57ed23ce5ce_1720439847878324000_1,2.jpg
./assets/candidates_split//b80feac615da6b3ab4e4e0589d6fffd2_1720439999224942000_1,0.jpg

fd.run(verbose=True) log

fastdup By Visual Layer, Inc. 2024. All rights reserved.
A fastdup dataset object was created!
Input directory is set to "assets/candidates_split"
Work directory is set to "fastdup"
The next steps are:
1. Analyze your dataset with the .run() function of the dataset object
2. Interactively explore your data on your local machine with the .explore() function of the dataset object
For more information, use help(fastdup) or check our documentation https://docs.visual-layer.com/docs/getting-started-with-fastdup.
fastdup By Visual Layer, Inc. 2024. All rights reserved.
Using crashpad handler: /PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/lib/crashpad_handler
2024-07-11 00:16:39 [DEBUG] Read model file UndisclosedFastdupModel.ort input layer name input output layer name global_average_pooling2d_12
2024-07-11 00:16:39 [DEBUG] out_dims[0] = -1
2024-07-11 00:16:39 [DEBUG] out_dims[1] = 576
2024-07-11 00:16:39 [DEBUG] Model dimensions are -1 224 224 3 is_nchw? 0 is b/w? 0
2024-07-11 00:16:39 [INFO] Going to loop over dir assets/candidates_split
2024-07-11 00:16:39 [DEBUG] find -L "assets/candidates_split" -type f | egrep -i '\.bmp$|\.jpg$|\.jp2$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$|\.m4a$|\.m4v$|\.mov$|\.dav$|\.heif$|\.heic$|\.webp$|\.jfif$|\.mkv$|\.flv$|\.wmv$|\.webm$|\.mpg$|\.mpeg$|\.3gp$'| sort > fastdup/tmp/files0.txt
2024-07-11 00:16:39 [DEBUG] Read a total of 4500 lines from fastdup/tmp/files0.txt
2024-07-11 00:16:39 [DEBUG] Total images read so far 4500
2024-07-11 00:16:39 [INFO] Found total 4500 images to run on, 4500 train, 0 test, name list 4500, counter 4500
2024-07-11 00:16:39 [DEBUG] Going to init pool
2024-07-11 00:16:39 [DEBUG] Starting to run with 8 threads
2024-07-11 00:16:39 [DEBUG] Going to init quad array of size 4500
2024-07-11 00:16:39 [DEBUG] Going to init jobs
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,0.jpg 0 batch size 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,0.jpg 3 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 0 orig off 0 len 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 3 orig off 3 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,1.jpg 4 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 4 orig off 4 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,2.jpg 2 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 2 orig off 2 len 1
2024-07-11 00:16:39 [DEBUG] Run inference assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,1.jpg 1 batch size 1
2024-07-11 00:16:39 [DEBUG] Going to run inference batch 0 0 1 start off 1 orig off 1 len 1
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,2.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,0.jpg
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,0.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_0,1.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2
2024-07-11 00:16:39 [DEBUG] Image load and resize took 2 from assets/candidates_split/0259c3ab68011a6a210e3278cc3e6f3d_1720438938240382000_1,1.jpg
2024-07-11 00:16:39 [DEBUG] Read image took 2 original 100x87: [[
2024-07-11 00:16:39
[DEBUG] Read image took 2 255original 100x87: [[255, original 100x87: [[255, 255, 255], [255, 255, 255], [255, 255, 255, original 100x87:255, 255 ] 255, 255]]] [[255, 255, 255], , [[255255, 255, 255], 255, 255], [255, 255, 255]] , [, [255, 255, [[255, 255, 255255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, [[255, 255255], [255, 255, 255], [255, 255, 251, 255], 255, [255, 255, original 100x87, : [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255]]], [255, 255, 255], [255, 255, 255]] 255, 255 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255255, , 255], [255, 255, 255]] [[253, 255, 255], [253, 255, 255], [253, 255, 255]] ], [251, 255, 255], [253, 255, 255]] [[251, 255, 255], [251, 255, 255], [253, 255, 255]] [[251, 255, 255], [251, 255, 255], [253, 255, 255]] 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] 2024-07-11 00:16:39 [DEBUG] Computed stats 16807.019531 151.179581 105.936523 2024-07-11 00:16:39 [DEBUG] Computed stats 21843.732422 197.045013 91.041908 2024-07-11 00:16:39 [DEBUG] Computed stats 7786.753418 240.129578 42.584698 2024-07-11 00:16:39 [DEBUG] Computed stats 12685.631836 129.354675 98.006622 2024-07-11 00:16:39 [DEBUG] Computed stats 19594.638672 177.504135 94.880737 resized 224: [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] resized 224: [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] resized 224: [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] resized 224: [[251, 255, 255], [251, 255, 255], [251, 255, 255]] [[251, 255, 255], [251, 255, 255], [251, 255, 255]] [[251, 255, 255], [251, 255, 255], [251, 255, 255]] resized 224: [[255, 255, 255], [255, 255, 255], [255, 255, 255]] 
[[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] RGB: [[255, 255, 251], [255, RGB: [[ 255, 251], [ RGB: [[255, 255, 255, 255, 251]] [[255, 255255255, ], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255RGB: [[, 251255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255], [255, 255, 251], [255, 255, 251] RGB: [[, 255, 255]] ] [[255, 255, 251], [255, 255, 251], [255, 255, 251]] , 255]] 255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] 0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000] 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] [[255, 255, 255], [255, 255, 255], [255, 255, 255]] 0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000] 0 :[255.0000, 255.0000, 251.0000, 255.0000, 255.0000, 251.0000, 255.0000, 255.0000, 251.0000, 255.0000] 0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000] 0 :[255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000, 255.0000] 2024-07-11 00:16:39 [DEBUG] Inner inference took 21 (test? 0) output_tensor4 :[0.2104, 0.1376, -0.1381, 0.6034, 1.5306, -0.0473, 1.2749, 0.7011, 0.2332, 0.1807] output_tensor_end4 :[1.6542, -0.0701, 0.9269, 0.2174] 2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 4 start_offset 0 features4 :[0.2104, 0.1376, -0.1381, 0.6034] 2024-07-11 00:16:39 [DEBUG] Finished inference fine 4 (test 0)!! 2024-07-11 00:16:39 [DEBUG] Inner inference took 23 (test? 
0) output_tensor1 :[0.2169, 0.1785, -0.1234, 0.2666, 0.0978, 0.1854, 0.2726, 2.0044, -0.0423, 0.2064] output_tensor_end1 :[0.5314, 0.1115, 1.7439, 0.1835] 2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 1 start_offset 0 features1 :[0.2169, 0.1785, -0.1234, 0.2666] 2024-07-11 00:16:39 [DEBUG] Finished inference fine 1 (test 0)!! 2024-07-11 00:16:39 [DEBUG] Inner inference took 24 (test? 0) output_tensor0 :[0.3633, 1.1611, 0.2634, 0.0288, 0.0052, 0.1590, 0.8868, 3.0300, -0.0705, 1.3699] output_tensor_end0 :[0.4783, -0.0353, 1.7936, 0.2264] 2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 0 start_offset 0 features0 :[0.3633, 1.1611, 0.2634, 0.0288] 2024-07-11 00:16:39 [DEBUG] Finished inference fine 0 (test 0)!! 2024-07-11 00:16:39 [DEBUG] Inner inference took 25 (test? 0) output_tensor3 :[1.0767, -0.0195, 0.1717, -0.0528, -0.0154, 1.1650, 0.5571, 0.6300, 0.9296, -0.0045] output_tensor_end3 :[-0.1771, -0.1224, -0.0969, 1.7246] 2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 3 start_offset 0 features3 :[1.0767, -0.0195, 0.1717, -0.0528] 2024-07-11 00:16:39 [DEBUG] Finished inference fine 3 (test 0)!! 2024-07-11 00:16:39 [DEBUG] Inner inference took 34 (test? 0) output_tensor2 :[0.2455, 0.0602, -0.0981, 0.7068, 0.0038, 0.9310, 1.1705, 1.7438, 0.0154, 0.6575] output_tensor_end2 :[1.0321, 0.0210, 1.3554, 0.3052] 2024-07-11 00:16:39 [DEBUG] Quad array 0x297a70000 2 start_offset 0 features2 :[0.2455, 0.0602, -0.0981, 0.7068] 2024-07-11 00:16:39 [DEBUG] Finished inference fine 2 (test 0)!! 
2024-07-11 00:16:48 [DEBUG] Going to store results Quad array 0x297a70000 i 0 FL 576 features0 :[0.3633, 1.1611, 0.2634, 0.0288] features-end572 :[0.4783, -0.0353, 1.7936, 0.2264] Quad array 0x297a70000 i 1 FL 576 features0 :[0.2169, 0.1785, -0.1234, 0.2666] features-end572 :[0.5314, 0.1115, 1.7439, 0.1835] Quad array 0x297a70000 i 2 FL 576 features0 :[0.2455, 0.0602, -0.0981, 0.7068] features-end572 :[1.0321, 0.0210, 1.3554, 0.3052] Quad array 0x297a70000 i 3 FL 576 features0 :[1.0767, -0.0195, 0.1717, -0.0528] features-end572 :[-0.1771, -0.1224, -0.0969, 1.7246] Quad array 0x297a70000 i 4 FL 576 features0 :[0.2104, 0.1376, -0.1381, 0.6034] features-end572 :[1.6542, -0.0701, 0.9269, 0.2174] 2024-07-11 00:16:48 [DEBUG] Wrote total of 4500 features , found 0 bad images, total so far 4500, filename fastdup/atrain_features.dat stats width: 100 height: 87 unique: 256 blur: 12685.631836, mean: 129.354675 min: 0.000000 max: 255.000000 stdv: 98.006622 file_fize: 3823 stats width: 100 height: 87 unique: 255 blur: 16807.019531, mean: 151.179581 min: 0.000000 max: 255.000000 stdv: 105.936523 file_fize: 3432 stats width: 100 height: 87 unique: 256 blur: 19594.638672, mean: 177.504135 min: 0.000000 max: 255.000000 stdv: 94.880737 file_fize: 3784 stats width: 100 height: 87 unique: 252 blur: 7786.753418, mean: 240.129578 min: 0.000000 max: 255.000000 stdv: 42.584698 file_fize: 2065 stats width: 100 height: 87 unique: 256 blur: 21843.732422, mean: 197.045013 min: 0.000000 max: 255.000000 stdv: 91.041908 file_fize: 3081 2024-07-11 00:16:48 [DEBUG] Wrote total of 4500 stats in fastdup/atrain_stats.csv 2024-07-11 00:16:48 [DEBUG] Done store results 2024-07-11 00:16:48 [INFO] Found total 4500 images to run on 2024-07-11 00:16:48 [DEBUG] Going to init quad array of size 1000 2024-07-11 00:16:48 [DEBUG] Going to run 5 batches with reminder 500 2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 1000 from offet 0 2024-07-11 00:16:48 [DEBUG] Going to run single 
thread normalization of 1000 from offet 576000 2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 1000 from offet 1152000 2024-07-11 00:16:48 [DEBUG] Going to run single thread normalization of 500 from offet 2304000 2024-07-11 00:16:48 [DEBUG] Finished single thread normalization after normalization10 :[0.0170, 0.0543, 0.0123, 0.0013] 2024-07-11 00:16:48 [DEBUG] 3) Going to train NN model. Train sample factor 1.000000 howmany 4500 2024-07-11 00:16:48 [DEBUG] 3) Finished train() NN model 2024-07-11 00:16:49 [DEBUG] 265) Finished add() NN model 2024-07-11 00:16:49 [DEBUG] Total data points added= 4500 2024-07-11 00:16:49 [INFO] 268) Finished write_index() NN model 2024-07-11 00:16:49 [INFO] Stored nn model index file fastdup/nnf.index 2024-07-11 00:16:49 [DEBUG] 349) Finished search() NN model 2024-07-11 00:16:49 [DEBUG] KNN results 0 : 1.00000 1361 : 0.92043 596 : 0.90034 1 : 1.00000 1360 : 0.99972 2074 : 0.85255 2 : 1.00000 3604 : 0.97096 4164 : 0.87843 3 : 1.00000 1198 : 0.96876 3457 : 0.76420 4 : 1.00000 3605 : 0.99133 2628 : 0.94568 5 : 1.00000 1973 : 0.98401 2811 : 0.83700 6 : 1.00000 971 : 0.99720 4303 : 0.72761 7 : 1.00000 3205 : 0.95192 4303 : 0.76204 8 : 1.00000 4213 : 0.93754 4304 : 0.92917 9 : 1.00000 3980 : 0.99856 1313 : 0.79341 2024-07-11 00:16:49 [DEBUG] Found total results 9081 2024-07-11 00:16:49 [DEBUG] Replacing lower threshold 0.050000 with position 8627 top_k.size() 9081 loc pos: 0.742899 last pos: 0.600066 0.950000 8626.949993 2024-07-11 00:16:49 [DEBUG] Going to print top_k of len 9081 to fastdup/similarity.csv 2024-07-11 00:16:49 [DEBUG] Found from=to 699 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1069 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 700 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1068 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 698 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1063 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 
1062 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 701 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 696 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 695 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 406 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 407 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 693 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 409 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 410 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1016 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 411 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1015 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1014 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1013 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1011 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3987 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3991 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1008 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3993 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1988 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1987 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1985 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1984 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1983 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1981 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1980 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2511 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2514 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2515 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2516 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1942 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1941 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found 
from=to 2517 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2518 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1940 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1939 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1938 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1937 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1936 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3365 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3363 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3362 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3360 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3358 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3357 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2512 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1986 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1943 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 2519 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3994 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1065 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1010 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3990 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 413 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 412 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3364 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 405 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3361 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3359 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1066 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1935 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1009 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 1064 with run_mode 0 2024-07-11 00:16:49 [DEBUG] Found from=to 3989 with run_mode 0 2024-07-11 00:16:49 
[DEBUG] Found from=to 3988 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 408 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 694 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3995 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1982 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1012 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 697 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 3992 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 2513 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1070 with run_mode 0
2024-07-11 00:16:49 [DEBUG] Found from=to 1067 with run_mode 0
2024-07-11 00:16:49 [INFO] Total time took 9422 ms
2024-07-11 00:16:49 [INFO] Found a total of 1833 fully identical images (d>0.990), which are 20.37 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 624 nearly identical images(d>0.980), which are 6.93 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 5348 above threshold images (d>0.900), which are 59.42 % of total graph edges
2024-07-11 00:16:49 [INFO] Found a total of 454 outlier images (d<0.050), which are 5.04 % of total graph edges
2024-07-11 00:16:49 [INFO] Min similarity found 0.600 max similarity 1.000
2024-07-11 00:16:49 [INFO] Example similar files from,to,distance 700,682,1.000000 699,681,1.000000 1069,1051,1.000000 1068,1050,1.000000
2024-07-11 00:16:49 [INFO] Running connected components for ccthreshold 0.960000
2024-07-11 00:16:49 [DEBUG] 9081 After removing edges removed 4815 edges remained with 4266 h 0
2024-07-11 00:16:49 [INFO] .
2024-07-11 00:16:49 [INFO] 0
2024-07-11 00:16:49 [DEBUG] Last component id was 2787
2024-07-11 00:16:49 [DEBUG] Total component stats size is 2787 last component was 2787
2024-07-11 00:16:49 [DEBUG] Going to store components to file fastdup/connected_components.csv
0%| | 0/3 [00:00", line 124, in normalize_dataset
  File "", line 112, in done
  File "", line 76, in fatal_error
fastdup.pipeline.common.dataset_db_updater.PipelineFatalError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 135, in inner_function
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_controller.py", line 630, in run
    do_visual_layer(work_dir=self._work_dir, input_dir=vl_input,
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 194, in do_visual_layer
    raise ExplorationError(e) from e
fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']
Traceback (most recent call last):
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 178, in do_visual_layer
    run_pipeline(input_dir, pbar)
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/fastdup_runner_pipeline.py", line 38, in run_pipeline
    Settings.DATASET_SIZE_BYTES = normalize_dataset(Settings.DATASET_ID, input_dir,
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/vl/utils/useful_decorators.py", line 113, in wrapper
    res = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "", line 124, in normalize_dataset
  File "", line 112, in done
  File "", line 76, in fatal_error
fastdup.pipeline.common.dataset_db_updater.PipelineFatalError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/PATH/TO/PROJECT/main.py", line 49, in
    main()
  File "/PATH/TO/PROJECT/main.py", line 35, in main
    ps(pb, SPLIT_BEGIN, SPLIT_LIMIT, SPLIT_BATCH_SIZE, SPLIT_DEDUPLICATE)
  File "/PATH/TO/PROJECT/captcha_process.py", line 70, in start
    fd.run(overwrite=True, verbose=True)
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/engine.py", line 157, in run
    return super().run(annotations=annotations, input_dir=input_dir, subset=subset, data_type=data_type,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 148, in inner_function
    raise ex
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/sentry.py", line 135, in inner_function
    ret = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_controller.py", line 630, in run
    do_visual_layer(work_dir=self._work_dir, input_dir=vl_input,
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/fastdup/fastdup_runner/run.py", line 194, in do_visual_layer
    raise ExplorationError(e) from e
fastdup.fastdup_runner.utilities.ExplorationError: "Error: Insufficient number of valid images in dataset (0). Minimum required images: 10". Please note that only the following image formats are supported: ['.png', '.jpg', '.jpeg', '.gif', '.giff', '.tif', '.tiff', '.heic', '.heif', '.bmp', '.webp', '.jfif']
Exception ignored in:
Traceback (most recent call last):
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1148, in __del__
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1302, in close
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 1495, in display
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 459, in print_status
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/std.py", line 453, in fp_write
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/tqdm/utils.py", line 196, in inner
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/file_proxy.py", line 53, in flush
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/console.py", line 1674, in print
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/console.py", line 1535, in _collect_renderables
  File "/PATH/TO/PROJECT/.venv_3.11/lib/python3.11/site-packages/rich/protocol.py", line 28, in rich_cast
ImportError: sys.meta_path is None, Python is likely shutting down

galbarnissan commented 2 months ago

@lycika-5mzw, thanks for reporting this. This is indeed a bug in the recent fastdup version (2.5) when relative file paths are used. We’re about to release a fixed version for macOS (2.6), probably by tomorrow. In the meantime, you can use absolute paths as a workaround.
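
The absolute-path workaround could look roughly like the sketch below. It reuses only the `fastdup.create(work_dir=..., input_dir=...)` and `fd.run(overwrite=True)` calls from the original report. The `to_absolute` and `run_with_absolute_paths` helpers are hypothetical names, and resolving the paths with `pathlib` is one assumed way to make them absolute.

```python
from pathlib import Path


def to_absolute(path):
    """Return an absolute version of a possibly relative path, as a string."""
    return str(Path(path).expanduser().resolve())


def run_with_absolute_paths(work_dir, input_dir):
    """Hypothetical wrapper: run fastdup with both directories made absolute.

    fastdup is imported lazily so the path helper can be used without it.
    """
    import fastdup

    fd = fastdup.create(
        work_dir=to_absolute(work_dir),
        input_dir=to_absolute(input_dir),
    )
    fd.run(overwrite=True)
    return fd
```

With the paths from the log above, the call would be `run_with_absolute_paths("fastdup", "assets/candidates_split")`. Relative paths are resolved against the current working directory, so the script should be started from the project root.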

dbickson commented 2 months ago

Hi @lycika-5mzw, version 2.6 is out and should fix your issue. Please keep reporting any errors; your feedback helps us improve!