visual-layer / fastdup

fastdup is a powerful free tool designed to rapidly extract valuable insights from your image & video datasets. It helps you improve the quality of your dataset images and labels and reduce your data operations costs at an unparalleled scale.

[Bug]: Wrong error code #210

Closed: nagar-omer closed this 1 year ago

nagar-omer commented 1 year ago

What happened?

When running with a corrupted file, I get the error code ERROR_MISSING_FILE instead of ERROR_ZERO_SIZE_FILE.

What did you expect to see?

                  filename  label  index            error_code  is_valid  fd_index
1.0   images/000000530.jpg   dup2    1.0                 VALID      True       1.0
2.0   images/000000623.jpg    foo    2.0                 VALID      True       2.0
3.0  images/000000623_.jpg    NaN    3.0    MISSING_ANNOTATION     False       3.0
NaN   images/000000001.jpg   dup1    NaN  ERROR_ZERO_SIZE_FILE     False       4.0
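
The expected table reports the zero-byte file as ERROR_ZERO_SIZE_FILE, which implies ERROR_MISSING_FILE should be reserved for paths that do not exist at all. As a minimal sketch of that distinction (the helper `classify_file` is hypothetical and is not fastdup's implementation), the existence check must come before the size check, because `os.path.getsize` raises for a missing path:

```python
import os
import tempfile


def classify_file(path):
    """Hypothetical helper: map a path to the error code the report expects.

    Only existence and size are checked; decoding is not attempted, so a
    non-empty but corrupted image would still be reported as VALID here.
    """
    if not os.path.exists(path):
        return "ERROR_MISSING_FILE"
    if os.path.getsize(path) == 0:
        return "ERROR_ZERO_SIZE_FILE"
    return "VALID"


with tempfile.TemporaryDirectory() as d:
    empty = os.path.join(d, "000000001.jpg")
    open(empty, "wb").close()  # zero-byte file
    nonempty = os.path.join(d, "000000530.jpg")
    with open(nonempty, "wb") as f:
        f.write(b"\xff\xd8\xff")  # a few bytes; not a decodable JPEG
    absent = os.path.join(d, "not_there.jpg")  # never created
    results = {os.path.basename(p): classify_file(p) for p in (empty, nonempty, absent)}

print(results)
```

A truly corrupted (non-empty) file would need an actual decode attempt to be flagged; this sketch only separates the "missing" and "zero-size" cases that the issue is about.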

What version of fastdup were you running on?

1.4

What version of Python were you running on?

Python 3.8

Operating System

Mac

Reproduction steps

This is the code I ran; the image images/000000001.jpg is corrupted.

import fastdup
import pandas as pd

input_dir = 'images'
work_dir = 'fastdup_workdir'

# Annotations; images/000000001.jpg is the corrupted file
df_annot = pd.DataFrame([
    {'filename': 'images/000000001.jpg', 'label': 'dup1'},
    {'filename': 'images/000000530.jpg', 'label': 'dup2'},
    {'filename': 'images/000000623.jpg', 'label': 'foo'},
])

# Create the fastdup dataset and run it with the annotations above
fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
fd.run(threshold=0.8, overwrite=True, annotations=df_annot)

Relevant log output

                  filename  label  index            error_code  is_valid  fd_index
1.0   images/000000530.jpg   dup2    1.0                 VALID      True       1.0
2.0   images/000000623.jpg    foo    2.0                 VALID      True       2.0
3.0  images/000000623_.jpg    NaN    3.0    MISSING_ANNOTATION     False       3.0
NaN   images/000000001.jpg   dup1    NaN    ERROR_MISSING_FILE     False       4.0
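
Comparing the expected table with this log row by row isolates the single wrong code. A small sketch using plain dicts, with the filename and error_code pairs copied from this issue (the helper name `error_code_mismatches` is hypothetical):

```python
# filename -> error_code, copied from "What did you expect to see?"
expected = {
    "images/000000530.jpg": "VALID",
    "images/000000623.jpg": "VALID",
    "images/000000623_.jpg": "MISSING_ANNOTATION",
    "images/000000001.jpg": "ERROR_ZERO_SIZE_FILE",
}

# filename -> error_code, copied from the log output above
actual = {
    "images/000000530.jpg": "VALID",
    "images/000000623.jpg": "VALID",
    "images/000000623_.jpg": "MISSING_ANNOTATION",
    "images/000000001.jpg": "ERROR_MISSING_FILE",
}


def error_code_mismatches(expected, actual):
    """Return (filename, expected_code, actual_code) for every differing row."""
    return sorted(
        (name, code, actual.get(name))
        for name, code in expected.items()
        if actual.get(name) != code
    )


mismatches = error_code_mismatches(expected, actual)
print(mismatches)
# [('images/000000001.jpg', 'ERROR_ZERO_SIZE_FILE', 'ERROR_MISSING_FILE')]
```

With a real run, the `actual` mapping could be built from the returned DataFrame, e.g. `dict(zip(df['filename'], df['error_code']))`, assuming it carries the `filename` and `error_code` columns shown above.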


dbickson commented 1 year ago

@ovednagar The error comes from the Python side; the C code works fine.

dbickson commented 1 year ago

@ovednagar I am not able to reproduce this. I ran fastdup on a folder with 4 images (2 valid, 1 zero-size, 1 missing) and got the following from fd.invalid_instances():

0  omer_test/train_1274.jpg   foo    2.0  ERROR_ZERO_SIZE_FILE     False       2.0
1  omer_test/test_1234a.jpg   NaN    1.0    MISSING_ANNOTATION     False       1.0
2   omer_test/000000001.jpg  dup1    NaN    ERROR_MISSING_FILE     False       3.0

The list of files is:

dannybickson@Dannys-MacBook-Pro-2 omer_test % ls -lrt
total 3504
-rwxr-xr-x  1 dannybickson  staff  894163 May 22 17:24 test_1234.jpg
-rw-r--r--  1 dannybickson  staff       0 May 22 17:25 train_1274.jpg
-rwxr-xr-x  1 dannybickson  staff  894163 May 22 17:26 test_1234a.jpg
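
To recreate a comparable test folder without copying files by hand, a sketch along these lines writes two non-empty files, one zero-byte file, and leaves one annotated path absent. The helper `make_repro_folder` is hypothetical, and the placeholder bytes are not valid JPEG data, so real images should be copied into the two non-empty slots before pointing fastdup at the folder:

```python
import os
import tempfile


def make_repro_folder(root):
    """Hypothetical fixture mirroring the listing above.

    Writes two non-empty files and one zero-byte file, and returns the path
    of an annotated image that is deliberately never created.
    """
    os.makedirs(root, exist_ok=True)
    # Placeholder bytes only: NOT a decodable JPEG.
    placeholder = b"\xff\xd8" + b"\x00" * 16
    for name in ("test_1234.jpg", "test_1234a.jpg"):
        with open(os.path.join(root, name), "wb") as f:
            f.write(placeholder)
    # Zero-byte file, like train_1274.jpg in the listing.
    open(os.path.join(root, "train_1274.jpg"), "wb").close()
    # Annotated but absent from disk, like 000000001.jpg.
    return os.path.join(root, "000000001.jpg")


with tempfile.TemporaryDirectory() as d:
    missing = make_repro_folder(d)
    sizes = {name: os.path.getsize(os.path.join(d, name)) for name in sorted(os.listdir(d))}
    missing_exists = os.path.exists(missing)

print(sizes)
print("missing file exists:", missing_exists)
```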

dbickson commented 1 year ago

This should be fixed in 1.5.