Closed zilunzhang closed 1 year ago
Hi @zilunzhang, for step 4, can you please try running with input_dir=images_dir? Tar/zip files are only supported with the v0.2 API and not with the v1 API, so running fd = fastdup.create() does not work yet on compressed files.
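For readers hitting the same issue, here is a minimal sketch of the suggested workaround: unpack the archive into a plain directory first, then point the v1 API at that directory. The helper names `extract_images` and `run_fastdup_on_directory` and the list of image suffixes are illustrative assumptions, not part of fastdup; the `fastdup.create(...)` and `fd.run()` calls are shown as I understand the v1 API and should be checked against the installed version.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions; adjust to match the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def extract_images(zip_path, images_dir):
    """Extract only image members of zip_path into images_dir; return their paths."""
    images_dir = Path(images_dir)
    images_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            # Directory entries and non-image files have no matching suffix.
            if Path(member).suffix.lower() in IMAGE_SUFFIXES:
                extracted.append(Path(archive.extract(member, images_dir)))
    return sorted(extracted)


def run_fastdup_on_directory(images_dir, work_dir):
    """Hypothetical wrapper: run the v1 API on an uncompressed directory."""
    import fastdup  # imported lazily so the extraction helper works without it

    fd = fastdup.create(work_dir=str(work_dir), input_dir=str(images_dir))
    fd.run()
    return fd
```

Because the images live in a directory you control, the paths recorded in the reports stay valid after the run.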
Thanks Danny, that works.
Maybe you would like to revise the documentation here at some point...
One more thing: I found that the tmp folder that stores the unzipped images (/media/zilun/wd-16/RS5M_T_dataset/tmp-clean-outlier-zip-official/tmp/mediazilunwd-16RS5M_T_datasetziptmpvlmf-0.95_cf-0.95.zip) became empty after running step 4, so the outlier report cannot find the images to display at their recorded paths. For example: failed to read image from img_path /media/zilun/wd-16/RS5M_T_dataset/tmp-clean-outlier-zip-official/tmp/mediazilunwd-16RS5M_T_datasetziptmpvlmf-0.95_cf-0.95.zip/laion2b_128_107424.jpg. Any suggestions on that?
Hi @zilunzhang, we already fixed the documentation! We recommend running with turi_param='delete_tar=0,delete_img=0' if you do not want the tar files downloaded from S3 or the extracted images to be deleted. We will fix the doc for this as well.
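A minimal sketch of how that recommendation could be wired up with the v0.2-style call. The `turi_param` value is the one quoted above; the helper names `build_turi_param` and `run_on_archive` are hypothetical, and the `fastdup.run(...)` keyword arguments are an assumption to verify against the installed version.

```python
def build_turi_param(**flags):
    """Join flags into the comma-separated 'name=0/1' string used by turi_param."""
    return ",".join(f"{name}={int(bool(value))}" for name, value in flags.items())


def run_on_archive(archive_path, work_dir):
    """Hypothetical wrapper: run the v0.2 API on a tar/zip, keeping extracted files."""
    import fastdup  # imported lazily; only needed when actually running

    return fastdup.run(
        input_dir=str(archive_path),
        work_dir=str(work_dir),
        # Keep the downloaded tar and the extracted images after the run.
        turi_param=build_turi_param(delete_tar=False, delete_img=False),
    )
```

With both flags set to 0, the temporary folder should still contain the images when the outlier report later reads them.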
Thank you, I will try it now!
What happened?
I tried to use local zipped images for data cleaning, and I encountered a problem...
Following this link and this link, I ran fastdup (v1.3) with the following code:
An error occurred when line 4 ran:
The structure of the work directory is:
Then I tried to handle the zipped images this way:
An error occurred again, this time on line 3:
The structure of the work directory is:
There isn't any file named "atrain_features.dat.csv", but "atrain_mediazilunwd-16RS5M_T_datasetziptmpvlmf-0.95_cf-0.95.zipfeatures.dat.csv" exists. Maybe the filename is incorrect?
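To check for this kind of filename mismatch, a small diagnostic sketch like the one below can list what is actually in the work directory. The function name `find_feature_files` and the glob pattern are assumptions based only on the two filenames quoted above; this is not a fastdup API.

```python
from pathlib import Path


def find_feature_files(work_dir, expected="atrain_features.dat.csv"):
    """Report whether the expected features file exists and list similar names."""
    work_dir = Path(work_dir)
    # Matches both the expected name and variants with the archive path embedded.
    candidates = sorted(p.name for p in work_dir.glob("atrain_*features.dat.csv"))
    return {
        "expected_exists": (work_dir / expected).is_file(),
        "candidates": candidates,
    }
```

If `expected_exists` is false while `candidates` is non-empty, the features file was written under a different name than the one being read.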
What did you expect to see?
Line 4 should execute without any error, and all outliers should be listed.
What version of fastdup were you running on?
1.3
What version of Python were you running on?
Python 3.8
Operating System
Ubuntu 22.04.1 LTS
Reproduction steps
No response
Contact Details [Optional]
zilun@cs.toronto.edu