Hi @xings19, this is not supported yet. Give us one day and we will release a new version where you can define an endpoint. Are you working on Ubuntu 20?
Thank you very much for your reply. I am using CentOS:
CentOS Linux release 7.6.1810 (Core)
I see that the API in the Colab demo file is very different from the API version available for CentOS. Could you provide a corresponding update? If not, could you at least add support for specifying an 'endpoint'? Thank you.
Thanks for clarifying. Yes, CentOS has not been updated for a while; it is a legacy version that we do not have many users for. I will also update CentOS to the latest version for you.
Here's more information. When I execute code like this:
import fastdup
fastdup.run(input_dir="s3://mybucket/",work_dir="/my/work/dir/")
It outputs:
Going to loop over dir s3://mybucket/
This step traverses my input path, and the traversal is done with the following command:
aws s3 ls --recursive s3://mybucket/ | awk '{print $4}' | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$' > /my/work/dir/tmp/files.txt
In fact, the official AWS CLI supports passing a custom endpoint, e.g. "--endpoint=http://x.x.x.x:x" (the documented name of this global option is --endpoint-url). With it, the command above becomes the following, which runs on my platform:
aws --endpoint=http://x.x.x.x:x s3 ls --recursive s3://mybucket/ | awk '{print $4}' | egrep -i '\.bmp$|\.jpg$|\.tiff$|\.giff$|\.jpeg$|\.png$|\.tif$|\.tar$|\.tar.gz$|\.zip$|\.tgz$|\.mp4$|\.avi$' > /my/work/dir/tmp/files.txt
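For anyone reproducing this step outside fastdup, the listing-and-filtering pipeline above can be sketched in Python. This is only an illustration, not fastdup's implementation: the helper names (build_ls_command, filter_keys, list_media_files) are hypothetical, the extension list is copied from the egrep filter above, and the endpoint is passed with the AWS CLI's documented --endpoint-url option.

```python
import re
import subprocess
from typing import List, Optional

# Extensions matched by the egrep filter quoted above (copied as-is, including ".giff").
MEDIA_PATTERN = re.compile(
    r"\.(bmp|jpg|tiff|giff|jpeg|png|tif|tar|tar\.gz|zip|tgz|mp4|avi)$",
    re.IGNORECASE,
)


def build_ls_command(s3_uri: str, endpoint_url: Optional[str] = None) -> List[str]:
    """Build an 'aws s3 ls --recursive' argv, optionally targeting a custom endpoint."""
    cmd = ["aws"]
    if endpoint_url:
        # --endpoint-url is the AWS CLI's global option for non-default S3 endpoints.
        cmd.append(f"--endpoint-url={endpoint_url}")
    cmd += ["s3", "ls", "--recursive", s3_uri]
    return cmd


def filter_keys(ls_output: str) -> List[str]:
    """Mimic "awk '{print $4}' | egrep -i ...": take column 4, keep media files."""
    keys = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and MEDIA_PATTERN.search(fields[3]):
            keys.append(fields[3])
    return keys


def list_media_files(s3_uri: str, endpoint_url: Optional[str] = None) -> List[str]:
    """Run the listing (requires the aws CLI on PATH) and return matching keys."""
    result = subprocess.run(
        build_ls_command(s3_uri, endpoint_url),
        capture_output=True,
        text=True,
        check=True,
    )
    return filter_keys(result.stdout)
```

For example, list_media_files("s3://mybucket/", "http://x.x.x.x:x") would run the listing against the custom endpoint. Like awk '{print $4}', filter_keys splits on whitespace, so keys containing spaces would be truncated.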
I hope I have described my problem clearly. Thanks a lot for your replies; this will speed up my research a lot.
Hi @xings19, we have released version 0.907 for CentOS here: https://github.com/visual-layer/fastdup/releases/tag/v0.908 Please try it out. It adds a new, optional environment variable called FASTDUP_S3_ENDPOINT_URL. If it is set, fastdup adds --endpoint-url=[value of the environment variable] to the aws s3 command. Let us know if this works for you.
To set the environment variable, run this before starting Python:
export FASTDUP_S3_ENDPOINT_URL=https://path.to.your.endpoint
Or run Python with:
FASTDUP_S3_ENDPOINT_URL=https://path.to.your.endpoint python3.8 ....
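The rule described in the release note can be modeled with a small Python helper. This is a hypothetical sketch of that rule (an optional variable that, when set, becomes --endpoint-url=VALUE on the aws command); the name endpoint_args is made up for illustration and this is not fastdup's actual code.

```python
import os
from typing import List, Mapping, Optional

ENV_VAR = "FASTDUP_S3_ENDPOINT_URL"  # variable name taken from the release note


def endpoint_args(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the extra aws CLI argument implied by FASTDUP_S3_ENDPOINT_URL, if any.

    Hypothetical helper: models the described behavior, not fastdup's implementation.
    """
    env = os.environ if environ is None else environ
    url = env.get(ENV_VAR, "").strip()
    # Unset or empty variable: no extra argument, so the default AWS endpoint is used.
    return [f"--endpoint-url={url}"] if url else []
```

From inside a script, setting os.environ["FASTDUP_S3_ENDPOINT_URL"] before calling fastdup.run should have the same effect as the export above, assuming fastdup reads the variable when it builds the aws command and passes its environment on to that subprocess; that part is an assumption, not something confirmed in this thread.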
This package seems to have a bug: when I download and run it, I get the following error:
Traceback (most recent call last):
File "/mnt/lustre/username/.conda/envs/MAE/lib/python3.7/site-packages/fastdup/__init__.py", line 92, in <module>
dll = CDLL(so_file)
File "/mnt/lustre/username/.conda/envs/MAE/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libnnf.so: cannot open shared object file: No such file or directory
Please reach out to fastdup support, it seems installation is missing critical files to start fastdup.
We would love to understand what has gone wrong.
You can open an issue here: https://github.com/visual-layer/fastdup/issues or email us at info@databasevisual.com
Share out output of the command "find /mnt/lustre/username/.conda/envs/MAE/lib/python3.7/site-packages/fastdup "
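To gather the information that message asks for without triggering the failing import, a Python sketch like the following could help. It is an illustrative diagnostic under stated assumptions, not a fastdup tool: importlib.util.find_spec locates the top-level package without executing its __init__.py, shared_objects mirrors the suggested find command but keeps only shared libraries, and try_load reports the dynamic loader's error for a given file. All three helper names are hypothetical.

```python
import ctypes
import importlib.util
import os
import re
from typing import List, Optional

# Matches "libfoo.so" and versioned names such as "libfoo.so.1.2".
SO_RE = re.compile(r"\.so(\.\d+)*$")


def package_dir(name: str) -> Optional[str]:
    """Locate a top-level package's directory without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def shared_objects(root: str) -> List[str]:
    """List shared libraries under root, relative to root (like a filtered 'find')."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if SO_RE.search(fname):
                found.append(os.path.relpath(os.path.join(dirpath, fname), root))
    return sorted(found)


def try_load(path: str) -> Optional[str]:
    """Try to dlopen a library; return the loader's error message, or None on success."""
    try:
        ctypes.CDLL(path)
        return None
    except OSError as exc:
        return str(exc)
```

Once package_dir("fastdup") returns a path, calling shared_objects on it shows whether libnnf.so was shipped, and try_load on the library that fastdup loads would surface a missing dependency as an OSError message like the one above.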
I think this is because some dependent files are missing, so I took libnnf.so from the old version of the package I used initially, put it in the corresponding location, and executed the following command:
FASTDUP_S3_ENDPOINT_URL=http://x.x.x.x:x python dupimg.py
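The one-shot shell form above can also be reproduced from Python, for example when launching the script from another program. This is a minimal sketch using a hypothetical helper name, run_with_endpoint; it copies the current environment, adds FASTDUP_S3_ENDPOINT_URL for the child process only, and leaves the parent's environment untouched.

```python
import os
import subprocess
import sys
from typing import Sequence


def run_with_endpoint(argv: Sequence[str], endpoint_url: str) -> subprocess.CompletedProcess:
    """Run argv with FASTDUP_S3_ENDPOINT_URL set only for that child process.

    Equivalent to the shell form 'FASTDUP_S3_ENDPOINT_URL=... python dupimg.py':
    the parent's os.environ is copied, not modified.
    """
    env = dict(os.environ, FASTDUP_S3_ENDPOINT_URL=endpoint_url)
    return subprocess.run(list(argv), env=env, capture_output=True, text=True)
```

For instance, run_with_endpoint([sys.executable, "dupimg.py"], "http://x.x.x.x:x") mirrors the command above.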
A new error has occurred:
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
Aborted
I'm not sure why this happens. Maybe you can fix it?
Hi @xings19, thanks for your detailed feedback; it helped us pinpoint the problem quickly. Please use this release: https://github.com/visual-layer/fastdup/releases/tag/0.908c and let us know if it works.
On our side it seems to work now:
[danny_bickson@fastdup-centos7-build cxx]$ export FASTDUP_S3_ENDPOINT_URL=https://s3.amazonaws.com
[danny_bickson@fastdup-centos7-build cxx]$ ./build/Release/src/fastdup s3://visualdb/sku110k --num_images=10 --work_dir=t1
sh: dpkg: command not found
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
This software is free for non-commercial and academic usage under the Creative Common Attribution-NonCommercial-NoDerivatives 4.0 International license. Please reach out to info@databasevisual.com for licensing options.
Model path is ./UndisclosedFastdupModel.ort
2023-03-22 07:18:44 [INFO] Going to loop over dir s3://visualdb/sku110k
2023-03-22 07:18:45 [INFO] Found total 10 images to run on
[■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■] 100% Estimated: 0 M
2023-03-22 07:18:51 [INFO] Found total 10 images to run on
2023-03-22 07:18:51 [INFO] 16) Finished write_index() NN model
2023-03-22 07:18:51 [INFO] Stored nn model index file t1/nnf.index
2023-03-22 07:18:51 [INFO] Total time took 6132 ms
2023-03-22 07:18:51 [INFO] Found a total of 0 fully identical images (d>0.990), which are 0.00 %
2023-03-22 07:18:51 [INFO] Found a total of 0 nearly identical images(d>0.980), which are 0.00 %
2023-03-22 07:18:51 [INFO] Found a total of 14 above threshold images (d>0.850), which are 46.67 %
2023-03-22 07:18:51 [INFO] Found a total of 1 outlier images (d<0.050), which are 3.33 %
2023-03-22 07:18:51 [INFO] Min distance found 0.796 max distance 0.934
2023-03-22 07:18:51 [INFO] Running connected components for ccthreshold 0.960000
I made a small mistake earlier; it works fine now. Thanks for your help!
Hi @xings19, great to know this is working for you. Feel free to reach out with any additional issues; your feedback helps improve fastdup!
Added to the docs under "Using AWS endpoints".
My data is on a private AWS, and I need to specify an 'endpoint' to execute commands such as 'aws s3 ls' correctly. How do I specify an 'endpoint' in the code?