Closed: Yann-CV closed this issue 1 month ago
Hi @Yann-CV, apologies for the unclear error message. You did not specify input_dir, which should be a list of image locations matching in length the numpy embeddings array. In addition, work_dir should point to a temporary workspace for storing intermediate files. We will make the error message clearer.
This is the correct format as expressed in our tutorial: Run On Pre-computed Feature Vectors
If you have pre-computed feature vectors, using fastdup or any other method, you can input the features directly into fastdup to analyze for issues. Running fastdup on feature vectors instead of raw images decreases run time significantly.
The following code snippet shows how to run with your own features stored in a numpy matrix, along with a list of the matching filenames.
import numpy as np
import fastdup
# Replace the below code with computation of your own features
matrix = np.random.rand(2, 576).astype('float32')
flist = ["/data/myimage1.jpg", "/data/myimage2.jpg"]
# Files should contain absolute path and not relative path
fd = fastdup.create(input_dir='/data/', work_dir='output')
fd.run(annotations=flist, embeddings=matrix)
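Since the failure in this issue came from a missing or mismatched file list, it may help to check the inputs before handing them to fastdup. The sketch below is only an illustration: check_embeddings_input is a hypothetical helper, not part of fastdup, and it assumes float32 features and absolute paths as in the snippet above.

```python
import os
import numpy as np

def check_embeddings_input(filenames, matrix):
    """Hypothetical pre-flight check; not part of the fastdup API."""
    # Assumption: features are float32, as in the tutorial snippet.
    matrix = np.asarray(matrix, dtype=np.float32)
    if matrix.ndim != 2:
        raise ValueError(f"expected a 2-D embeddings matrix, got shape {matrix.shape}")
    # One row of features per file.
    if len(filenames) != matrix.shape[0]:
        raise ValueError(
            f"{len(filenames)} filenames but {matrix.shape[0]} embedding rows"
        )
    # The tutorial asks for absolute paths, not relative ones.
    relative = [f for f in filenames if not os.path.isabs(f)]
    if relative:
        raise ValueError(f"relative paths are not allowed: {relative[:3]}")
    return matrix

flist = ["/data/myimage1.jpg", "/data/myimage2.jpg"]
matrix = check_embeddings_input(flist, np.random.rand(2, 576))
```

The returned matrix and flist can then be passed to fd.run(annotations=flist, embeddings=matrix) as shown above.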
@dbickson thanks for the feedback. Indeed it works with the list of filenames.
In my opinion, if this image list is required at some point within the run, it should not be annotated as Optional in the code.
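To make the point concrete, here is a rough sketch of how an entry point could keep both arguments Optional in the signature while still failing early, with a readable message, when embeddings arrive without a file list. run_sketch and its return values are hypothetical and are not fastdup's actual code.

```python
from typing import Optional, Sequence
import numpy as np

def run_sketch(annotations: Optional[Sequence[str]] = None,
               embeddings: Optional[np.ndarray] = None) -> str:
    """Hypothetical stand-in for a run entry point; not fastdup's code."""
    # The file list is optional on its own, but required with embeddings.
    if embeddings is not None and annotations is None:
        raise ValueError(
            "embeddings were provided without a file list; "
            "pass the matching filenames as annotations"
        )
    if embeddings is not None:
        return "precomputed"  # would skip feature extraction
    return "compute"          # would extract features from images
```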
Hi @Yann-CV, version 1.125 defends better against embeddings passed without the file list. Let us know if you observe any other issues.
What happened?
Running Fastdup.run is crashing when providing already computed embeddings
What did you expect to see?
no failure
What version of fastdup were you running on?
1.124
What version of Python were you running on?
Python 3.10
Operating System
Ubuntu 20.04
Reproduction steps
Relevant log output
pandas==2.2.2