Closed wangxiong172086864 closed 6 months ago
Hi @wangxiong172086864, your images are unusual, and I am not sure our default embedding is the right fit for them. Can you try running with model_path='dinov2s' and let us know if the similarity is better?
Tried with dinov2s, and the result is still not good. Maybe the reason is not the embedding: I pulled the images from this component out and ran fastdup again on just those, and got a much better result.
Hi @wangxiong172086864, fastdup runs an approximation, so if you want an exact computation, run with nnf_mode='Flat' (it is not the default because it runs slower and requires more RAM). Assuming your dataset is below a million images, it should run fine.
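To make "exact computation" concrete, here is a minimal sketch of an exhaustive nearest-neighbor search in plain Python. It assumes "exact" means comparing every vector against every other vector, which is why it gets slower and heavier as the dataset grows. The helper names `cosine` and `exact_neighbors` and the toy embeddings are made up for illustration; this is not fastdup's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def exact_neighbors(vectors, k):
    """For each vector, return its k most similar other vectors
    as (index, similarity) pairs, found by comparing against all of them."""
    result = []
    for i, u in enumerate(vectors):
        scored = [(j, cosine(u, v)) for j, v in enumerate(vectors) if j != i]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        result.append(scored[:k])
    return result

# Toy 2-D "embeddings" (hypothetical values, not real image features).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
neighbors = exact_neighbors(embeddings, k=1)
nearest_index = [n[0][0] for n in neighbors]
```

The cost is quadratic in the number of vectors, which is the trade-off an approximate index avoids.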
Danny, thanks. Using nnf_mode='Flat' helps: the similarity of the images is better. Raising the ccthreshold value also gives a better result.
@wangxiong172086864 if you have your own embeddings extracted using models trained on your dataset, it might work better than the generic models used in fastdup.
Here's an example of how to run fastdup on your own embeddings https://visual-layer.readme.io/docs/run-on-precomputed-feature-vectors
Another option is to increase the nearest_neighbor_k parameter, for example to 50. The default is 3.
What happened?
Running fastdup with ccthreshold = 0.97, I found that component 29 contains lots of low-similarity images. So I calculated the cosine similarity within this component using PyTorch, and the cosine similarity of some image pairs is lower than 0.97, e.g. 0.90 or 0.82. So how does ccthreshold work?
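For context, a general property of connected components may be relevant to this question. This is an assumption about what could be happening, not something confirmed for fastdup. If components are built by linking every pair whose similarity reaches the threshold, two images can land in the same component through a chain of intermediate images even when their direct similarity is below the threshold. The sketch below uses plain Python with made-up unit vectors; `unit` and `connected_components` are hypothetical helpers, not fastdup functions.

```python
import math
from itertools import combinations

def unit(deg):
    """Unit vector in 2-D at the given angle in degrees."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))

def cosine(u, v):
    """Cosine similarity of two unit vectors (their dot product)."""
    return sum(a * b for a, b in zip(u, v))

def connected_components(vectors, threshold):
    """Group indices whose similarity graph (edges >= threshold) is connected."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(vectors)), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Neighbors are 12 degrees apart (cosine about 0.978), but the two ends
# of the chain are 24 degrees apart (cosine about 0.914).
vectors = [unit(0), unit(12), unit(24), unit(90)]
components = connected_components(vectors, threshold=0.97)
lowest = min(cosine(vectors[i], vectors[j])
             for i, j in combinations(components[0], 2))
```

Here vectors 0, 1 and 2 end up in one component at threshold 0.97 even though vectors 0 and 2 are only about 0.914 similar to each other.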
What did you expect to see?
No response
What version of fastdup were you running on?
fastdup 1.63
What version of Python were you running on?
Python 3.8
Operating System
Ubuntu 20.04 LTS
Reproduction steps
No response
Relevant log output
No response
Attach a screenshot [Optional]
No response
Contact Details [Optional]
No response