Closed kaane8520 closed 1 year ago
Hi @kaane8520, thanks for reaching out! Your ask is very clear: in our terminology, you would like to compare train and test dirs such that relations are computed only between train and test, not within each set.
Once you run fastdup on the train dataset only, it builds a nearest neighbor model and stores it in the work_dir. Next, run again with the same input_dir and work_dir, set run_mode=3, and point test_dir to the new set of images; you will get those images compared against the train set. A detailed explanation is found here: https://github.com/visual-layer/fastdup/blob/main/RUN.md#resume
If you want to run the opposite direction, run fastdup again with input_dir pointing to the test data and a clean work_dir, using the default run_mode, and then run with run_mode=3 where test_dir points to the training data.
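To make the two-step flow concrete, here is a minimal sketch. It assumes `fastdup.run` accepts `input_dir`, `work_dir`, `test_dir` and `run_mode` as keyword arguments, as described above; the helper names `two_step_plan` and `run_two_step` are hypothetical and only for illustration.

```python
def two_step_plan(train_dir, test_dir, work_dir):
    """Return the keyword arguments for the two fastdup.run calls, in order."""
    return [
        # Step 1: default run_mode on the train set only.
        # This builds the nearest neighbor model and stores it in work_dir.
        {"input_dir": train_dir, "work_dir": work_dir},
        # Step 2: same input_dir and work_dir, run_mode=3,
        # with test_dir pointing to the new set of images.
        {
            "input_dir": train_dir,
            "work_dir": work_dir,
            "test_dir": test_dir,
            "run_mode": 3,
        },
    ]


def run_two_step(train_dir, test_dir, work_dir):
    # Imported lazily so the plan can be inspected without fastdup installed.
    import fastdup

    for kwargs in two_step_plan(train_dir, test_dir, work_dir):
        fastdup.run(**kwargs)
```

For the opposite direction, swap the two directories and use a clean work_dir, e.g. `run_two_step(test_dir, train_dir, "clean_work_dir")`.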
However, since the metric is symmetric, the most similar images from set A to B are the same as from B to A, so you can reuse the similarity.csv file you already got.
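If you want the reverse view out of the same file, something like the sketch below may be enough. It assumes similarity.csv has the columns `from`, `to` and `distance`, and that a higher `distance` value means more similar; please check both against your own output. `reverse_view` is a hypothetical helper, not part of fastdup.

```python
import csv
import io
from collections import defaultdict


def reverse_view(similarity_csv_text):
    """Group rows by the `to` column, listing each image's matches from the other set."""
    by_target = defaultdict(list)
    for row in csv.DictReader(io.StringIO(similarity_csv_text)):
        by_target[row["to"]].append((row["from"], float(row["distance"])))
    # Sort each group so the highest score comes first
    # (assumes higher means more similar).
    for matches in by_target.values():
        matches.sort(key=lambda m: m[1], reverse=True)
    return dict(by_target)
```

You would call it with the file contents, e.g. `reverse_view(open("work_dir/similarity.csv").read())`, where the path is a placeholder for your own work_dir.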
A parameter to play with is threshold. If the two sets are very different, you may get no output. In that case, try running with threshold=0 so that no similarities are removed from the output; you will then also get low similarities.
BTW, I forgot to mention that it is possible to simplify this into a single run when both the train and test sets are available: just run with input_dir pointing to the train set and test_dir pointing to the test set, and the relationships are computed only between train and test.
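A rough sketch of the single-run variant, with the optional threshold=0 setting for very different sets. It assumes `fastdup.run` takes `input_dir`, `work_dir`, `test_dir` and `threshold` as keyword arguments; `single_run_kwargs` and `run_train_vs_test` are hypothetical helper names.

```python
def single_run_kwargs(train_dir, test_dir, work_dir, threshold=None):
    """Build keyword arguments for one fastdup.run call comparing train against test."""
    kwargs = {"input_dir": train_dir, "test_dir": test_dir, "work_dir": work_dir}
    if threshold is not None:
        # threshold=0 keeps every similarity in the output, including low ones.
        kwargs["threshold"] = threshold
    return kwargs


def run_train_vs_test(train_dir, test_dir, work_dir, threshold=None):
    # Imported lazily so the arguments can be checked without fastdup installed.
    import fastdup

    fastdup.run(**single_run_kwargs(train_dir, test_dir, work_dir, threshold))
```

If the first attempt produces no output, `run_train_vs_test(train_dir, test_dir, work_dir, threshold=0)` is the variant to try.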
The reason you may want to defer the second step is that sometimes the test data arrives later and you want to reuse the precomputed model built from the train set.
Thank you for your help
I have two datasets with images: A and B. I would like to compare those two:
I have tried different values for nearest_neighbors_k and threshold but it didn't help.
Could you help me find the problem?