Closed shukkkur closed 10 months ago
I checked those three images in the dataset and they have different hashes. They look similar, but have slight differences. We do remove duplicate images, but only when they have the same MD5 hash.
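For anyone wondering why near-identical images slip through, here is a minimal sketch of hash-based deduplication. It is not Roboflow's actual implementation; the helper names (`md5_of`, `dedupe_by_md5`) and the sample byte strings are made up for illustration. It only shows the general idea that an MD5 check catches byte-for-byte copies, while a file that differs by even one byte gets a different hash and is kept.

```python
import hashlib
from typing import Dict


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of raw file bytes."""
    return hashlib.md5(data).hexdigest()


def dedupe_by_md5(images: Dict[str, bytes]) -> Dict[str, bytes]:
    """Keep the first file seen for each distinct MD5 digest (hypothetical helper)."""
    seen = set()
    kept = {}
    for name, data in images.items():
        digest = md5_of(data)
        if digest in seen:
            continue  # exact byte-level duplicate, skip it
        seen.add(digest)
        kept[name] = data
    return kept


# Stand-in byte strings, not real image files.
images = {
    "a.png": b"\x89PNG-example-bytes",
    "a_copy.png": b"\x89PNG-example-bytes",     # identical bytes
    "a_tweaked.png": b"\x89PNG-example-bytez",  # differs by one byte
}

kept = dedupe_by_md5(images)
print(sorted(kept))  # ['a.png', 'a_tweaked.png']
```

The exact copy is dropped, but the slightly different file survives, which matches what happened with the three images in this issue.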
Okay, thank you :)
Describe the bug After uploading datasets from RoboFlow universe, identical/duplicate images leaked into Source Images.
To Reproduce Steps to reproduce the behavior:
Expected behavior Duplicate images are ignored and only one copy is stored.
Evidence ![Screenshot_20221228_101906](https://user-images.githubusercontent.com/78250180/209845492-2eb85a15-bfc6-41d2-9084-c28ff3a2c285.png)
Additional context This is a problem because, as you can see from the three duplicate images, two are in the training set and one is in the validation set. This is going to lead to skewed results.