An unsupervised and free tool for image and video dataset analysis founded by the authors of XGBoost, Apache TVM & Turi Create - Danny Bickson, Carlos Guestrin and Amir Alush.
<a href="https://visual-layer.readme.io/" target="_blank" rel="noopener noreferrer">Documentation</a>
·
<a href="#features--advantages" target="_blank" rel="noopener noreferrer">Features</a>
·
<a href="https://github.com/visual-layer/fastdup/issues/new/choose" target="_blank" rel="noopener noreferrer">Report Bug</a>
·
<a href="https://medium.com/visual-layer" target="_blank" rel="noopener noreferrer">Blog</a>
·
<a href="#getting-started" target="_blank" rel="noopener noreferrer">Quickstart</a>
·
<a href="#visual-layer-cloud" target="_blank" rel="noopener noreferrer">Visual Layer Cloud</a>
<hr>
Install fastdup from PyPI:
pip install fastdup
More installation options are available here.
Initialize and run fastdup:
import fastdup
fd = fastdup.create(input_dir="IMAGE_FOLDER/")
fd.run()
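Before a long run it can help to confirm that the input folder actually contains images. The sketch below is one possible pre-flight check for image folders, not part of fastdup itself: `count_images`, `run_fastdup`, and the `IMAGE_EXTENSIONS` set are hypothetical names introduced for illustration, and the extension list is an assumption rather than fastdup's official list of supported formats. Only `fastdup.create(input_dir=...)` and `fd.run()` come from the snippet above.

```python
from pathlib import Path

# Assumption for this sketch: these suffixes are treated as images.
# This is NOT fastdup's official list of supported formats.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff", ".webp"}


def count_images(input_dir):
    """Count files under input_dir (recursively) whose suffix looks like an image."""
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"{input_dir!r} is not a directory")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_fastdup(input_dir):
    """Run fastdup only if the folder contains at least one candidate image."""
    if count_images(input_dir) == 0:
        raise ValueError(f"no image files found under {input_dir!r}")

    # Imported lazily so the check above works even without fastdup installed.
    import fastdup

    fd = fastdup.create(input_dir=input_dir)
    fd.run()
    return fd
```

Calling `run_fastdup("IMAGE_FOLDER/")` returns the same `fd` object as the snippet above, so `fd.explore()` and the gallery calls work unchanged. A video dataset would need a different check, since this sketch only looks for image suffixes.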
Explore the results in an interactive web UI:
fd.explore()
Alternatively, visualize the result in a static gallery:
fd.vis.duplicates_gallery() # gallery of duplicates
fd.vis.outliers_gallery() # gallery of outliers
fd.vis.component_gallery() # gallery of connected components
fd.vis.stats_gallery() # gallery of image statistics (e.g. blur, brightness, etc.)
fd.vis.similarity_gallery() # gallery of similar images
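The term "connected components" above refers to groups of images linked, directly or transitively, by similarity. The sketch below illustrates the idea with a plain union-find over a hypothetical list of near-duplicate pairs. It is a conceptual example only: this README does not describe how fastdup computes its components, and the `connected_components` function and the file names are assumptions made for illustration.

```python
def connected_components(pairs):
    """Group items linked by similarity pairs into connected components."""
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in pairs:
        union(a, b)

    # Collect every item under the root of its component.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical near-duplicate pairs, e.g. pairs whose similarity exceeded a threshold.
pairs = [("a.jpg", "b.jpg"), ("b.jpg", "c.jpg"), ("x.jpg", "y.jpg")]
print(connected_components(pairs))
# → [['a.jpg', 'b.jpg', 'c.jpg'], ['x.jpg', 'y.jpg']]
```

Note that `a.jpg` and `c.jpg` land in the same component even though they were never paired directly, which is why a component can be larger than any single duplicate pair.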
fastdup handles labeled/unlabeled datasets in image or video format, providing a range of features:
What sets fastdup apart from other similar tools:
Learn the basics of fastdup through interactive examples. View the notebooks on GitHub or nbviewer. Even better, run them on Google Colab or Kaggle, for free.
⚡ Quickstart: Learn how to install fastdup, load a dataset and analyze it for potential issues such as duplicates/near-duplicates, broken images, outliers, dark/bright/blurry images, and view visually similar image clusters. If you're new, start here!
📌 Dataset: Oxford-IIIT Pet.
🧹 Finding and Removing Duplicates: Learn how to analyze an image dataset for duplicates and near-duplicates.
📌 Dataset: Oxford-IIIT Pet.
🖼 Finding and Removing Mislabels: Learn how to analyze an image dataset for potential image mislabels and export the list of mislabeled images for further inspection.
📌 Dataset: Food-101.
🎁 Image Similarity Search: Perform image search in a large dataset of images.
📌 Dataset: Shopee Product Matching.
🤗 Hugging Face Datasets: Load and analyze datasets from Hugging Face Datasets. Perfect if you already have a dataset hosted on the Hugging Face Hub.
🧠 TIMM Embeddings: Compute dataset embeddings using TIMM (PyTorch Image Models) and run fastdup over them to surface dataset issues. Runs on CPU and GPU.
🦖 ONNX Embeddings: Bring your own ONNX model. In this example, we extract feature vectors of your images using the DINOv2 model. Runs on CPU.
See more examples.
Get help from the fastdup team or community members via the following channels:
Community-contributed blog posts on fastdup:
What our users say:
Visual Layer offers commercial services for managing, cleaning, and curating visual data at scale.
Sign up for free.
https://github.com/visual-layer/fastdup/assets/6821286/57f13d77-0ac4-4c74-8031-07fae87c5b00
Not convinced? Interact with a Visual Layer Cloud public dataset, with no sign-up required.
fastdup is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License.
For more information or inquiries regarding the license, please contact us at info@visual-layer.com or see the LICENSE file.