TL;DR: An ImageNet replacement dataset for self-supervised pretraining without humans
PASS is a large-scale image dataset that does not include any humans, human parts, or other personally identifiable information. It can be used for high-quality pretraining while significantly reducing privacy concerns.
The quickest way:
```sh
git clone https://github.com/yukimasano/PASS
cd PASS
source download.sh  # optionally change the download directory inside the script first
```
Generally: all information is on our webpage.
To download the dataset, please visit our dataset on Zenodo. There you can download it as tar files and find the metadata.
You can also download the images from their AWS URLs, from here.
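As a minimal sketch of downloading from a URL list (the metadata filename and the `url` column name below are assumptions, not the actual Zenodo schema -- check the metadata files for the real layout):

```python
import csv
import pathlib
import urllib.request

META_FILE = "pass_metadata.csv"          # hypothetical metadata file with one image URL per row
OUT_DIR = pathlib.Path("PASS_images")
OUT_DIR.mkdir(exist_ok=True)

with open(META_FILE, newline="") as f:
    for row in csv.DictReader(f):
        url = row["url"]                 # assumed column name
        target = OUT_DIR / url.split("/")[-1]
        if not target.exists():          # skip files that were already downloaded
            urllib.request.urlretrieve(url, str(target))
```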
| Pretraining | Method | Epochs | IN-1k Acc. (%) | Places205 Acc. (%) | Weights |
|---|---|---|---|---|---|
| (IN-1k) | MoCo-v2 | 200 | 60.6 | 50.1 | visit MoCo-v2 repo |
| PASS | MoCo-v2 | 180 | 59.1 | 52.8 | R50 weights |
| PASS | MoCo-v2 | 200 | 59.5 | 52.8 | R50 weights |
| PASS | MoCo-v2 | 800 | 61.2 | 54.0 | R50 weights |
| PASS | MoCo-v2 (R18) | 800 | 45.3 | 44.4 | R18 weights |
| PASS | MoCo-v2-CLD | 200 | 60.2 | 53.1 | R50 weights |
| PASS | SwAV | 200 | 60.8 | 55.5 | R50 weights |
| PASS | DINO | 100 | 61.3 | 54.6 | ViT S16 weights |
| PASS | DINO | 300 | 65.0 | 55.7 | ViT S16 weights |
In the table above we give the download links to the full checkpoints (including momentum encoder etc.) for the models we have trained. For comparison, we include MoCo-v2 trained on ILSVRC-12 ("IN-1k") and report linear-probing accuracy on IN-1k and Places205.
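These full checkpoints contain more than the backbone (momentum encoder, projection head, etc.). If you only need the backbone weights, e.g. for your own linear probing, a minimal sketch is shown below; it assumes the MoCo-v2 checkpoints follow the original MoCo repo's layout (a `state_dict` entry with `module.encoder_q.*` keys), and the filename is hypothetical:

```python
import torch
import torchvision

# Assumed MoCo-v2 checkpoint convention: ckpt['state_dict'] holds keys such as
# 'module.encoder_q.conv1.weight'.
ckpt = torch.load("pass_mocov2_r50.pth", map_location="cpu")  # hypothetical filename
state_dict = ckpt["state_dict"]

# Keep only the query-encoder backbone, dropping the projection head (fc.*).
backbone_sd = {}
for k, v in state_dict.items():
    if k.startswith("module.encoder_q.") and not k.startswith("module.encoder_q.fc"):
        backbone_sd[k.replace("module.encoder_q.", "")] = v

model = torchvision.models.resnet50()
msg = model.load_state_dict(backbone_sd, strict=False)  # fc.* stays randomly initialised
print(msg.missing_keys)  # expected: ['fc.weight', 'fc.bias']
```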
The models can also be loaded directly via torch.hub:

```python
import torch

# DINO ViT-S/16 models pretrained on PASS
vits16_100ep = torch.hub.load('yukimasano/PASS:main', 'dino_100ep_vits16')
vits16 = torch.hub.load('yukimasano/PASS:main', 'dino_vits16')

# ResNet-50 models pretrained on PASS with SwAV, MoCo-v2 (800 ep.) and MoCo-v2-CLD (200 ep.)
r50_swav_200ep = torch.hub.load('yukimasano/PASS:main', 'swav_resnet50')
r50_moco_800ep = torch.hub.load('yukimasano/PASS:main', 'moco_resnet50')
r50_moco_cld_200ep = torch.hub.load('yukimasano/PASS:main', 'moco_cld_resnet50')
```
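Once loaded, the models can be used as ordinary feature extractors. A minimal sketch follows; the ImageNet-style preprocessing and the image filename are assumptions, so check the repo's evaluation code for the exact transforms used per model:

```python
from PIL import Image
import torch
from torchvision import transforms

# Standard ImageNet-style preprocessing (an assumption, not necessarily what each model was evaluated with).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

model = torch.hub.load('yukimasano/PASS:main', 'dino_vits16')
model.eval()

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)  # any local image
with torch.no_grad():
    feats = model(img)  # embedding vector, e.g. the ViT's [CLS] token output
print(feats.shape)
```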
In the folder PASSify of this repo, you can find automated scripts that try to remove humans from image datasets.
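The scripts in that folder define the actual pipeline; purely as an illustrative sketch of the general idea (not the repo's exact method), images containing people can be flagged with an off-the-shelf detector such as torchvision's COCO-trained Faster R-CNN, where label 1 corresponds to "person":

```python
import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor

# Off-the-shelf COCO-trained detector; class label 1 is 'person'.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
detector.eval()

def contains_person(path, score_threshold=0.5):
    """Return True if a person is detected in the image at `path`."""
    img = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        out = detector([img])[0]
    keep = out["scores"] >= score_threshold
    return bool((out["labels"][keep] == 1).any())

# Example: keep only human-free images from a list of file paths.
# clean = [p for p in image_paths if not contains_person(p)]
```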
Please let us know if you have a model pretrained on this dataset and we will add it to the list above.
```bibtex
@Article{asano21pass,
  author  = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
  title   = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
  journal = "NeurIPS Track on Datasets and Benchmarks",
  year    = "2021"
}
```