yukimasano / PASS

The PASS dataset: pretrained models and how to get the data
https://www.robots.ox.ac.uk/~vgg/research/pass/
MIT License

PASS: Pictures without humAns for Self-Supervised Pretraining

TL;DR: An ImageNet replacement dataset for self-supervised pretraining without humans

Content

PASS is a large-scale image dataset that contains no humans, human parts, or other personally identifiable information. It can be used for high-quality pretraining while significantly reducing privacy concerns.

Download the dataset

The quickest way:

git clone https://github.com/yukimasano/PASS
cd PASS
source download.sh # optionally edit the script first to change the download directory

More generally, all information is available on our webpage: https://www.robots.ox.ac.uk/~vgg/research/pass/

To download the dataset, please visit our dataset page on Zenodo. There you can download the images as tar files and find the metadata.
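
Once the tar files are on disk, they can be unpacked in one pass. The following is a minimal sketch using only the Python standard library; the function name `extract_archives` and the directory layout are assumptions for illustration, not part of this repo. It only assumes that the downloaded archives end in `.tar` and sit in one directory.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, out_dir):
    """Extract every .tar file found in archive_dir into out_dir.

    Returns the number of regular files extracted.
    (Hypothetical helper; the archive names are whatever was downloaded.)
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    root = out_dir.resolve()
    n_files = 0
    for archive in sorted(Path(archive_dir).glob("*.tar")):
        with tarfile.open(archive) as tar:
            # Only regular files are extracted; parent folders are created as needed.
            members = [m for m in tar.getmembers() if m.isfile()]
            # Refuse entries that would land outside out_dir (absolute paths or "..").
            for member in members:
                target = (out_dir / member.name).resolve()
                if root not in target.parents:
                    raise ValueError(f"unsafe path in {archive.name}: {member.name}")
            tar.extractall(out_dir, members=members)
            n_files += len(members)
    return n_files
```

A call such as `extract_archives("downloads", "PASS_dataset")` (both directory names are placeholders) would unpack all archives and return the file count, which can then be compared against the metadata.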

You can also download the images directly from their AWS URLs, listed here.
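
If you go the per-URL route, a small resumable downloader may be useful. The sketch below is an assumption-laden illustration, not the repo's `download.sh`: it assumes you already have the URLs as a Python list of strings, names each file after the last path component of its URL, and uses hypothetical helper names (`fetch_bytes`, `download_images`).

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_bytes(url, timeout=30):
    """Default fetcher: return the body of an HTTP(S) GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def download_images(urls, out_dir, fetch=fetch_bytes):
    """Save each URL to out_dir under its basename, skipping existing files.

    Returns the list of URLs that failed, so they can be retried later.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for url in urls:
        name = Path(urlparse(url).path).name
        if not name:
            failed.append(url)  # no usable file name in this URL
            continue
        target = out_dir / name
        if target.exists():
            continue  # already downloaded on a previous run
        try:
            target.write_bytes(fetch(url))
        except OSError:
            # urllib's URLError and socket timeouts are both OSError subclasses.
            failed.append(url)
    return failed
```

Because existing files are skipped, the function can simply be called again with the returned list of failed URLs. The `fetch` parameter exists only so the network call can be swapped out, for example in tests.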

Pretrained models

| Pretraining | Method | Epochs | IN-1k Acc. | Places205 Acc. | Link |
|---|---|---|---|---|---|
| (IN-1k) | MoCo-v2 | 200 | 60.6 | 50.1 | visit MoCo-v2 repo |
| PASS | MoCo-v2 | 180 | 59.1 | 52.8 | R50 weights |
| PASS | MoCo-v2 | 200 | 59.5 | 52.8 | R50 weights |
| PASS | MoCo-v2 | 800 | 61.2 | 54.0 | R50 weights |
| PASS | MoCo-v2 (R18) | 800 | 45.3 | 44.4 | R18 weights |
| PASS | MoCo-v2-CLD | 200 | 60.2 | 53.1 | R50 weights |
| PASS | SwAV | 200 | 60.8 | 55.5 | R50 weights |
| PASS | DINO | 100 | 61.3 | 54.6 | ViT S16 weights |
| PASS | DINO | 300 | 65.0 | 55.7 | ViT S16 weights |

The table above links to the full checkpoints (including the momentum encoder, etc.) of the models we have trained. For comparison, we include MoCo-v2 trained on ILSVRC-12 ("IN-1k"). All accuracies are linear-probing results on IN-1k and Places205.

Pretrained models from PyTorch Hub

import torch
vits16_100ep = torch.hub.load('yukimasano/PASS:main', 'dino_100ep_vits16')
vits16 = torch.hub.load('yukimasano/PASS:main', 'dino_vits16')
r50_swav_200ep = torch.hub.load('yukimasano/PASS:main', 'swav_resnet50')
r50_moco_800ep = torch.hub.load('yukimasano/PASS:main', 'moco_resnet50')
r50_moco_cld_200ep = torch.hub.load('yukimasano/PASS:main', 'moco_cld_resnet50')
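
The accuracies in the table are linear-probing results, i.e. a single linear classifier trained on top of a frozen backbone. The block below is a generic sketch of that setup, not the exact evaluation protocol behind the reported numbers. To stay runnable offline it uses a tiny stand-in backbone; swapping in one of the hub models assumes that the model returns one feature vector per image and that you pass the matching `feat_dim`, neither of which is confirmed here. `LinearProbe` is a hypothetical name.

```python
import torch
from torch import nn


class LinearProbe(nn.Module):
    """Frozen backbone followed by a single trainable linear layer."""

    def __init__(self, backbone, feat_dim, num_classes):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # the backbone is never updated
        self.head = nn.Linear(feat_dim, num_classes)

    def train(self, mode=True):
        super().train(mode)
        self.backbone.eval()  # keep BatchNorm/Dropout in inference mode
        return self

    def forward(self, x):
        with torch.no_grad():
            feats = self.backbone(x)
        return self.head(feats.flatten(1))


# Stand-in backbone (assumption): replace with a pretrained model in practice.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten())
probe = LinearProbe(backbone, feat_dim=8, num_classes=10)

# Only the linear head is handed to the optimizer.
optimizer = torch.optim.SGD(probe.head.parameters(), lr=0.1)

# One training step on random data, just to show the mechanics.
images = torch.randn(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(probe(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Overriding `train()` keeps the backbone in evaluation mode even when the probe itself is switched to training mode, so normalization statistics of the pretrained network stay fixed.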

PASSify your dataset

In the folder PASSify of this repo, you can find automated scripts that try to remove humans from image datasets.

Contribute your models

Please let us know if you have a model pretrained on this dataset, and we will add it to the list above.

Citation

@Article{asano21pass,
author = "Yuki M. Asano and Christian Rupprecht and Andrew Zisserman and Andrea Vedaldi",
title = "PASS: An ImageNet replacement for self-supervised pretraining without humans",
journal = "NeurIPS Track on Datasets and Benchmarks",
year = "2021"
}