woctezuma / steam-BEiT

Retrieve Steam games with similar store banners, with Microsoft's Bidirectional Encoder representation from Image Transformers (BEiT).
https://huggingface.co/transformers/master/model_doc/beit.html
MIT License

Steam BEiT: match Steam Banners with Microsoft's BEiT

This repository contains Python code to retrieve Steam games with similar store banners, using Microsoft's BEiT.

Image similarity is assessed by the cosine similarity between image features encoded by BEiT.
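As a minimal sketch of this ranking step, the snippet below computes cosine similarity between feature vectors and returns the closest matches for a query. The function names, the toy 3-dimensional vectors and the `game_*` identifiers are hypothetical and only serve the illustration; real BEiT features have far more dimensions, and in practice the computation would likely be vectorized.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, features, top_k=3):
    """Return the top_k keys of `features` ranked by similarity to `query`.

    `features` maps an identifier (hypothetical here) to its feature vector.
    """
    scores = {key: cosine_similarity(query, vec) for key, vec in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy features standing in for BEiT encodings (made-up values).
features = {
    "game_a": [1.0, 0.0, 0.0],
    "game_b": [0.9, 0.1, 0.0],
    "game_c": [0.0, 1.0, 0.0],
}

# The query itself ranks first, followed by its nearest neighbour.
ranking = most_similar(features["game_a"], features, top_k=2)
print(ranking)
```

Because cosine similarity ignores vector magnitude, only the direction of the features matters for the ranking.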

Similar vertical banners

Model

BEiT is a Vision Transformer (ViT):

  1. pre-trained with self-supervision (using patches, and "visual tokens" from OpenAI's DALL-E) on ImageNet-21k,
  2. then fine-tuned for classification on ImageNet-21k (14M images with ~21k classes),
  3. finally fine-tuned for classification on ImageNet-1k (1.28M images with 1000 classes).

Pre-trained models are available at HuggingFace, respectively as:

  1. microsoft/beit-base-patch16-224-pt22k
  2. microsoft/beit-base-patch16-224-pt22k-ft22k
  3. microsoft/beit-base-patch16-224

Larger models are available by replacing base (400 MB) with large (1.2 GB) in the model name.
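The naming scheme above can be sketched as a small helper, followed by one possible way to encode an image with the `transformers` library. The helper name, the stage labels (`pt`, `ft22k`, `ft1k`) and the choice of the pooled output as the image feature are assumptions made for illustration; the repository may extract features differently.

```python
def beit_checkpoint(size="base", stage="ft1k"):
    """Build a HuggingFace checkpoint name from a model size and a training stage.

    The stage labels are hypothetical shorthands for the three steps listed above.
    """
    if size not in ("base", "large"):
        raise ValueError(f"unknown size: {size}")
    suffix = {"pt": "-pt22k", "ft22k": "-pt22k-ft22k", "ft1k": ""}[stage]
    return f"microsoft/beit-{size}-patch16-224{suffix}"


def encode_image(image_path, checkpoint):
    """Encode one image into a feature vector (assumption: pooled output)."""
    # Heavy dependencies are imported lazily, so that the name helper above
    # can be used without torch, Pillow or transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModel

    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # pooler_output has shape (batch_size, hidden_size); keep the single image.
    return outputs.pooler_output[0]


print(beit_checkpoint("base", "pt"))
print(beit_checkpoint("large", "ft22k"))
```

Calling `encode_image` downloads the checkpoint on first use, so the large variant requires noticeably more disk space and memory.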

NB: Table 9 shows that BEiT performs worse than DINO under linear probing on ImageNet-1k. Keep in mind, however, that DINO concatenates features from intermediate layers!

Data

Data consists of vertical Steam banners (300x450 resolution), resized to 256x384 resolution.

The resizing is performed with rom1504/img2dataset.
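For illustration, the resize can be sketched as follows. The target 256x384 keeps the 2:3 aspect ratio of the 300x450 banners, which matches a resize where the smaller side is set to 256. Whether the repository uses the `keep_ratio` mode of img2dataset is an assumption, and `url_list_path` stands for a hypothetical text file with one banner URL per line.

```python
def keep_ratio_size(width, height, smallest_side):
    """Size obtained by scaling an image so that its smaller side equals smallest_side."""
    scale = smallest_side / min(width, height)
    return round(width * scale), round(height * scale)


def download_banners(url_list_path, output_folder):
    """Download and resize banners with img2dataset (settings are assumptions)."""
    # Imported lazily, so that the size helper works without img2dataset installed.
    from img2dataset import download

    download(
        url_list=url_list_path,    # hypothetical text file, one URL per line
        input_format="txt",
        output_folder=output_folder,
        output_format="files",
        image_size=256,            # length of the smaller side after resizing
        resize_mode="keep_ratio",  # assumption: preserve the 2:3 aspect ratio
    )


# Sanity check of the resolutions quoted above.
target_size = keep_ratio_size(300, 450, smallest_side=256)
print(target_size)
```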

Usage

The data is also available as v0.1 in the "Releases" section of this repository.

References