soniajoseph / ViT-Prisma

ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).

Put patch-level labels in a Dataloader, ensure it loads in notebook #84

Open soniajoseph opened 9 months ago

soniajoseph commented 9 months ago

ImageNet labels are way too coarse-grained, since each image gets a single class label. @themachinefan put ImageNet through a SAM (Segment Anything) pipeline to get a label for each patch.

The results are here: https://huggingface.co/datasets/Prisma-Multimodal/segmented-imagenet1k-subset
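One plausible way to go from per-pixel segmentation output to per-patch labels is a majority vote over each patch's pixels. The sketch below assumes the pipeline yields an integer label mask the same size as the image. That mask format, the hypothetical function name `patch_labels_from_mask`, and the tie-breaking rule are all assumptions for illustration. They do not describe how @themachinefan's pipeline or the Hugging Face dataset actually works.

```python
from collections import Counter
from typing import List


def patch_labels_from_mask(mask: List[List[int]], patch_size: int) -> List[List[int]]:
    """Hypothetical helper: give each non-overlapping patch the most frequent
    pixel label inside it (assumes `mask` is a per-pixel integer label map)."""
    height, width = len(mask), len(mask[0])
    if height % patch_size or width % patch_size:
        raise ValueError("mask dimensions must be divisible by patch_size")

    grid = []
    for top in range(0, height, patch_size):
        row_labels = []
        for left in range(0, width, patch_size):
            # Count every pixel label inside this patch.
            counts = Counter(
                mask[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            )
            # most_common() orders equal counts by first occurrence, so ties go
            # to the label seen first in row-major scan order.
            row_labels.append(counts.most_common(1)[0][0])
        grid.append(row_labels)
    return grid


# Usage: a 4x4 mask split into 2x2 patches.
mask = [
    [0, 0, 1, 1],
    [0, 5, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 0],
]
print(patch_labels_from_mask(mask, 2))  # [[0, 1], [2, 3]]
```

With a 224x224 image and 16-pixel patches (as in ViT-B/16) this produces a 14x14 grid, one label per patch token.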

We need to make sure these results load cleanly into a Dataloader that we can query to get the label for each patch.
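A minimal sketch of the queryable Dataloader side. It assumes each record already stores a row-major grid of patch labels under a hypothetical `patch_labels` key, alongside an `image` key. The real column names in the Hugging Face dataset may differ, and `PatchLabelDataset` and its methods are hypothetical names. Because the class implements `__len__` and `__getitem__`, `torch.utils.data.DataLoader` can wrap it as a map-style dataset.

```python
from typing import Any, Dict, List, Sequence


class PatchLabelDataset:
    """Hypothetical map-style dataset over records holding a patch-label grid."""

    def __init__(self, records: Sequence[Dict[str, Any]], label_key: str = "patch_labels"):
        self.records = records
        self.label_key = label_key  # assumed column name, not confirmed by the dataset

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        record = self.records[idx]
        grid: List[List[Any]] = record[self.label_key]
        # Flatten row-major so position i lines up with the i-th patch token.
        flat = [label for row in grid for label in row]
        return {"image": record["image"], "patch_labels": flat}

    def label_at(self, idx: int, row: int, col: int) -> Any:
        """Label of the patch at (row, col) in image `idx`."""
        return self.records[idx][self.label_key][row][col]

    def label_for_token(self, idx: int, token_idx: int, has_cls: bool = True) -> Any:
        """Map a transformer token position back to its patch label.

        Assumes a single CLS token at position 0 when `has_cls` is True.
        """
        patch_idx = token_idx - 1 if has_cls else token_idx
        if patch_idx < 0:
            raise IndexError("token index does not correspond to a patch (CLS or negative)")
        grid = self.records[idx][self.label_key]
        width = len(grid[0])
        return grid[patch_idx // width][patch_idx % width]


# Usage with an in-memory record standing in for a dataset row.
ds = PatchLabelDataset([{"image": "img0", "patch_labels": [[0, 1], [2, 3]]}])
print(ds[0]["patch_labels"])      # [0, 1, 2, 3]
print(ds.label_for_token(0, 3))   # 2 (token 3 -> patch 2 -> row 1, col 0)
```

PyTorch's default collation turns a list of ints into a list of per-position tensors. To get a single `(batch, num_patches)` tensor, `__getitem__` would need to return `torch.tensor(flat)` instead.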

Some things to keep in mind are:

suchir-madap commented 4 months ago

Picking up this issue!