Closed zankner closed 3 years ago
@zankner Hi Zach! It's a different mask, for masking out attention to specific patches. It wouldn't matter at all if you always use same-sized images, but if you somehow have different-sized images padded out to a full square, you can selectively mask out the padding patches
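For example, roughly like this (the helper and its names are just illustrative, not part of this repo):

```python
import torch

def padding_patch_mask(valid_h, valid_w, img_size, patch_size):
    # Build a per-patch boolean mask for an image padded to a square:
    # True for patches containing real content, False for pure padding.
    n = img_size // patch_size
    rows = torch.arange(n) * patch_size < valid_h  # patch rows with content
    cols = torch.arange(n) * patch_size < valid_w  # patch cols with content
    return (rows[:, None] & cols[None, :]).flatten()  # (n * n,)
```

The resulting flat boolean vector lines up with the patch sequence, so it can be passed as the attention mask.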
@zankner I feel like I shouldn't have agreed to build it (some user requested it when the repo was still young). It's really not needed for the majority of use-cases (same-sized images), and it just makes the repo more complicated than it needs to be
@zankner the masked training you are thinking of won't work with ViT anyway
I might be wrong, but in the original paper didn't they perform masked token prediction for self-supervision?
"We employ the masked patch prediction objective for preliminary self-supervision experiments. To do so we corrupt 50% of patch embeddings by either replacing their embeddings with a learnable [mask] embedding (80%), a random other patch embedding (10%) or just keeping them as is (10%)."
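For reference, that corruption scheme could be sketched roughly like this in plain PyTorch (names are mine, not from the paper or this repo):

```python
import torch

def corrupt_patches(patches, mask_token, corrupt_prob=0.5):
    # patches: (batch, num_patches, dim); mask_token: (dim,) learnable embedding
    b, n, d = patches.shape
    corrupted = patches.clone()
    # select 50% of patch embeddings for corruption
    corrupt_mask = torch.rand(b, n) < corrupt_prob
    # decide, per selected patch, how to corrupt it
    r = torch.rand(b, n)
    replace_with_mask = corrupt_mask & (r < 0.8)
    replace_with_rand = corrupt_mask & (r >= 0.8) & (r < 0.9)
    # 80%: replace with the learnable [mask] embedding
    corrupted[replace_with_mask] = mask_token
    # 10%: replace with a random other patch embedding from the same image
    rand_idx = torch.randint(0, n, (b, n))
    rand_patches = torch.gather(patches, 1, rand_idx.unsqueeze(-1).expand(-1, -1, d))
    corrupted[replace_with_rand] = rand_patches[replace_with_rand]
    # remaining 10%: left as is
    return corrupted, corrupt_mask
```

The returned boolean mask records which positions were selected, so the prediction loss can be computed only on those.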
@zankner I missed that section! Wow, so it can work, with predicting the 3-bit mean color of each patch being enough
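A rough sketch of that target (my own reading of the paper, assuming pixel values in [0, 1]): the patch's mean color quantized to 3 bits per channel, giving one of 512 classes.

```python
import torch

def mean_color_target(patches):
    # patches: (batch, num_patches, 3, patch_h, patch_w), pixels in [0, 1]
    mean_rgb = patches.mean(dim=(-1, -2))        # (b, n, 3) mean color per patch
    bins = (mean_rgb * 8).clamp(0, 7).long()     # quantize to 3 bits per channel
    # pack the three 3-bit channels into a single class index in [0, 512)
    return bins[..., 0] * 64 + bins[..., 1] * 8 + bins[..., 2]
```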
Yeah I think so. I made an implementation of it already on my own mock of the vision transformer. Would there be any interest in a PR for that?
@zankner I would gratefully accept! 💯
I think the latest self supervised learning techniques will probably work better https://github.com/lucidrains/vit-pytorch#self-supervised-training
That's probably true, but I think at least having the ability to do masked patch prediction lets people explore different research directions or experiment with new ideas.
@zankner I am also interested in the ability to do masked patch prediction with ViT. I run lots of experiments on ViT using BYOL for good transfer performance, but I think ViT can also benefit from the tricks used in BERT and GPT. So what is your plan for the feature?
@guanfuchen If people want it, I can start working on a PR for it. I don't have much free time, so would you be able to help with the PR at all? I have it all set up in my own implementation of ViT, so the work would mostly be integrating it into this repo.
@zankner Yes, you can submit the implementation, and I will test and merge it.
@lucidrains @guanfuchen - Started a PR for masked patch prediction. It's currently a draft. I still have some work to do, but I wanted to post it in case anyone has suggestions or optimizations.
@guanfuchen @zankner merged here https://github.com/lucidrains/vit-pytorch#masked-patch-prediction
Just a quick question. I was wondering whether the masks that get passed to the model at inference are for masking tokens for self-supervision, or whether they are a different mask? Thanks