lucidrains / vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
MIT License

[Feature Request] ViTDet #252

Open austinmw opened 1 year ago

austinmw commented 1 year ago

Any chance you'd want to add ViTDet?

lucidrains commented 1 year ago

oh, do you have a link to the paper?

lucidrains commented 1 year ago

@austinmw i see it, sure!

austinmw commented 1 year ago

Thanks! here's also a second related paper: https://arxiv.org/pdf/2111.11429.pdf

lucidrains commented 1 year ago

nice! yea, big fan of Kaiming's works, will put those in the queue

austinmw commented 1 year ago

I'm not personally aware of anyone using it directly for projects, but I've heard it referenced as a good learning resource for vits.

I was looking to use MMDetection's ViTDet PR for something, but thought a clean self-contained implementation would be nicer to read.

On Sat, Feb 11, 2023 at 11:55 AM Phil Wang wrote:

@austinmw is AWS ML lab using this repository? or is this for your own personal research?


lucidrains commented 1 year ago

@austinmw ohh got it, so this is only for educational purposes then

lucidrains commented 1 year ago

also, just coming to the realization that even deleting a comment right after it gets created is too late; it still goes to the subscriber's email haha