huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0
129.62k stars 25.74k forks

dinov2 with REGISTERS #27379

Open betterze opened 8 months ago

betterze commented 8 months ago

Model description

Dear huggingface team,

The FAIR team published an improved version of DINOv2 in the paper "Vision Transformers Need Registers". The models and checkpoints are available on the DINOv2 website, but not on Hugging Face.

Could you add this new model? I really appreciate your work.

Best Wishes,

Zongze

Open source status

Provide useful links for the implementation

dinov2 reg checkpoint

mhdirnjbr commented 8 months ago

Hello! Could you kindly assign this task to me? I'm eager to take it on as my first contribution and greatly appreciate any guidance or considerations you can provide. Thank you in advance.

betterze commented 8 months ago

I am not a member of the Hugging Face team, so I cannot 'assign the task to you'. But I believe you are very welcome to work on it; a lot of people would benefit from your work. Thx

mhdirnjbr commented 8 months ago

Hello @amyeroberts @NielsRogge! Can I please know your opinion about this endeavor? Thank you in advance.

StarCycle commented 3 months ago

After reading the paper, I fully agree that DINOv2 with registers has better performance.

What's the current progress on this issue? @mhdirnjbr Did you submit a PR? I think timm has an implementation with registers.

amyeroberts commented 2 months ago

@StarCycle @mhdirnjbr If any of you would like to, please feel free to open a PR to add this to the library!

rvt123 commented 2 months ago

Is anyone working on this?

NielsRogge commented 2 months ago

No. It differs from DINOv2 by only 2 lines of code, so feel free to take it up.
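
For readers following along, here is a rough, framework-free sketch of what that small difference probably amounts to, based on the description in the registers paper: a few extra learnable tokens are inserted after the [CLS] token before the transformer blocks and discarded at the output. This is not the Transformers, timm, or DINOv2 code; plain Python lists stand in for tensors, and the names `insert_registers`, `drop_registers`, and `NUM_REGISTERS` (and its value) are made up for illustration.

```python
# Illustrative sketch only: plain Python lists stand in for tensors.
# All names (insert_registers, drop_registers, NUM_REGISTERS) are hypothetical.

NUM_REGISTERS = 4  # assumed value, chosen only for this example


def insert_registers(tokens, registers):
    """Place the register tokens right after the [CLS] token.

    tokens:    [cls, patch_1, ..., patch_n], position embeddings already added
    registers: extra learnable tokens (assumed to carry no position embedding)
    """
    return tokens[:1] + registers + tokens[1:]


def drop_registers(tokens, num_registers):
    """Discard the register tokens at the output, keeping [CLS] and patches."""
    return tokens[:1] + tokens[1 + num_registers:]


# Toy sequence: one [CLS] token and three patch tokens, each a 2-dim vector.
cls_and_patches = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
registers = [[9.0, 9.0] for _ in range(NUM_REGISTERS)]

with_registers = insert_registers(cls_and_patches, registers)
# ... the transformer blocks would run on `with_registers` here ...
output = drop_registers(with_registers, NUM_REGISTERS)
```

In a real model the two helpers would likely be a tensor concatenation in the embedding step and a slice on the final hidden states, which is roughly where a "2 lines" difference would come from.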

rvt123 commented 2 months ago

Thanks for replying @NielsRogge. I would take it up if I knew what I was doing. Maybe if you could point me in the right direction, I could give it a go.

Edit: I tried looking at both the timm and Hugging Face implementations, but tbh I don't have a clue as to what's happening. If you do manage to take it up, I would be grateful for your help.