johko / computer-vision-course

This repo is the homebase of a community-driven course on Computer Vision with Neural Networks. Feel free to join us on the Hugging Face Discord: hf.co/join/discord
MIT License

Unit 3 - Vision Transformers: Swin #116

Closed by klyap 6 months ago

klyap commented 6 months ago

Render preview: [screenshot, 2023-12-12]

klyap commented 6 months ago

> I think this is good, would you be down to add a simple PyTorch implementation like in other architecture chapters?

Ah like add a notebook file? Or code snippet within this page?

UPDATE: Oh nvm I saw another PR for this and it looks like it's a Python notebook file!

klyap commented 6 months ago

I've updated this PR with the changes as requested! I'll make a separate PR for the PyTorch implementation notebook.

klyap commented 6 months ago

> Thanks for the PR @klyap 🤗 The content looks good for a start and I like that you included SwinIR and Swin2SR.
>
> Just for my own clarity: Do you plan on elaborating on the different parts in further PRs? For example, going a bit more in depth on how Swin works, why it is a hierarchical model, and what is special about these kinds of models (e.g. compared to CNNs). Maybe also what they might be better at than purely patch-based approaches like ViT?

I wasn't planning on it, but I can in the next PR! Thanks for the suggestions.

klyap commented 6 months ago

> LGTM as well if you're planning to add a small model implementation script in this doc itself

Just to clarify, you'd like a small implementation script snippet in this .mdx doc as well as a separate notebook file? If so, I can also add that with the next PR along with @johko's suggestions!