Investigate ViT Approaches

wingman-jr-addon / wingman_jr

This is the official repository (https://github.com/wingman-jr-addon/wingman_jr) for the Wingman Jr. Firefox addon, which filters NSFW images in the browser fully client-side: https://addons.mozilla.org/en-US/firefox/addon/wingman-jr-filter/ Optional DNS-blocking using Cloudflare's 1.1.1.1 for families! Also, check out the blog!

https://wingman-jr.blogspot.com/


Investigate ViT Approaches #130

Closed wingman-jr-addon closed 1 year ago

wingman-jr-addon commented 3 years ago

Vision Transformers are gaining steam. However, even the smallest networks tend to be much larger than e.g. MobileNet V2, and the datasets needed to train them tend to be quite large as well. Still, this is an area to watch for advancements, in conjunction with unsupervised/semi-supervised approaches.
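
To put a rough number on the size gap, here is a back-of-the-envelope sketch that counts the parameters of a plain ViT classifier. The layer layout (linear patch embedding, class token, learned position embedding, pre-norm blocks with a 4x MLP, 1000-class head) and the function name `vit_param_count` are assumptions for illustration, not anything from this repository. The MobileNet V2 figure is the commonly quoted approximate count of 3.5M parameters at width 1.0.

```python
def vit_param_count(dim, depth, patch=16, image=224, channels=3,
                    classes=1000, mlp_ratio=4):
    """Estimate trainable parameters of a plain ViT classifier (assumed layout)."""
    tokens = (image // patch) ** 2                      # number of image patches
    patch_embed = patch * patch * channels * dim + dim  # linear projection + bias
    cls_and_pos = dim + (tokens + 1) * dim              # class token + position embedding
    attention = 4 * dim * dim + 4 * dim                 # q, k, v and output projections with biases
    hidden = mlp_ratio * dim
    mlp = dim * hidden + hidden + hidden * dim + dim    # two dense layers with biases
    norms = 2 * 2 * dim                                 # two LayerNorms per block (scale + shift)
    block = attention + mlp + norms
    final_norm = 2 * dim
    head = dim * classes + classes
    return patch_embed + cls_and_pos + depth * block + final_norm + head


# Approximate, commonly quoted figure for MobileNet V2 at width 1.0 (assumption).
MOBILENET_V2_PARAMS = 3_500_000

vit_tiny = vit_param_count(dim=192, depth=12)   # a "Tiny"-style configuration
vit_base = vit_param_count(dim=768, depth=12)   # a "Base"-style configuration

print(f"ViT tiny: {vit_tiny:,} params ({vit_tiny / MOBILENET_V2_PARAMS:.1f}x MobileNet V2)")
print(f"ViT base: {vit_base:,} params ({vit_base / MOBILENET_V2_PARAMS:.1f}x MobileNet V2)")
```

Under these assumptions, even the tiny configuration comes out around 5.7M parameters, and the base configuration around 86.6M. Parameter count is only a proxy; download size and in-browser inference time would still need to be measured directly.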

wingman-jr-addon commented 3 years ago

Compact Convolutional Transformers look like an intriguing hybrid approach. Keras even has a nice bite-sized example for it: https://keras.io/examples/vision/cct/

wingman-jr-addon commented 3 years ago

I'd still like to improve the model performance quite a bit, and this seems like a concrete way to start easing into Vision Transformers, so I'm going to label it high priority and work down the path of evaluating this some more.

wingman-jr-addon commented 1 year ago

My initial research here leads me to believe that I'd need considerably more data to make this approach work well, and the model sizes haven't been optimized down to the same degree as they have for convolutional networks, so there is no clear path here.