pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

[RFC] New Ops in TorchVision #5414

Open datumbox opened 2 years ago

datumbox commented 2 years ago

🚀 The feature

Consider adding the following operators in TorchVision:

Layers

There is a separate ticket for tracking common layers: #4333

Operators

Losses

There is a separate ticket tracking Losses proposals: #2980

Schedulers & Optimizers (Core upstreaming)

xiaohu2015 commented 2 years ago

Maybe I can implement some of these features, e.g. the DropBlock layer.

datumbox commented 2 years ago

@xiaohu2015 wow, I literally just slacked you to see if you are interested 😄

Great! If you want to send a PR for DropBlock it would be awesome. Let me know if you want me to create an issue for it so that other contributors know you are working on it already (else you can create one yourself or just rely on the PR; up to you!).

lezwon commented 2 years ago

Hey @datumbox, can I take up the SoftNMS implementation?

datumbox commented 2 years ago

@lezwon Thanks for offering help!

SoftNMS would have to be implemented in C++ and CUDA, because that is where we implement the standard NMS. Some additional discussion would be required to decide exactly how it should be implemented and what its API would look like. As you can imagine, this is quite a lot of work, and there is no guarantee that the feature will be merged. If you are up for it, we can discuss more. I just wanted to give you a heads-up that this is a riskier feature to work on.

If the above doesn't sound too appealing, there are features listed at #5410 you might find fun to work on. Have a look and let me know if anything interests you. :)
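For readers unfamiliar with the SoftNMS idea mentioned above, here is a minimal pure-Python sketch of the Gaussian variant: instead of discarding boxes that overlap the current best box, their scores are decayed by `exp(-IoU^2 / sigma)`. This is only an illustration of the algorithm, not a TorchVision API; the function names `box_iou` and `soft_nms` and the defaults `sigma=0.5` and `score_threshold=0.001` are assumptions made for this example, and a real implementation would live in C++/CUDA as noted above.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Hypothetical helper: Gaussian Soft-NMS over plain Python lists.

    Returns the kept box indices (in selection order) and their final scores.
    """
    scores = list(scores)  # copy so the caller's list is not mutated
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        # Pick the remaining box with the highest (possibly decayed) score.
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_threshold:
            break  # everything left has been decayed below the threshold
        keep.append(best)
        remaining.remove(best)
        # Decay, rather than remove, boxes that overlap the selected one.
        for i in remaining:
            iou = box_iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(iou * iou) / sigma)
    return keep, [scores[i] for i in keep]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
keep, new_scores = soft_nms(boxes, [0.9, 0.8, 0.7])
```

In this example the second box overlaps the first heavily, so its score is reduced and it is selected last rather than being dropped outright, which is the key difference from standard NMS.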

lezwon commented 2 years ago

@datumbox Sure thing :) I'll pick up something from #5410

oke-aditya commented 2 years ago

I want to try the DropConnect layer. Any other info or reference implementation I could look at would be great 😃

datumbox commented 2 years ago

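@oke-aditya There are a few reasons we haven't added DropConnect. According to the paper, Dropout is r = m * a(W v), whereas DropConnect is r = a((M * W) v), where * is the element-wise product and:

a: activation function
v: input
m: Bernoulli mask applied to the activations
M: Bernoulli mask applied to the weights
W: weights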

As you can see, in the DropConnect case the mask M is applied to the weights W rather than to the activations, which makes for an awkward layer design. I believe you would have to implement different versions of it for Linear and Conv layers. Another issue is that it's quite old and not often used in SOTA research. These are some of the reasons we decided not to add it, at least in phases 1 and 2 of Batteries Included.
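To make the design concern concrete, here is a rough sketch of what a Linear-only DropConnect layer could look like in PyTorch. The class name `DropConnectLinear` and its `p` argument are hypothetical and not part of TorchVision. The sketch also simplifies the paper: it samples one weight mask per forward pass (not per example), uses the inverted-dropout rescaling of `F.dropout`, and simply uses the unmasked weights at evaluation time. The activation is left to the caller.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DropConnectLinear(nn.Linear):
    """Hypothetical layer: an nn.Linear whose weights are randomly masked in training."""

    def __init__(self, in_features, out_features, p=0.5, bias=True):
        super().__init__(in_features, out_features, bias=bias)
        if not 0.0 <= p < 1.0:
            raise ValueError("p must be in [0, 1)")
        self.p = p  # probability of dropping each individual weight

    def forward(self, x):
        if not self.training or self.p == 0.0:
            # Evaluation (or p == 0): behave exactly like a plain Linear layer.
            return F.linear(x, self.weight, self.bias)
        # F.dropout draws a Bernoulli mask over the weight matrix and rescales
        # the surviving weights by 1 / (1 - p).
        masked_weight = F.dropout(self.weight, p=self.p, training=True)
        return F.linear(x, masked_weight, self.bias)


torch.manual_seed(0)
layer = DropConnectLinear(4, 3, p=0.5)
x = torch.randn(2, 4)
train_out = layer(x)       # masked weights
layer.eval()
eval_out = layer(x)        # unmasked weights
```

Because the mask is applied to the weights inside the layer, the same trick cannot be added as a standalone module placed after an existing layer the way `nn.Dropout` is; a convolutional version would need its own subclass that masks the kernel before calling `F.conv2d`.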