[Open] oke-aditya opened this issue 2 years ago
@oke-aditya Thanks for the proposal. Here are some thoughts:
Concerning Swin Large, we would have to reproduce the training of the model. We didn't do it because it would take time and resources, but we can definitely do it in the future when we have a bit more bandwidth.
Concerning SwinMLP, I'm a bit unsure how popular this variant is. @YosuaMichael @jdsgomes I was hoping to get your input on whether any internal production or external research teams have requested this specific variant.
Finally, concerning SwinMoE, the paper is quite new and has only 2 citations at the time of writing. We should definitely keep an eye on it in case it picks up steam.
Can we add the `swin_l` model configs and model without the weights? Or is it now a convention to first fully reproduce the model and then add it?
For a very long time we allowed models without weights; in the last release @YosuaMichael trained the last remaining variants. Models with no weights used to cause issues for various CI jobs and for users who tried to initialize them by name, so we would often find snippets of code where users were trying to exclude them from the lists. That was more true back when the `pretrained=True` idiom was used prominently. To cut a long story short, nothing forbids having a model with no weights, but it's not a great user experience. It also adds load on our CI, because tests will automatically run on it, and since the model is so massive it will slow down the execution (or throw memory errors). This is why you see me being reluctant to add it if we don't offer weights. For those users who want it, it's easy to construct using the `SwinTransformer` model class.
Perhaps a middle ground is the following. Given we fully reproduced the accuracy on the other variants, if there is demand from the community, we could add the model with ported weights from the research repo. If we then verify we get the same accuracy, we should be good to go. WDYT?
So I gave a little thought to how we should handle the cases where we can't add pretrained weights. Below are a few points.
Well, if you see how far we have come from AlexNet in 10 years, models are probably going to grow into the GBs rather than MBs. So sooner or later we are going to hit the point where we actually need to add big models, either with weights or without.
Agreed. Of course.
Well, but the `pretrained=True` default idiom is gone now, I guess? And of course the default for `pretrained` was `False`. But I still don't get what the exact issue is, and whether it is still valid in the context of our Multi-Weight API support.
Are our tests and CI different for models with weights and without weights? In talks with @YosuaMichael we did discuss whether we could run a separate CI job that only tests large models, or somehow split the model tests by marking them with `pytest.mark` and run medium-sized groups in aggregate.
E.g. on one CI machine we run tests for AlexNet and Swin Large, which creates moderate GPU usage overall, and on another machine we could have the ResNets (something like load balancing the models).
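The load-balancing idea above can be sketched as a greedy partition of models across CI machines by approximate cost. The model names and parameter counts below are illustrative, not a proposal for the actual test matrix:

```python
# Greedy load balancing: assign each model (heaviest first) to the CI
# machine with the smallest accumulated cost. Costs are parameter counts
# in millions and are only rough illustrative figures.
MODEL_COSTS = {
    "swin_l": 197, "swin_b": 88, "swin_t": 28,
    "resnet152": 60, "resnet50": 26, "alexnet": 61,
}

def balance(costs, n_machines):
    machines = [[] for _ in range(n_machines)]
    loads = [0] * n_machines
    for name, cost in sorted(costs.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))  # pick the least-loaded machine
        machines[i].append(name)
        loads[i] += cost
    return machines

print(balance(MODEL_COSTS, 2))
```

In practice this could drive which `pytest.mark` group each model lands in, so that no single CI worker gets all the giant models.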
I think sooner or later we will need a solution for this. Considering this is the first time we have faced it, it might well become frequent, e.g. with Swin3D and so on.
> SwinMLP

Sorry, I missed this earlier. I haven't heard of internal use cases for SwinMLP.
> Perhaps a middle ground is the following. Given we fully reproduced the accuracy on the other variants, if there is demand from the community, we could add the model with ported weights from the research repo. If we then verify we get the same accuracy, we should be good to go. WDYT?
Sounds great. How about verifying the L-size weights provided by the authors at microsoft/Swin-Transformer? Can we add this to the to-do list?
🚀 The feature
The original paper describes a few more configurations based on the Swin Transformer.
Motivation, pitch
I think that Swin Large and SwinMLP could be good candidates, as they need only a few edits to implement.
I'm not sure if we can port the weights or would have to train from scratch. Adding the weights and implementation would also add a CI job and the burden of maintaining it.
Alternatives
No response
Additional context
No response
cc @datumbox