xiusu / ViTAS

Code for ViTAS: Vision Transformer Architecture Search

The number of heads #1

Open · December-boy opened this issue 3 years ago

December-boy commented 3 years ago

Thanks for your nice work. I notice that `heads` in the Attention module is set to `None`; does this mean the heads are fixed to 4 in the supernet? As listed in the paper, the heads are selected from {3, 6, 12, 16}.
[screenshot: the Attention module's constructor, showing `self.heads = None`]

xiusu commented 3 years ago

Thanks for your issue. To train and search the supernet, we need to set the head number for each batch. The initial "head_dim" is therefore only used to compute the scale value (self.scale), while "self.heads = None" is a placeholder that is filled in during the forward process (please refer to core/model/net.py). The value of "self.heads" is updated for each batch according to the sampled architecture.
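For intuition, here is a minimal sketch of this pattern, not the repository's exact code (see core/model/net.py for that): the module computes its scale from `head_dim` once, leaves `self.heads` as `None`, and relies on the search loop to assign a sampled head count before every forward pass. The class name, the `max_heads` bound, and the projection slicing below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SearchableAttention(nn.Module):
    """Attention whose head count is assigned per batch by the supernet."""

    def __init__(self, dim, head_dim=64, max_heads=16):
        super().__init__()
        # head_dim is only used to compute the scale; the actual number
        # of heads is filled in later for each sampled architecture.
        self.scale = head_dim ** -0.5
        self.heads = None  # set by the search loop before every forward pass
        self.head_dim = head_dim
        self.max_heads = max_heads
        # Weights are sized for the largest option and sliced per batch.
        self.qkv = nn.Linear(dim, 3 * max_heads * head_dim, bias=False)
        self.proj = nn.Linear(max_heads * head_dim, dim)

    def forward(self, x):
        assert self.heads is not None, "supernet must set self.heads first"
        b, n, _ = x.shape
        h, d = self.heads, self.head_dim
        # Keep only the heads sampled for this batch.
        qkv = self.qkv(x).reshape(b, n, 3, self.max_heads, d)[:, :, :, :h, :]
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        attn = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, h * d)
        # Slice the output projection to match the sampled width.
        return F.linear(out, self.proj.weight[:, : h * d], self.proj.bias)

# Per batch, the search loop picks a head count, e.g. from {3, 6, 12, 16}:
attn = SearchableAttention(dim=384)
attn.heads = 6
y = attn(torch.randn(2, 197, 384))  # -> shape (2, 197, 384)
```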

December-boy commented 3 years ago

Thanks for your reply.

December-boy commented 3 years ago

One more thing: I wonder how to set the selected architecture, since the initial supernet is a block-level search space in which every block is identical, with 4 heads and a 1440-dimensional output.

December-boy commented 3 years ago

Oh, I figured it out.

xiusu commented 3 years ago

Thanks for your question. To retrain a searched architecture, you can refer to config/retrain/ViTAS_1G_retrain.yaml. As in lines 82 and 122, "net_id" defines the retrained architecture within the pre-set search space (lines 78-81). Alternatively, you can use a pre-defined model as the retrained architecture, as in config/retrain/ViTAS_1.3G_retrain.yaml (lines 80-83); with this setting, you train your defined architecture directly and do not need "net_id" in your yaml.
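As a purely illustrative sketch of how a "net_id" can select one concrete architecture out of a pre-set search space (the actual encoding and option lists in the yaml may differ; `SEARCH_SPACE` and `decode_net_id` below are hypothetical names):

```python
# Hypothetical per-block option lists; the real ones sit in the yaml's
# search-space section (lines 78-81 of ViTAS_1G_retrain.yaml).
SEARCH_SPACE = {
    "heads": [3, 6, 12, 16],
    "out_dim": [360, 720, 1080, 1440],
}

def decode_net_id(net_id):
    """Turn a list of per-block option indices into concrete block configs.

    `net_id` is assumed to be one (head_idx, dim_idx) pair per transformer
    block; the repository's real format may differ.
    """
    return [
        {"heads": SEARCH_SPACE["heads"][h], "out_dim": SEARCH_SPACE["out_dim"][d]}
        for h, d in net_id
    ]

# Example: decode a 3-block architecture from indices into the space.
print(decode_net_id([(1, 3), (3, 2), (2, 1)]))
# [{'heads': 6, 'out_dim': 1440}, {'heads': 16, 'out_dim': 1080},
#  {'heads': 12, 'out_dim': 720}]
```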

December-boy commented 3 years ago

> Thanks for your question. To retrain a searched architecture, you can refer to config/retrain/ViTAS_1G_retrain.yaml. As in lines 82 and 122, "net_id" defines the retrained architecture within the pre-set search space (lines 78-81). Alternatively, you can use a pre-defined model as the retrained architecture, as in config/retrain/ViTAS_1.3G_retrain.yaml (lines 80-83); with this setting, you train your defined architecture directly and do not need "net_id" in your yaml.

Thanks! Can you tell me the cost of the search, or of the whole process? What type of GPU did you use, how many, and how many days did it take?

xiusu commented 3 years ago

Thanks for your question. I used 32 V100 GPUs with 32 GB of memory each to run the search.

xiusu commented 3 years ago

It takes about 2-3 days to search for a ViT architecture.

December-boy commented 3 years ago

> It takes about 2-3 days to search for a ViT architecture.

I've trained the supernet, but the sampled results look strange. As shown in the figure below, the test accuracy is very low. Is that normal?

[screenshot: test log with very low accuracy for sampled architectures]

xiusu commented 3 years ago

Yes, that is normal: during sampling, the accuracy of a ViT architecture evaluated within the supernet is relatively low.