Changed EfficientNet to ViT and Integrated BERT Text Encoder for rt1
Description
This update replaces the existing EfficientNet model with a Vision Transformer (ViT) and integrates a BERT text encoder for rt1. The current implementation includes a dummy sample to test the network functionality. The training part is not included in this update.
Summary:
Replaced EfficientNet with Vision Transformer (ViT).
Integrated BERT as the text encoder.
Implemented a dummy sample to test the network functionality.
The training part is not included in this update.
How Has This Been Tested?
The changes were verified by running a dummy sample through the network to confirm that it works as expected.
Testing:
Ran a dummy sample to verify network functionality.
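For reviewers who want a picture of what such a dummy-sample check can look like, here is a minimal sketch in PyTorch. It is not the code in this PR. `TinyViT`, `TinyTextEncoder`, `DummyPolicy`, and `run_dummy_sample` are hypothetical names, the two encoders are small randomly initialized stand-ins for ViT and BERT rather than the real models, the additive fusion is a simplification, and the image size, vocabulary size, action dimension, and bin count are placeholder values.

```python
import torch
import torch.nn as nn


class TinyViT(nn.Module):
    """Stand-in ViT-style image encoder: patchify, add positions, run a Transformer."""

    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2, heads=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution splits the image into non-overlapping patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, images):
        x = self.patch_embed(images)      # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)  # (B, num_patches, dim)
        return self.encoder(x + self.pos_embed)


class TinyTextEncoder(nn.Module):
    """Stand-in for a BERT-style text encoder (assumption: not the real BERT weights)."""

    def __init__(self, vocab_size=1000, max_len=16, dim=64, depth=2, heads=4):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, token_ids):
        x = self.tok_embed(token_ids) + self.pos_embed[:, : token_ids.shape[1]]
        return self.encoder(x).mean(dim=1)  # (B, dim) pooled instruction embedding


class DummyPolicy(nn.Module):
    """Hypothetical wrapper: fuses image patches with the text embedding, predicts action logits."""

    def __init__(self, dim=64, action_dim=11, bins=256):
        super().__init__()
        self.action_dim, self.bins = action_dim, bins
        self.vision = TinyViT(dim=dim)
        self.text = TinyTextEncoder(dim=dim)
        self.head = nn.Linear(dim, action_dim * bins)

    def forward(self, images, token_ids):
        patches = self.vision(images)  # (B, num_patches, dim)
        text = self.text(token_ids)    # (B, dim)
        # Simplified fusion (assumption): add the text embedding to every patch, then average.
        fused = (patches + text.unsqueeze(1)).mean(dim=1)
        return self.head(fused).view(-1, self.action_dim, self.bins)


def run_dummy_sample(batch_size=2):
    """Push random images and token ids through the model and return the logits."""
    torch.manual_seed(0)
    model = DummyPolicy().eval()
    images = torch.randn(batch_size, 3, 32, 32)
    token_ids = torch.randint(0, 1000, (batch_size, 16))
    with torch.no_grad():
        return model(images, token_ids)


if __name__ == "__main__":
    logits = run_dummy_sample()
    print(logits.shape)  # one row of bin logits per action dimension, per sample
```

A check like this only confirms that tensor shapes line up and that the forward pass produces finite values. It says nothing about model quality, which would need the training code that this update leaves out.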