Rt1-VIT-bert - GitHub Issues

Changed EfficientNet to ViT and Integrated BERT Text Encoder for `rt1`

Description

This update replaces the existing EfficientNet model in `rt1` with a Vision Transformer (ViT) and integrates a BERT text encoder. The implementation includes a dummy sample for testing that the network runs. Training code is not part of this update.

Summary:

- Replaced EfficientNet with a Vision Transformer (ViT).
- Integrated BERT as the text encoder.
- Implemented a dummy sample to test the network functionality.
- The training part is not included in this update.

How Has This Been Tested?

The changes were verified by running a dummy sample through the network and confirming that it behaves as expected.

Testing:

- Ran a dummy sample to verify network functionality.
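
For readers unfamiliar with this kind of check, a dummy-sample test might look roughly like the sketch below. It is not the code from this PR. `TinyViT`, `TinyTextEncoder`, `TinyPolicy`, and `run_dummy_sample` are hypothetical stand-ins built from plain `torch.nn` layers so that nothing has to be downloaded. The image size, vocabulary size, 7-dimensional action output, and fusion by concatenation are all assumptions made for illustration; the actual `rt1` model may condition on text differently.

```python
import torch
from torch import nn


class TinyViT(nn.Module):
    """Hypothetical stand-in for a ViT image encoder."""

    def __init__(self, image_size=32, patch_size=8, dim=64, depth=2, heads=4):
        super().__init__()
        num_patches = (image_size // patch_size) ** 2
        # A strided convolution splits the image into non-overlapping patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, images):
        x = self.patch_embed(images)        # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)    # (B, num_patches, dim)
        return self.encoder(x + self.pos_embed)


class TinyTextEncoder(nn.Module):
    """Hypothetical stand-in for BERT: token ids in, one vector per sequence out."""

    def __init__(self, vocab_size=1000, max_len=16, dim=64, depth=2, heads=4):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, token_ids):
        x = self.token_embed(token_ids) + self.pos_embed[:, : token_ids.shape[1]]
        return self.encoder(x)[:, 0]        # first-token vector, CLS-style pooling


class TinyPolicy(nn.Module):
    """Fuses image and text features by concatenation (an assumption)."""

    def __init__(self, dim=64, action_dim=7):
        super().__init__()
        self.vision = TinyViT(dim=dim)
        self.text = TinyTextEncoder(dim=dim)
        self.head = nn.Linear(2 * dim, action_dim)

    def forward(self, images, token_ids):
        image_feat = self.vision(images).mean(dim=1)    # average over patches
        text_feat = self.text(token_ids)
        return self.head(torch.cat([image_feat, text_feat], dim=-1))


def run_dummy_sample(batch_size=2):
    """Push random images and random token ids through the model once."""
    torch.manual_seed(0)
    model = TinyPolicy().eval()
    images = torch.rand(batch_size, 3, 32, 32)
    token_ids = torch.randint(0, 1000, (batch_size, 16))
    with torch.no_grad():
        return model(images, token_ids)


if __name__ == "__main__":
    out = run_dummy_sample()
    print(tuple(out.shape))
```

A check like this only confirms that tensor shapes line up and the forward pass produces finite values. It says nothing about output quality, which is consistent with training being out of scope here.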

Checklist

[ ] Self-review
[ ] Documentation
[ ] Testing

mbodiai / embodied-agents

Rt1-VIT-bert #14
