
OpenVLA: An open-source vision-language-action model for robotic manipulation.

Logs for training from scratch with open-x datasets #65

Closed. hyy0613 closed this issue 1 month ago.

hyy0613 commented 1 month ago

Hi, thanks for your open-source code. I'm trying to train the model from scratch with the RT-X datasets. I noticed this passage in the paper:

"We also experimented with incorporating a few additional datasets into our training mixture that were added to the OpenX dataset since the release of Octo, including the DROID dataset [11], although at a conservative mixture weight of 10%. In practice, we found that the action token accuracy on DROID remained low throughout training, suggesting a larger mixture weight or model may be required to fit its diversity in the future. To not jeopardize the quality of the final model, we removed DROID from the data mixture for the final third of training."

Thus, I think I should select the vla.type prism-dinosiglip-224px+mx-oxe-magic-soup-plus. The configuration is as follows:

torchrun --nnodes 2 --nproc-per-node 8 vla-scripts/train.py \
  --vla.type prism-dinosiglip-224px+mx-oxe-magic-soup-plus \
  --data_root_dir /mnt/dolphinfs/hdd_pool/docker/user/RTX
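
For reference, the effective global batch size implied by such a torchrun launch is nnodes × nproc-per-node × per-GPU batch size. A minimal illustrative sketch (the helper name is hypothetical, and the per-GPU batch size of 32 is taken from the numbers quoted in this thread):

# Illustrative arithmetic only, not repo code.
def effective_batch_size(nnodes: int, nproc_per_node: int, per_gpu_batch: int) -> int:
    return nnodes * nproc_per_node * per_gpu_batch

print(effective_batch_size(2, 8, 32))  # 512  -> this run (2 nodes x 8 GPUs)
print(effective_batch_size(8, 8, 32))  # 2048 -> the original 64-GPU run, per the 64*32 figure below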

Other than reducing the number of nodes and the batch size proportionally to 512 (16 GPUs × 32 per GPU), I didn't change anything else. However, after training for 8,000 steps, I find that the action token accuracy improves quickly at the beginning, but then the loss plateaus around 1.55 and the average action token accuracy stays around 40% for a long time. I don't know if there is a problem in my training process. It would be really kind of you to give me some help; maybe I need to adjust the learning rate or other hyperparameters? Thank you very much.
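
For context on the metric in question, here is a minimal sketch of how an action-token-accuracy metric like this is commonly computed: mask to the positions holding discretized action tokens and compare greedy predictions against the labels. The function name, tensor shapes, and IGNORE_INDEX sentinel are illustrative assumptions, not the repo's exact implementation (the one-position shift used in causal-LM training is also omitted for brevity):

import torch

IGNORE_INDEX = -100  # assumed sentinel marking non-action positions in the labels

def action_token_accuracy(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq, vocab); labels: (batch, seq), IGNORE_INDEX everywhere
    # except the positions that hold discretized action tokens.
    preds = logits.argmax(dim=-1)      # greedy token predictions
    mask = labels != IGNORE_INDEX      # keep only action-token positions
    return (preds[mask] == labels[mask]).float().mean()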

kpertsch commented 1 month ago

The orange curve below is our training curve for OpenVLA at a 64*32 batch size (64 GPUs x 32 per GPU, i.e. a global batch size of 2048). It's a bit unclear whether training would work with a much smaller batch size -- you could try gradient accumulation (at the expense of runtime, of course). The hyperparameters in the released configs are exactly the parameters we used for our run.

[image: OpenVLA training curve referenced above]
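
For the gradient-accumulation workaround suggested above, a minimal PyTorch sketch: with 16 GPUs at a per-GPU batch size of 32 (global 512), accumulating over 4 micro-batches recovers the 2048 effective batch size of the original run. The toy model and data are placeholders; OpenVLA's trainer may structure this differently.

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                       # stand-in for the VLA model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()
accum_steps = 4                                # 512 x 4 = 2048 effective batch

optimizer.zero_grad()
for step in range(16):
    x = torch.randn(32, 10)                    # one micro-batch of inputs
    y = torch.randint(0, 2, (32,))             # one micro-batch of targets
    loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated grads average
    loss.backward()                            # gradients sum across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # one update per accum_steps micro-batches
        optimizer.zero_grad()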