vimalabs / VIMA

Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
MIT License

Request for Training Code and Clarification on Parallelization & Batch Size #36

Closed · nisshimura closed 1 year ago

nisshimura commented 1 year ago

Hello, I have a couple of questions about the project:

Training Code: Is it possible for the training code to be released? It would greatly help in understanding the implementation details and for reproducing the results.

Parallelization & Batch Size: When training, does parallelizing episodes across multiple GPUs equate to setting the batch size? I would appreciate some clarification on how parallelization and batch size are related in the context of this project.

Thank you for your time and consideration. Looking forward to your response.

yunfanjiang commented 1 year ago

Hi there, thank you for your interest in our project. Regarding training, this code snippet illustrates the logic of one training iteration. Regarding parallelization and batch size: since the gradients computed on each GPU are synchronized, the effective batch size is the batch size on a single GPU multiplied by the number of parallel GPUs. Specifically, to train the largest model we used an effective batch size of 128, which amounts to a local batch size of 16 on each of 8 GPUs.
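
To make the relationship concrete, here is a minimal sketch (not the official training code; the model, dataset, and script name are placeholders) of how a per-GPU batch size combines with the number of GPUs into an effective batch size under PyTorch DistributedDataParallel, where gradients are averaged across ranks before each optimizer step:

```python
# Minimal DDP sketch: effective batch size = LOCAL_BATCH_SIZE * world_size.
# Placeholders stand in for VIMA's actual policy, dataset, and loss.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

LOCAL_BATCH_SIZE = 16  # batch size on a single GPU


def main():
    # Assumes launch via `torchrun --nproc_per_node=8 train_sketch.py`,
    # which sets RANK, WORLD_SIZE, and LOCAL_RANK in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    world_size = dist.get_world_size()
    # One optimizer step sees LOCAL_BATCH_SIZE samples per GPU, and the
    # gradient all-reduce makes the update equivalent to training with
    # LOCAL_BATCH_SIZE * world_size samples (16 * 8 = 128 here).
    effective_batch_size = LOCAL_BATCH_SIZE * world_size

    # Toy data and model in place of the real dataset and policy.
    dataset = TensorDataset(torch.randn(4096, 32), torch.randn(4096, 1))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=LOCAL_BATCH_SIZE, sampler=sampler)

    model = DDP(torch.nn.Linear(32, 1).cuda(), device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    for epoch in range(1):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for x, y in loader:
            x, y = x.cuda(), y.cuda()
            loss = torch.nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()   # DDP averages gradients across GPUs here
            optimizer.step()  # every rank applies the identical update

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The key point is that increasing the number of GPUs while keeping the per-GPU batch size fixed scales the effective batch size, so learning-rate and other hyperparameters should be chosen with the effective (global) batch size in mind, not the local one.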