Closed echatzikyriakidis closed 1 year ago
Hello @echatzikyriakidis, yes, the training process leverages HuggingFace's Trainer, which supports multiple GPUs natively. I have yet to try training this across a cluster of machines.
As for the sampling process, no. This is one of the main bottlenecks of autoregressive models. There is a sequential dependency in the generative process, so it's impossible to parallelize this with the existing model.
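To illustrate why this step cannot be parallelized, here is a minimal sketch of an autoregressive decoding loop (the `next_token` callable is a toy stand-in for a trained model, not the library's actual API):

```python
def sample_autoregressive(next_token, prompt, max_new_tokens):
    """Generate tokens one at a time. Each step consumes the full prefix
    produced so far, so the steps cannot run in parallel."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)  # depends on every previously generated token
        tokens.append(tok)
    return tokens

# Toy "model": the next token is simply the last token plus one.
generated = sample_autoregressive(lambda ts: ts[-1] + 1, [0], 5)
print(generated)  # [0, 1, 2, 3, 4, 5]
```

The sequential dependency is the `for` loop itself: iteration `t` cannot start until iteration `t - 1` has appended its token.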
Hi @avsolatorio!
Thank you again for this.
It is very helpful that distributed training can be used in multi-GPU environments. I will try it for sure. Also, if you have any feedback on this sooner please let me know!
Regarding sampling, I understand that the model is autoregressive: to generate a single example, every token is fed back to the input to generate the next one, and a decoding strategy is used.
By parallelization for sampling, I meant whether it is possible to generate examples using multiple GPU cards, e.g., by distributing the generation of one batch of 64 examples to GPU1, another batch of 64 to GPU2, and so on.
Maybe I could use model1.sample(device="cuda:0") and model2.sample(device="cuda:1") in two parallel threads? The only problem could be that, when I merge the synthetic examples at the end, duplicates could exist.
@echatzikyriakidis, it is possible to perform "parallel" inference using independent GPUs, provided that you set different random seeds prior to sampling. The generation is stochastic, so as long as the random states differ, you can expect independent samples. Even when duplicates exist, they will be due to random chance and should not be systematic.
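The pattern above (one worker per GPU, each with its own seed, then merge and deduplicate) can be sketched as follows. The `sample_on_device` helper is hypothetical: a plain random generator stands in for the model's actual `sample` call, and the `"cuda:0"`/`"cuda:1"` device strings are assumptions about a two-GPU setup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sample_on_device(seed, n_samples, device):
    """Hypothetical per-GPU worker: seed an independent random state,
    then sample. model.sample(n_samples, device=device) would go here;
    a random draw stands in so the sketch is self-contained."""
    rng = random.Random(seed)  # a different seed per worker, as suggested above
    return [rng.randint(0, 10**9) for _ in range(n_samples)]

# One job per GPU, each with a distinct seed.
jobs = [(0, 64, "cuda:0"), (1, 64, "cuda:1")]
with ThreadPoolExecutor(max_workers=len(jobs)) as ex:
    batches = list(ex.map(lambda args: sample_on_device(*args), jobs))

# Merge the per-GPU batches and drop any incidental duplicates.
merged = list({x for batch in batches for x in batch})
```

With real GPU-bound sampling, separate processes (or separate scripts per device) would avoid Python's GIL entirely; threads suffice here only because the stand-in work is trivial.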
Thank you for the clarification!
Regarding utilizing multiple GPU cards in a single VM for training, I can verify that it works, since HF's Trainer is used. I have tested it on a GCP Compute Engine VM with 2 NVIDIA T4 cards.
Hi @avsolatorio,
I would like to know if it is possible to distribute the training in an environment with multiple GPUs, or even across multiple machines?
Also, is it possible to parallelize the sampling operation with the library?
Currently, I have a single environment with one GPU and I run the training on Google Colab.
Thanks.