Open qklee-lz opened 3 years ago
@qklee-lz A little suggestion here: LSTR is a very small model, so multiple GPUs probably won't help you at all. In my experience, a large number of dataloader workers and an SSD-stored dataset will help you more than an RTX 3090.
In fact, given good I/O, I can launch 2 LSTR training runs on a single 2080Ti without slowing down either.
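To illustrate the suggestion above, here is a minimal PyTorch sketch of feeding a single GPU with more dataloader workers and pinned memory; `LaneDataset` is a hypothetical placeholder for the repository's own dataset class, not actual LSTR code.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class LaneDataset(Dataset):
    """Hypothetical stand-in for the real lane-detection dataset."""
    def __init__(self, length=1000):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Placeholder sample: a 3x360x640 image tensor and a dummy label.
        return torch.randn(3, 360, 640), 0

loader = DataLoader(
    LaneDataset(),
    batch_size=16,
    shuffle=True,
    num_workers=8,    # raise this when the data sits on a fast SSD
    pin_memory=True,  # speeds up host-to-GPU transfer
)

for images, labels in loader:
    pass  # training step goes here
```

With enough workers, data loading overlaps with GPU compute, which is usually the real bottleneck for a model this small.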
Hi @qklee-lz, thanks for your interest! A1. I have fixed the multi-GPU code and will release it. A2. chunk_sizes: [16] means: use one GPU and put 16 images on it. After we release the multi-GPU code, if you use two GPUs and set chunk_sizes: [8, 8], then each GPU will be given 8 images.
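As a rough sketch of what that setting means (this mirrors a DataParallel-style scatter and is not the repository's exact code), a batch is split across devices according to chunk_sizes:

```python
import torch

batch = torch.randn(16, 3, 360, 640)  # 16 images per iteration

# chunk_sizes: [16] -> one GPU gets all 16 images.
# chunk_sizes: [8, 8] -> two GPUs get 8 images each.
chunk_sizes = [8, 8]
chunks = torch.split(batch, chunk_sizes, dim=0)

for gpu_id, chunk in enumerate(chunks):
    # Each chunk would be moved to its own device, e.g. chunk.cuda(gpu_id)
    print(f"GPU {gpu_id}: {chunk.shape[0]} images")
```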
@voldemortX Thanks for your experiment results. Your suggestion is correct, but since many friends want multi-GPU training and our current multi-GPU code has a bug, I will fix it and release an update.
Hi~ Thank you for your work. I have some questions. Q1. I see your code has "parallel", but why doesn't it support multi-GPU training? Q2. What does "chunk_sizes": [16] mean in the file LSTR.json?