pablovela5620 opened this issue 3 years ago
Regarding "one loads up the entire dataset into system ram": this is not the case.
Understood. I had a chance to try training the model using the provided config. I'm using a machine with 128GB of RAM and 2 A6000 GPUs.
When I run on a single GPU using
python tools/train.py configs/hand3d/InterNet/interhand3d/res50_interhand3d_all_256x256.py
it uses about 30GB of RAM to load and train the network. The reason I assumed it loaded the entire dataset into system RAM is the large amount of RAM used during distributed training.
After switching to tools/dist_train.sh, I have the following problem. I'm using the provided config with dist_train and only changing the number of GPUs and the number of workers.
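For reference, the distributed launch follows the standard mmpose dist_train.sh usage, along these lines (2 GPUs assumed here to match the machine described above):

bash tools/dist_train.sh configs/hand3d/InterNet/interhand3d/res50_interhand3d_all_256x256.py 2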
So with this testing, I had the following questions.
I really appreciate the help!
@zengwang430521 Could you please check this issue?
Hi @pablovela5620. We load all annotations into memory before training, and this costs a lot of memory. So if you find memory insufficient, you can use fewer workers. And unfortunately our implementation may not be suitable for the 30-fps version right now, because it's too massive.
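For anyone who hits this, the worker count is set in the data section of the config; a minimal sketch of that change (key names follow the usual mmcv/mmpose config layout, values shown only for illustration; check res50_interhand3d_all_256x256.py for the real values):

data = dict(
    samples_per_gpu=16,  # per-GPU batch size, unchanged
    workers_per_gpu=0,   # fewer (or zero) dataloader workers -> fewer processes holding the parsed annotations
    # train=dict(...), val=dict(...), test=dict(...) stay as in the original config
)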
@zengwang430521 The implementation could be improved.
@zengwang430521 So with the current implementation, it seems like there are basically two solutions when using distributed single-node training.
I did notice that distributed training with 1 GPU vs. normal training with 1 GPU results in higher RAM usage (68GB vs. ~30GB). I'm not totally sure why; some clarity here would be appreciated.
Also, how much RAM did the 8-GPU, 2-worker machine use when training on the InterHand3D dataset?
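As a side note, one way to see where the memory actually sits is to sum the resident set size of the main training process and its dataloader workers; a minimal sketch assuming psutil is installed (pass the main training PID as the first argument; RSS counts copy-on-write pages per process, so the total can overstate real usage):

import sys
import psutil  # assumed installed; not part of mmpose

# PID of the main training process is passed as the first argument
main = psutil.Process(int(sys.argv[1]))
procs = [main] + main.children(recursive=True)  # children include dataloader workers
total_gb = 0.0
for p in procs:
    rss_gb = p.memory_info().rss / 1024 ** 3
    total_gb += rss_gb
    print(f"pid={p.pid} rss={rss_gb:.1f} GB")
print(f"total (copy-on-write pages double-counted): {total_gb:.1f} GB")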
If I were to modify the dataset implementation (so that I could get it working with the 30 FPS version), it seems like it's more of a design decision across the whole of the mmpose hand datasets. I may be completely wrong here, and please correct me if I am, but the use of xtcocotools in HandBaseDataset,
from xtcocotools.coco import COCO
self.coco = COCO(ann_file)
self.img_ids = self.coco.getImgIds()
basically loads the entire annotation set into memory for any dataset that depends on it. Also, looking at Interhand2D/Interhand3D and the others, inside def _get_db() we have
with open(self.camera_file, 'r') as f:
    cameras = json.load(f)
with open(self.joint_file, 'r') as f:
    joints = json.load(f)
and loading these files is what is eating up all the system memory inside the gt_db object. This seems consistent with all the other datasets as well: the entire dataset is loaded first, and only then are the augmentation/preprocessing pipelines run.
So rather than loading the entire dataset up front, I would have to override def __getitem__(self, idx): to load the data on each call rather than all at once? Does this make sense, or are there other considerations I should be looking at, and downsides to not loading everything at once?
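For what it's worth, a minimal sketch of that idea (LazyHandDataset, index_file and ann_path are hypothetical names, and this ignores the augmentation/preprocessing pipeline that mmpose runs on top):

import json
from torch.utils.data import Dataset

class LazyHandDataset(Dataset):
    """Hypothetical sketch: parse annotations per sample instead of building gt_db up front."""

    def __init__(self, index_file):
        # keep only a lightweight index (e.g. one annotation path per sample) in memory
        with open(index_file, 'r') as f:
            self.index = json.load(f)

    def __len__(self):
        return len(self.index)

    def __getitem__(self, idx):
        # load and parse only the annotation needed for this sample
        entry = self.index[idx]
        with open(entry['ann_path'], 'r') as f:
            ann = json.load(f)
        return ann  # in practice: build the sample dict and run the mmpose pipeline here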
Looking at the log provided, it looks like 8 Titan X GPUs were used to train on the InterHand dataset with a batch size of 16 and 2 workers per GPU.
The full InterHand dataset is pretty massive (over 1 million images), and my understanding is that each GPU process and each worker loads the entire dataset into system RAM (not GPU VRAM), so even with, say, 128GB of RAM, 8 GPUs * 2 workers adds up to a HUGE amount of system RAM. Am I understanding this correctly? I haven't had a chance to test yet.
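As a rough illustration of that worry, here is the back-of-envelope arithmetic under that (unconfirmed) one-copy-per-process assumption, using the ~30GB single-process figure reported elsewhere in this thread:

# rough upper bound if every GPU process and every dataloader worker held its own
# full copy of the parsed annotations (an assumption, not a measurement)
per_process_gb = 30      # observed single-GPU, non-distributed training
gpu_processes = 8
workers_per_gpu = 2
worst_case_gb = per_process_gb * gpu_processes * (1 + workers_per_gpu)
print(worst_case_gb)     # 720 GB, far beyond a 128GB machine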
How much system RAM did the machine that was used for training have? It seems very difficult to retrain on a multi-GPU system without a really significant amount of system RAM (>256GB?).