GPU idle time analysis - GitHub Issues

Analyze the GPU idle time caused by specific design choices in the PyTorch dataloader implementation. Each subtask will be elaborated on going forward.

Tasks

Rerun the pipeline with added profiling to collect GPU idle time information, using torch.cuda.Event.
Ensure dataset and dataloader shuffling is disabled. Done: set shuffle=False for the dataloader and verified it by enumerating the val dataloader and comparing the images across runs. The val dataloader has no randomness in its preprocessing phase, so it serves as a candidate for verification.
Map the GPU idle time to p3tracer logs and visualization
Analyze the idle times for varying parameters: num_gpus, num_workers.
Analyze and compare hardware, end-to-end (e2e), and wait times.
Visualize the data and present it appropriately.
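
As a rough illustration of the idle-time accounting these tasks call for, the sketch below derives idle gaps and an idle fraction from a list of GPU-busy intervals inside a measurement window. It assumes the busy intervals have already been collected as (start_ms, end_ms) pairs, for example from torch.cuda.Event timings or a trace; the function names and the interval format are hypothetical and not taken from the lotus repository.

```python
from typing import List, Tuple

# Assumption: one GPU-busy span is a (start_ms, end_ms) pair on a shared clock.
Interval = Tuple[float, float]


def merge_intervals(busy: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching busy spans into disjoint, sorted spans."""
    merged: List[Interval] = []
    for start, end in sorted(busy):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def idle_gaps(busy: List[Interval], window_start: float,
              window_end: float) -> List[Interval]:
    """Return the spans inside the window during which no busy span is active."""
    gaps: List[Interval] = []
    cursor = window_start
    for start, end in merge_intervals(busy):
        # Clip each busy span to the measurement window.
        start = max(start, window_start)
        end = min(end, window_end)
        if end <= start:
            continue  # span lies entirely outside the window
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


def idle_fraction(busy: List[Interval], window_start: float,
                  window_end: float) -> float:
    """Fraction of the window during which the GPU was idle."""
    total = window_end - window_start
    if total <= 0:
        raise ValueError("window_end must be greater than window_start")
    idle = sum(end - start
               for start, end in idle_gaps(busy, window_start, window_end))
    return idle / total


# Hypothetical example: three busy spans (two overlapping) in a 100 ms window.
example_busy = [(10.0, 30.0), (25.0, 40.0), (70.0, 90.0)]
example_gaps = idle_gaps(example_busy, 0.0, 100.0)
example_fraction = idle_fraction(example_busy, 0.0, 100.0)
```

The same helpers could be applied per configuration (for example, per num_gpus and num_workers setting) so that the resulting idle fractions can be compared side by side.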

rajveerb / lotus

GPU idle time analysis #41
