rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

GPU idle time analysis #41

Open The-Death-Reaper opened 7 months ago

Analyze GPU idle time caused by specific design choices in the PyTorch dataloader implementation. Each subtask will be elaborated on as the work progresses.

Tasks

  1. Rerun the pipeline with added profiling to collect GPU idle time information, using torch.cuda.Event

  2. Ensure dataset and dataloader shuffling is disabled. Done: set shuffle=False for the dataloader and verified it by enumerating the val dataloader and comparing the images across runs. The val dataloader has no randomness in its preprocessing phase, so it is a suitable candidate for this verification

  3. Map the GPU idle time to p3tracer logs and visualization

  4. Analyze the idle times for varying parameters - num_gpus, num_workers

  5. Analyze and compare hardware, end-to-end (e2e), and wait times

  6. Visualize the data and present it appropriately
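
As a starting point for task 1, here is a minimal sketch of how `torch.cuda.Event` could be used to estimate GPU idle time between training steps. It is not the profiling code used in this repository. The names `gpu_busy_and_idle_ms`, `idle_fraction`, and `step_fn` are hypothetical, and the sketch assumes a single GPU, a single CUDA stream, and that `step_fn` enqueues all GPU work for one batch (host-to-device copy, forward, backward, optimizer step). The gap between the end event of one step and the start event of the next is treated as idle time on the GPU timeline.

```python
try:
    import torch
except ImportError:  # the pure-Python helper below still works without PyTorch
    torch = None


def gpu_busy_and_idle_ms(loader, step_fn):
    """Return (busy_ms, idle_ms) lists measured on the GPU timeline.

    busy_ms[i] is the time between the start and end events of step i.
    idle_ms[i] is the gap between the end of step i and the start of step i+1,
    i.e. the time the stream had nothing to run while waiting for the next batch.
    Assumption: step_fn(batch) launches all GPU work for one iteration on the
    current stream.
    """
    events = []
    for batch in loader:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()   # marks when the GPU begins this step
        step_fn(batch)
        end.record()     # marks when the GPU finishes this step
        events.append((start, end))

    # Events are recorded asynchronously; wait before reading timings.
    torch.cuda.synchronize()

    busy_ms = [s.elapsed_time(e) for s, e in events]
    idle_ms = [
        events[i][1].elapsed_time(events[i + 1][0])
        for i in range(len(events) - 1)
    ]
    return busy_ms, idle_ms


def idle_fraction(busy_ms, idle_ms):
    """Fraction of the measured GPU timeline spent idle."""
    total = sum(busy_ms) + sum(idle_ms)
    return sum(idle_ms) / total if total > 0 else 0.0
```

Reading the timings only after `torch.cuda.synchronize()` avoids stalling the loop on every iteration, which would otherwise distort the idle gaps being measured. The per-step lists can then be aggregated per configuration (for example per `num_workers` value) for tasks 4 and 5.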