rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

Preprocessing operation span logging and visualization using PyTorch profiler #19

Closed rajveerb closed 10 months ago

rajveerb commented 1 year ago

The goal is to log the span of each preprocessing operation by instrumenting the PyTorch and torchvision source code, and to convert the collected logs to the format that PyTorch profiler produces for its own data.

Span for each event includes the following data:

  1. Timestamp corresponding to wall clock
  2. Duration of the event
  3. Process id
  4. Thread id

In PyTorch profiler's case, process id and thread id are the same.
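
As a rough sketch of what one converted span could look like, the snippet below builds a single "complete" event (`"ph": "X"`) in the Chrome trace event layout, which is the layout PyTorch profiler's exported JSON trace follows. The function name `make_span_event` and the `"preprocessing"` category label are hypothetical, and it assumes the instrumented code records wall-clock start and end times in nanoseconds. The actual instrumentation may differ.

```python
import os
import threading
import time


def make_span_event(name, start_ns, end_ns, pid=None, tid=None):
    """Build one Chrome-trace 'complete' event from a logged span.

    start_ns / end_ns are wall-clock times in nanoseconds (an assumption);
    the trace format expects timestamps and durations in microseconds.
    """
    return {
        "name": name,
        "ph": "X",                  # complete event: has both ts and dur
        "cat": "preprocessing",     # hypothetical category label
        "ts": start_ns / 1000.0,    # wall-clock timestamp in microseconds
        "dur": (end_ns - start_ns) / 1000.0,
        "pid": os.getpid() if pid is None else pid,
        "tid": threading.get_native_id() if tid is None else tid,
    }


# Example: time a stand-in for a preprocessing operation.
start = time.time_ns()
_ = sum(range(1000))
end = time.time_ns()
event = make_span_event("ExampleTransform", start, end)
```

Passing `pid` and `tid` explicitly makes it possible to set both to the same value when that matches the profiler's own events.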

A script is needed to merge the converted logs with the PyTorch profiler JSON file.

This will allow us to visualize the preprocessing operations performed by the DataLoader workers in the same way as the PyTorch profiler visualization.
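
A minimal sketch of such a merge step, assuming the profiler file is a JSON object with a top-level `traceEvents` list (the Chrome trace layout) and the converted log is a JSON list of events in the same layout. The function names and file paths are hypothetical; this is not the script from the repository.

```python
import json


def merge_traces(profiler_trace, extra_events):
    """Return a copy of the profiler trace with extra events appended.

    The input trace is not modified; other top-level keys are kept as-is.
    """
    merged = dict(profiler_trace)
    merged["traceEvents"] = (
        list(profiler_trace.get("traceEvents", [])) + list(extra_events)
    )
    return merged


def merge_files(profiler_path, events_path, out_path):
    """Read both JSON files, merge them, and write the combined trace."""
    with open(profiler_path) as f:
        trace = json.load(f)
    with open(events_path) as f:
        events = json.load(f)
    with open(out_path, "w") as f:
        json.dump(merge_traces(trace, events), f)
```

Usage would look like `merge_files("profiler.json", "spans.json", "merged.json")`, after which the merged file can be opened in the same trace viewer as the original profiler output.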

rajveerb commented 1 year ago

I have completed:

  1. Adding logging to the source code.
  2. A script to combine the converted logs with the PyTorch profiler data.

rajveerb commented 1 year ago

The PyTorch profiler logs are too large, even without our added augmentation logic.

I am creating a file compressor using the LZMA library.
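
For illustration only, here is a small sketch of file compression with Python's standard `lzma` module. It assumes the compressor is written in Python and streams the file in chunks so large traces do not need to fit in memory. The function names and the `.xz` suffix are hypothetical choices, not necessarily what #21 does.

```python
import lzma
import shutil


def compress_file(src_path, dst_path=None, preset=6):
    """Compress src_path to an .xz file and return the output path."""
    dst_path = dst_path or src_path + ".xz"
    with open(src_path, "rb") as src, \
            lzma.open(dst_path, "wb", preset=preset) as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks
    return dst_path


def decompress_file(src_path, dst_path):
    """Decompress an .xz file written by compress_file."""
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path
```

A higher `preset` (up to 9) trades compression time for a smaller file.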

rajveerb commented 1 year ago

Finished #21, which will allow me to push the compressed files.