rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

Parser for PyTorch profiler #10

Closed: rajveerb closed this issue 11 months ago

rajveerb commented 11 months ago

The JSON file produced by the PyTorch profiler needs to be parsed to extract timing information for the forward pass, the backward pass, and preprocessing.
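A minimal sketch of the loading step, assuming the trace was written in the Chrome trace format (for example via `export_chrome_trace` on a `torch.profiler.profile` object). In that format, complete events have `"ph": "X"` and carry a start `ts` and a duration `dur` in microseconds. The helper names `load_trace_events` and `durations_by_name` are hypothetical, and which event names matter depends on the pipeline code being profiled.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def load_trace_events(path: str) -> List[Dict[str, Any]]:
    """Return the list of events from a Chrome-trace JSON file."""
    with open(path) as f:
        trace = json.load(f)
    # The trace is usually an object with a "traceEvents" key, but the
    # Chrome trace format also allows a bare list of events.
    if isinstance(trace, dict):
        return trace.get("traceEvents", [])
    return trace


def durations_by_name(
    events: Iterable[Dict[str, Any]], names: Iterable[str]
) -> Dict[str, List[float]]:
    """Group durations (microseconds) of complete ("X") events by event name."""
    wanted = set(names)
    grouped: Dict[str, List[float]] = defaultdict(list)
    for ev in events:
        # Only complete events carry a "dur" field; skip metadata, flows, etc.
        if ev.get("ph") == "X" and ev.get("name") in wanted:
            grouped[ev["name"]].append(float(ev.get("dur", 0)))
    return dict(grouped)
```

Filtering on `ph == "X"` keeps the parser independent of the other event kinds (process/thread metadata, flow arrows) that also appear in the trace.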

The parser is specific to our Python code for the image classification pipeline.

The parser will report the average time spent in the forward + backward pass, computed from the CPU process events in the JSON file.
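A minimal sketch of the averaging step. It assumes the training loop labels each step's forward and backward pass with `torch.profiler.record_function("forward")` and `record_function("backward")`; those label strings and the function name `average_fwd_bwd_us` are hypothetical and would need to match whatever the actual pipeline code emits. The optional `pid` argument restricts the calculation to events from one process (e.g. the main training process).

```python
from statistics import mean
from typing import Any, Dict, Iterable, Optional


def average_fwd_bwd_us(
    events: Iterable[Dict[str, Any]],
    fwd_name: str = "forward",   # hypothetical record_function label
    bwd_name: str = "backward",  # hypothetical record_function label
    pid: Optional[int] = None,
) -> float:
    """Mean forward duration plus mean backward duration, in microseconds."""
    fwd, bwd = [], []
    for ev in events:
        # Only complete events have a duration.
        if ev.get("ph") != "X":
            continue
        # Optionally keep only events recorded by a single process.
        if pid is not None and ev.get("pid") != pid:
            continue
        if ev.get("name") == fwd_name:
            fwd.append(float(ev["dur"]))
        elif ev.get("name") == bwd_name:
            bwd.append(float(ev["dur"]))
    if not fwd or not bwd:
        raise ValueError("no forward/backward events found in the trace")
    return mean(fwd) + mean(bwd)
```

Averaging the two phases separately, rather than pairing them per iteration, avoids assuming the trace contains exactly one forward and one backward event per step.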

One interesting observation: the time span attributed to the dataloader is not the elapsed time spent performing the preprocessing operations. Most of it is time spent waiting for the worker processes to finish so that the main process can proceed. This has to be accounted for, and the actual elapsed preprocessing time spent in the worker processes needs to be instrumented separately. For this issue, however, the focus is the parser for the PyTorch profiler.
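One possible way to instrument the worker-side preprocessing time is to time the transform inside the worker itself. A minimal sketch, assuming the preprocessing is a single per-sample callable (for example a composed transform); `TimedTransform` and the per-process CSV layout are hypothetical and not part of PyTorch. Because each DataLoader worker gets its own copy of the dataset and its transforms, timings recorded in memory stay in that worker, so the sketch can also append them to one file per process.

```python
import os
import time
from typing import Any, Callable, List, Optional, Tuple


class TimedTransform:
    """Hypothetical wrapper that times each call of a preprocessing function.

    Each DataLoader worker process holds its own copy of this object, so
    `records` only contains that process's timings. Set `log_dir` to have
    every process append its timings to its own CSV file.
    """

    def __init__(self, fn: Callable[[Any], Any], log_dir: Optional[str] = None) -> None:
        self.fn = fn
        self.log_dir = log_dir
        self.records: List[Tuple[int, int]] = []  # (pid, elapsed_ns)

    def __call__(self, sample: Any) -> Any:
        start = time.perf_counter_ns()
        out = self.fn(sample)
        elapsed_ns = time.perf_counter_ns() - start
        pid = os.getpid()
        self.records.append((pid, elapsed_ns))
        if self.log_dir is not None:
            # One file per process avoids interleaved writes between workers.
            path = os.path.join(self.log_dir, f"preprocess_{pid}.csv")
            with open(path, "a") as f:
                f.write(f"{elapsed_ns}\n")
        return out
```

The wrapped function should be defined at module level so the wrapper can be pickled when workers are started with the `spawn` method. Opening the log file on every call adds a small overhead of its own; buffering the records and flushing them periodically would reduce it.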