rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

C++ implementation for image classification pipeline #3

Open rajveerb opened 1 year ago

rajveerb commented 1 year ago

The goal is to convert the existing implementation in this file to C++.

With the C++ implementation, a finer-granularity analysis can be done using profiling/instrumentation tools such as Intel VTune. It will also let us experiment with memory management policies with finer control than the Python implementation allows. Finally, the C++ dataloader will let us leverage more CPU cores than before.
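
To make the dataloader point concrete, here is a minimal, hypothetical sketch (not code from this repo) of a libtorch dataset driven by a multi-worker `torch::data` loader; the dataset contents, worker count, and batch size are placeholder values:

```cpp
#include <torch/torch.h>
#include <iostream>

// Toy dataset of random "images" and labels, just to exercise the loader.
struct RandomImageDataset : torch::data::datasets::Dataset<RandomImageDataset> {
  explicit RandomImageDataset(size_t size) : size_(size) {}

  torch::data::Example<> get(size_t /*index*/) override {
    // Each example: a 3x224x224 float tensor plus a random class label.
    return {torch::rand({3, 224, 224}), torch::randint(0, 1000, {1}, torch::kLong)};
  }

  torch::optional<size_t> size() const override { return size_; }

  size_t size_;
};

int main() {
  auto dataset = RandomImageDataset(1024).map(torch::data::transforms::Stack<>());

  // workers(8) is the knob that lets the C++ loader spread preprocessing across
  // CPU cores; 8 workers and batch size 256 are placeholders, not project settings.
  auto loader = torch::data::make_data_loader(
      std::move(dataset),
      torch::data::DataLoaderOptions().batch_size(256).workers(8));

  for (auto& batch : *loader) {
    std::cout << batch.data.sizes() << "\n";  // expect [256, 3, 224, 224]
    break;                                    // one batch is enough for the sketch
  }
}
```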

The-Death-Reaper commented 1 year ago

Completed Tasks

All the completed tasks have been recorded in code/cpp_test under the dev_mayur branch. I will raise a PR once I clean up the folder and confirm it can be reproduced on the lab systems.

Pending Tasks:

Look at how to extend the Dataset class for custom datasets and dataloaders, and add functions for the necessary preprocessing (a sketch of one possible approach follows at the end of this comment):

1. Crop
2. Rotate
3. Convert to tensor
4. Normalize

Convert the Python file to train the model entirely in C++.
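
For reference, a minimal sketch of one possible way to extend the Dataset class with OpenCV-based preprocessing. The class name, path/label plumbing, and the fixed crop/rotation parameters are illustrative assumptions, not the pipeline's actual code; the Normalize values mirror the ImageNet constants used later in this thread:

```cpp
#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <string>
#include <utility>
#include <vector>

// Hypothetical custom dataset: loads an image, applies crop/rotate, converts it to a
// CHW float tensor, and returns it with its label.
class ImageFolderDataset : public torch::data::datasets::Dataset<ImageFolderDataset> {
 public:
  ImageFolderDataset(std::vector<std::string> paths, std::vector<int64_t> labels)
      : paths_(std::move(paths)), labels_(std::move(labels)) {}

  torch::data::Example<> get(size_t index) override {
    cv::Mat img = cv::imread(paths_[index], cv::IMREAD_COLOR);  // note: BGR order

    // 1. Crop: central 224x224 region (assumes the image is at least that large).
    cv::Rect roi((img.cols - 224) / 2, (img.rows - 224) / 2, 224, 224);
    cv::Mat cropped = img(roi).clone();

    // 2. Rotate: fixed 90-degree rotation as a stand-in for the real transform.
    cv::Mat rotated;
    cv::rotate(cropped, rotated, cv::ROTATE_90_CLOCKWISE);

    // 3. Convert to tensor: HWC uint8 -> CHW float in [0, 1]. A cv::cvtColor to RGB
    //    would be needed to match torchvision's channel order exactly.
    rotated.convertTo(rotated, CV_32FC3, 1.0 / 255.0);
    torch::Tensor data =
        torch::from_blob(rotated.data, {rotated.rows, rotated.cols, 3}, torch::kFloat)
            .permute({2, 0, 1})
            .clone();  // clone so the tensor owns its memory after rotated goes away

    return {data, torch::tensor(labels_[index], torch::kLong)};
  }

  torch::optional<size_t> size() const override { return paths_.size(); }

 private:
  std::vector<std::string> paths_;
  std::vector<int64_t> labels_;
};

// 4. Normalize is chained as a transform (ImageNet mean/std), then Stack collates batches.
inline auto make_dataset(std::vector<std::string> paths, std::vector<int64_t> labels) {
  return ImageFolderDataset(std::move(paths), std::move(labels))
      .map(torch::data::transforms::Normalize<>({0.485, 0.456, 0.406},
                                                {0.229, 0.224, 0.225}))
      .map(torch::data::transforms::Stack<>());
}
```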

The-Death-Reaper commented 1 year ago

Update:

Completed Tasks:

Added functions in C++ for the preprocessing steps listed above (crop, rotate, convert to tensor, normalize).

Checked and confirmed the correctness of these transforms (files added under correctness_checker).

We can now experiment with parameters if needed, with some degree of confidence that the results will match in both languages.

Got a better understanding of how to code an ML model in C++ and, to some extent, of the features available in libtorch.

Pending Tasks:

Train the model entirely in C++ (required for profiling; this will also let us double-check correctness and perhaps comment on the accuracy of models across libraries and languages).
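
Purely as an illustration of the kind of comparison the correctness_checker mentioned above performs (a hedged sketch, not the actual contents of that folder): dump the Python pipeline's output as raw float32 bytes and compare the C++ output against it within a tolerance.

```cpp
#include <torch/torch.h>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a raw float32 dump (e.g. written in Python with
// `tensor.numpy().tofile(path)`) and views it with the expected shape.
torch::Tensor load_reference(const std::string& path, torch::IntArrayRef shape) {
  std::ifstream in(path, std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());
  return torch::from_blob(bytes.data(), shape, torch::kFloat).clone();
}

// Compare the C++-preprocessed tensor against the Python reference within a tolerance.
bool matches_reference(const torch::Tensor& cpp_out, const torch::Tensor& py_ref) {
  return torch::allclose(cpp_out, py_ref, /*rtol=*/1e-4, /*atol=*/1e-5);
}
```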

rajveerb commented 1 year ago

I tried to use the code pushed in dev_mayur_kepler but I am facing issues with building the code.

Can you add Makefiles or improve the README to describe the required setup?

I hit linker issues with dcgan and a missing OpenCV library for example-app.

The ability to replicate the same experiments is important for benchmarks; if I cannot replicate them on the same machine, that is a problem.

Note: our machine has CUDA 11.8 and 12.0 installed. I used the libtorch build specific to CUDA 11.8, and CUDA 11.8 itself, when building.

Below are the error logs faced while building the code:

For example-app:

environment variables for building:

PATH contains /usr/local/cuda-11.8/bin

LD_LIBRARY_PATH contains /usr/local/cuda-11.8/lib64

$ cmake -DCMAKE_PREFIX_PATH=/home/rbachkaniwala3/work/ml-pipeline-benchmark/code/cpp_test/libtorch/ ..
...
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "OpenCV", but
  CMake did not find one.

  Could not find a package configuration file provided by "OpenCV" with any
  of the following names:

    OpenCVConfig.cmake
    opencv-config.cmake

  Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
  "OpenCV_DIR" to a directory containing one of the above files.  If "OpenCV"
  provides a separate development package or SDK, be sure it has been
  installed.

For dcgan:


$ cmake -DCMAKE_PREFIX_PATH=/home/rbachkaniwala3/work/ml-pipeline-benchmark/code/cpp_test/libtorch/ ..
...
CMake Warning at CMakeLists.txt:18 (add_executable):
  Cannot generate a safe runtime search path for target dcgan because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda-11.8/lib64

  Some of these libraries may not be found correctly.

$ make
/usr/bin/ld: /lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/dcgan.dir/build.make:100: dcgan] Error 1
make[1]: *** [CMakeFiles/Makefile2:76: CMakeFiles/dcgan.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
The-Death-Reaper commented 1 year ago

Agreed. I was writing a script to set everything up but haven't completed and pushed it yet.

With respect to example-app, the issue is with setting up OpenCV for C++. I haven't added that to the README; I will update it.

With respect to dcgan, I will check the CMake file and make it more robust. The error could be due to shared-library version conflicts.

I will take these up as ad-hoc tasks along with the other items.

The-Death-Reaper commented 1 year ago

Update:

Documentation

https://pytorch.org/tutorials/advanced/cpp_export.html

There exist two ways of converting a PyTorch model to Torch Script. The first is known as tracing, a mechanism in which the structure of the model is captured by evaluating it once using example inputs, and recording the flow of those inputs through the model. This is suitable for models that make limited use of control flow.

The second approach is to add explicit annotations to your model that inform the Torch Script compiler that it may directly parse and compile your model code, subject to the constraints imposed by the Torch Script language.
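
For the C++ side of that tutorial, the workflow is roughly: trace and save the model in Python (`torch.jit.trace(...).save(...)`), then load and run it with `torch::jit::load` in C++. A minimal sketch, with a placeholder file name and input shape:

```cpp
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
  // "traced_model.pt" is a placeholder for a module exported from Python via
  //   torch.jit.trace(model, example_input).save("traced_model.pt")
  torch::jit::script::Module module = torch::jit::load("traced_model.pt");
  module.eval();

  // Forward pass on a dummy ImageNet-sized input.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::rand({1, 3, 224, 224}));
  torch::Tensor output = module.forward(inputs).toTensor();

  std::cout << output.sizes() << "\n";
  return 0;
}
```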

The-Death-Reaper commented 1 year ago

@rajveerb @kexinrong

I read about TorchScript tracing and annotations in more detail. Since we are using a standard model and are not adding any new logic through control statements or loops in the model layers, tracing is the recommended option for using TorchScript (per the documentation and general community opinion). Tracing fails only when there are control statements or decision loops inside the model layers.

I was thinking of using tracing to transfer the model to C++ and train/test it there, and I will get started on that (whether we use annotations or tracing, the next steps remain the same anyway). What do you recommend?

rajveerb commented 1 year ago

@The-Death-Reaper

I read about torchscript in the link that you sent before.

Go ahead with torchscript's tracing option.

The-Death-Reaper commented 1 year ago

Update - Oct 4th, 2023

The-Death-Reaper commented 1 year ago

Update - Oct 11th, 2023

rajveerb commented 1 year ago

@The-Death-Reaper

Can you cd into the imagenet dir?

The-Death-Reaper commented 1 year ago

@The-Death-Reaper

Can you cd into the imagenet dir?

No

rajveerb commented 1 year ago

@The-Death-Reaper

Try again

The-Death-Reaper commented 1 year ago

@rajveerb

Working now, ty

The-Death-Reaper commented 1 year ago

Update - Oct 18th, 2023

The-Death-Reaper commented 1 year ago

Tasks - Oct 25th, 2023

Update - Oct 30th, 2023 - @rajveerb

Notes

rajveerb commented 11 months ago

I am trying to run the C++ experiment on CloudLab. Where are the instructions to set up the environment? Does your code work with CUDA 11.8? Where are the instructions/commands to run it?

Can you push the answers to the above questions to dev_mayur_kepler soon?

The-Death-Reaper commented 11 months ago

I've pushed the required info to the branch. I'll continue cleaning the directory and adding comments over the next few days. Do keep me updated on any issues you face so I can take those up as well :).

rajveerb commented 11 months ago

How does normalization happen? Is it applied to each image when the other transforms run, or only when an entire batch is ready?

I am asking because of the lines of code below in this file:

auto train_set = CustomDataset(data.first)
    .map(torch::data::transforms::Normalize<>({0.485, 0.456, 0.406}, {0.229, 0.224, 0.225}))
    .map(torch::data::transforms::Stack<>());
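
For context, a hedged reading of this chain, assuming the standard libtorch `torch::data` semantics (not verified against this repo's build):

```cpp
// - CustomDataset::get(i) produces one Example<> (a single image tensor plus target).
// - .map(Normalize<>) wraps the dataset with a per-tensor transform, so normalization
//   is applied to each example as it is fetched, before any batching happens.
// - .map(Stack<>) is a collation: it runs once a batch of already-normalized examples
//   has been gathered and stacks them into a single [N, C, H, W] tensor.
auto train_set = CustomDataset(data.first)
    .map(torch::data::transforms::Normalize<>({0.485, 0.456, 0.406}, {0.229, 0.224, 0.225}))
    .map(torch::data::transforms::Stack<>());
```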
rajveerb commented 11 months ago

@The-Death-Reaper

Can you also add support in the C++ code for passing the batch size, the GPUs to be used, the number of dataloader workers, and other configurable arguments, similar to the Python code?

Currently, I have to recompile each time I change one of these.

The-Death-Reaper commented 11 months ago

Normalization

I'm not sure; I will look into this. But if you know how it happens in the Python implementation, it is safe to assume the same happens in the C++ version, since the backend remains the same.

CustomDataset inherits from Dataset, which I assume the Python datasets map to as well. Nonetheless, I'll update this thread with a confirmation.

Modifications

Sure, I can do that. If you can point me toward a script/file that shows how it should finally look, I can mimic the interface.

rajveerb commented 11 months ago

You can look at the arguments in this file.

Some args are not applicable, for instance the PyTorch profiler args, which you can ignore.

The-Death-Reaper commented 11 months ago

@rajveerb I've pushed a basic modification to read configurations from a config file. Don't forget to copy the config file into the build folder (or modify it there). Let me know if this is sufficient.

The exact Python interface is not mimicked, since the C++ version would need a fixed ordering of positional arguments, but if that is preferred over a config file I can make that change quickly too. Also, do check whether more parameters are needed.
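
For readers of this thread, a hypothetical sketch of what a minimal key=value config reader might look like (the actual code is in the branch; the keys `batch_size` and `num_workers` here are placeholders):

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical key=value config reader: lines like "batch_size=256" or "num_workers=8"
// become map entries; lines starting with '#' are treated as comments.
std::unordered_map<std::string, std::string> read_config(const std::string& path) {
  std::unordered_map<std::string, std::string> cfg;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty() || line[0] == '#') continue;
    auto pos = line.find('=');
    if (pos == std::string::npos) continue;
    cfg[line.substr(0, pos)] = line.substr(pos + 1);
  }
  return cfg;
}

// Usage sketch:
//   auto cfg = read_config("train_config.txt");
//   int batch_size = std::stoi(cfg.at("batch_size"));
//   int num_workers = std::stoi(cfg.at("num_workers"));
```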

rajveerb commented 10 months ago

@The-Death-Reaper

I looked at the code; I haven't tested it yet.

This uses only a single GPU, correct, per our discussion about Data Parallel in the meetings?

Is there a reason for num_worker to be commented out?

Also, can you please clean up the scripts for setting up the environment?