**rajveerb** opened this issue 1 year ago
Add

```cmake
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")
```

to the CMake file - check the `cpp_test` folder for an example.

You should be able to build and compile `example-app.cpp` using the standard CMake commands; running `setup.sh` will also build it for you. All the completed tasks have been recorded in `code/cpp_test` under the `dev_mayur` branch. I will raise a PR once I clean up the folder and confirm it can be reproduced on the lab systems.
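For context, a minimal `CMakeLists.txt` for such a libtorch app might look like the sketch below (the actual layout in `cpp_test` may differ; target and project names are illustrative):

```cmake
cmake_minimum_required(VERSION 3.18 FATAL_ERROR)

# Select the CUDA toolkit's nvcc before enabling the CUDA language
set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")

project(example-app LANGUAGES CXX CUDA)

find_package(Torch REQUIRED)

add_executable(example-app example-app.cpp)
target_link_libraries(example-app "${TORCH_LIBRARIES}")
set_property(TARGET example-app PROPERTY CXX_STANDARD 17)
```

Note that `CMAKE_CUDA_COMPILER` has to be set before the `project()` call enables the CUDA language, otherwise CMake will have already picked a compiler.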
Look at how to extend the `Dataset` class for custom datasets and dataloaders. Add functions to do the necessary preprocessing:
1. Crop
2. Rotate
3. Convert to tensor
4. Normalize
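The four steps above could be sketched as a custom libtorch dataset roughly like the following (a sketch only: the class name, image size, and OpenCV-based crop/rotate calls are assumptions, not the repo's actual code, and OpenCV must be linked):

```cpp
#include <opencv2/opencv.hpp>
#include <torch/torch.h>

// Illustrative custom dataset; names and sizes are assumptions.
class CustomDataset : public torch::data::datasets::Dataset<CustomDataset> {
  std::vector<std::pair<std::string, int64_t>> samples_;  // (image path, label)

 public:
  explicit CustomDataset(std::vector<std::pair<std::string, int64_t>> samples)
      : samples_(std::move(samples)) {}

  torch::data::Example<> get(size_t index) override {
    auto [path, label] = samples_[index];
    cv::Mat img = cv::imread(path, cv::IMREAD_COLOR);

    // 1. Crop: center-crop to 224x224 (size is an assumption)
    cv::Rect roi((img.cols - 224) / 2, (img.rows - 224) / 2, 224, 224);
    img = img(roi).clone();

    // 2. Rotate: 90 degrees clockwise as a placeholder transform
    cv::rotate(img, img, cv::ROTATE_90_CLOCKWISE);

    // 3. Convert to tensor: HWC uint8 -> CHW float in [0, 1]
    auto tensor = torch::from_blob(img.data, {img.rows, img.cols, 3}, torch::kUInt8)
                      .permute({2, 0, 1})
                      .to(torch::kFloat32)
                      .div(255);

    // 4. Normalize with the usual ImageNet statistics
    auto mean = torch::tensor({0.485, 0.456, 0.406}).view({3, 1, 1});
    auto stddev = torch::tensor({0.229, 0.224, 0.225}).view({3, 1, 1});
    tensor = (tensor - mean) / stddev;

    return {tensor, torch::tensor(label)};
  }

  torch::optional<size_t> size() const override { return samples_.size(); }
};
```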
Convert the Python file to train the model entirely in C++.
Added functions to do the following preprocessing in C++
Checked and confirmed the correctness of the following (files added under `correctness_checker`):
We can now play around with parameters, if needed, with some degree of confidence that results will match in both languages.
Got a better understanding of how to write an ML model in C++ and, to some extent, of the features available in libtorch.
(Required for profiling; this will also let us double-check correctness and perhaps comment on the accuracy of models across libraries and languages.)
I tried to use the code pushed in `dev_mayur_kepler`, but I am facing issues with building it.
Can you add make files, or a better README describing the required setup?
I had link issues with `dcgan`, and an OpenCV lib missing for `example-app`.
The ability to replicate the same experiments is important for benchmarks. If I cannot replicate it on the same machine, then it's an issue.
Note: our machine has CUDA 11.8 and 12.0 installed. I used the libtorch build specific to CUDA 11.8, and CUDA 11.8 itself to build it as well.
Below are the error logs faced while building the code:
For `example-app`:
Environment variables for building:

- `PATH` contains `/usr/local/cuda-11.8/bin`
- `LD_LIBRARY_PATH` contains `/usr/local/cuda-11.8/lib64`
```
$ cmake -DCMAKE_PREFIX_PATH=/home/rbachkaniwala3/work/ml-pipeline-benchmark/code/cpp_test/libtorch/ ..
...
By not providing "FindOpenCV.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "OpenCV", but
CMake did not find one.

Could not find a package configuration file provided by "OpenCV" with any
of the following names:

  OpenCVConfig.cmake
  opencv-config.cmake

Add the installation prefix of "OpenCV" to CMAKE_PREFIX_PATH or set
"OpenCV_DIR" to a directory containing one of the above files.  If "OpenCV"
provides a separate development package or SDK, be sure it has been
installed.
```
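A likely fix (hedged: exact package names and paths depend on the machine) is to install OpenCV's development files so that `OpenCVConfig.cmake` exists, and point CMake at it explicitly if needed:

```shell
# Install OpenCV development files (Ubuntu package name; adjust for your distro)
sudo apt-get install libopencv-dev

# If CMake still cannot find it, pass the config directory explicitly
# (the opencv4 path below is typical for Ubuntu; verify it locally)
cmake -DCMAKE_PREFIX_PATH=/path/to/libtorch \
      -DOpenCV_DIR=/usr/lib/x86_64-linux-gnu/cmake/opencv4 ..
```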
For `dcgan`:
```
$ cmake -DCMAKE_PREFIX_PATH=/home/rbachkaniwala3/work/ml-pipeline-benchmark/code/cpp_test/libtorch/ ..
...
CMake Warning at CMakeLists.txt:18 (add_executable):
  Cannot generate a safe runtime search path for target dcgan because files
  in some directories may conflict with libraries in implicit directories:

    runtime library [libnvToolsExt.so.1] in /usr/lib/x86_64-linux-gnu may be hidden by files in:
      /usr/local/cuda-11.8/lib64

  Some of these libraries may not be found correctly.

$ make
/usr/bin/ld: /lib/x86_64-linux-gnu/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/dcgan.dir/build.make:100: dcgan] Error 1
make[1]: *** [CMakeFiles/Makefile2:76: CMakeFiles/dcgan.dir/all] Error 2
make: *** [Makefile:84: all] Error 2
```
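The `DSO missing from command line` error for `libpthread` usually means pthread is not explicitly on the link line; a hedged CMake fix (the `dcgan` target name is taken from the log above, the rest is a sketch) would be:

```cmake
# Link pthread explicitly via CMake's Threads package
find_package(Threads REQUIRED)
target_link_libraries(dcgan PRIVATE Threads::Threads "${TORCH_LIBRARIES}")
```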
Agreed. I was writing a script to set up everything, but haven't completed and pushed it yet.
For `example-app`, the issue is setting up OpenCV for C++; I haven't added that to the README yet and will update it.
For `dcgan`, I will check the CMake file and make it more robust. The error could be due to shared-library version conflicts.
I will take these up as ad-hoc tasks along with the other work.
```shell
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
https://pytorch.org/tutorials/advanced/cpp_export.html
> There exist two ways of converting a PyTorch model to Torch Script. The first is known as tracing, a mechanism in which the structure of the model is captured by evaluating it once using example inputs, and recording the flow of those inputs through the model. This is suitable for models that make limited use of control flow.
>
> The second approach is to add explicit annotations to your model that inform the Torch Script compiler that it may directly parse and compile your model code, subject to the constraints imposed by the Torch Script language.
@rajveerb @kexinrong
I read about TorchScript tracing and annotations in more detail. Since we are using the standard model and are not adding any new logic through control statements or loops in the model layers, tracing is the recommended option for using TorchScript (as per the documentation and general public opinion). Tracing fails only when there are control statements or decision loops inside the model layers.
I was thinking of using tracing, transferring the model to C++, and training/testing it there, and will get started on that (regardless of annotations vs. tracing, the next steps remain the same anyway). What do you recommend?
@The-Death-Reaper
I read about torchscript in the link that you sent before.
Go ahead with torchscript's tracing option.
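For reference, loading a traced module on the C++ side follows the linked tutorial and looks roughly like the sketch below (the file name `traced_model.pt` and input shape are assumptions):

```cpp
#include <torch/script.h>
#include <iostream>

int main() {
  // Load the TorchScript module produced by tracing on the Python side
  // ("traced_model.pt" is an assumed file name).
  torch::jit::script::Module module = torch::jit::load("traced_model.pt");

  // Run a forward pass with a dummy input matching the traced shape.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}));
  torch::Tensor output = module.forward(inputs).toTensor();

  std::cout << output.sizes() << '\n';
  return 0;
}
```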
@The-Death-Reaper
Can you `cd` into the `imagenet` dir?
> Can you `cd` into the `imagenet` dir?

No
@The-Death-Reaper
Try again
@rajveerb
Working now, ty
I am trying to run the C++ experiment on CloudLab. Where are the instructions to set up the environment? Does your code work with CUDA 11.8? Where are the instructions/commands to run it?
Can you push the solution to the above questions to `dev_mayur_kepler` soon?
I've pushed the required info to the branch. I'll continue cleaning the directory and adding comments over the next few days. Do keep me updated on the issues faced so I can take those up as well :).
How does normalization happen? Is it applied to each image when the other transforms happen, or once an entire batch is ready?
I am asking because of the below lines of code in this file:
```cpp
auto train_set = CustomDataset(data.first)
    .map(torch::data::transforms::Normalize<>({0.485, 0.456, 0.406}, {0.229, 0.224, 0.225}))
    .map(torch::data::transforms::Stack<>());
```
@The-Death-Reaper
Can you also add code to pass the batch number, the GPUs to be used, the data-loader workers, and other configurable arguments, similar to the Python code?
Currently, I have to recompile each time.
I'm not sure; I will look into this. But if you know how it happens in the Python implementation, it is safe to assume the same happens in the C++ version, since the backend remains the same.
`CustomDataset` inherits from `Dataset`, which I assume the Python datasets map to as well. Nonetheless, I'll update this thread with a confirmation.
Sure, I can do that. If you can point me toward a script/file that shows how it should finally look, I can mimic the interface.
You can look at the arguments in this file.
Some args are not possible, for instance PyTorch profiler args, which you can ignore.
@rajveerb I've pushed a basic modification to read configurations from a config file. Don't forget to copy (or modify) the config file into the build folder. Let me know if this is sufficient.
The exact Python interface is not mimicked, since C++ would need a fixed ordering of positional arguments; if that is preferred over a file, I can make that change quickly too. Also, do check whether more parameters are needed.
@The-Death-Reaper
I looked at the code; I haven't tested it yet.
This uses only a single GPU, correct, based on our discussion about Data Parallel in the meetings?
Is there a reason for `num_worker` to be commented out?
Also, can you please clean up the environment-setup scripts?
The goal is to convert the existing implementation in this file to a C++ implementation.
With the C++ implementation, a finer-granularity analysis can be done using profiling/instrumentation tools such as Intel VTune. It will also let us experiment with memory-management policies with finer control than the Python implementation allows, and the C++ dataloader will let us leverage more CPU cores than before.