rapidsai / cugraph

cuGraph - RAPIDS Graph Analytics Library
https://docs.rapids.ai/api/cugraph/stable/
Apache License 2.0

[QST]: Building cugraph for running SSSP inside `cpp/src/traversal` #4269

Open mrprajesh opened 8 months ago

mrprajesh commented 8 months ago

What is your question?

We are interested in running the C++ single-GPU version of SSSP as a baseline for comparison in our paper, so I tried building cugraph by following the build instructions:

git clone git@github.com:rapidsai/cugraph.git
cd cugraph
 ./build.sh clean
 ./build.sh libcugraph
<snip>
CMake Error at /home/rajesh/install/cmake-3.28.3-linux-x86_64/share/cmake-3.28/Modules/FetchContent.cmake:1679 (message):
  Build step for cugraph-ops failed: 1

I understand that cugraph-ops is closed source, so I also tried building from a conda env that had libcugraphops installed. However, that gave a different error about the NCCL INCLUDE_DIR variables. Could you please clarify the following?

  1. Is the C++ version usable and buildable at v24.x, or is only the Python version supported?
  2. Can we build cugraph from source via these steps?
  3. Can we run the sssp_sg.cu version after installing the RAPIDS nightly via conda?
  4. Are we on the right lines? Could you please suggest a solution for our objective? Thanks a lot in advance.

Our machine config.

ChuckHastings commented 8 months ago

There is an option --without_cugraphops which you can add to the build command which will skip over the cugraph ops dependency. That will cause some of the sampling algorithms (which rely on some closed-source cugraph-ops features) to fail. But everything else (including SSSP) will function properly.

So you can try:

./build.sh clean
./build.sh libcugraph --without_cugraphops

and that should do what you want.

mrprajesh commented 7 months ago

Thanks, @ChuckHastings. After installing NCCL, I was able to get past the NCCL error. However, Chrome, Cinnamon, and the laptop itself nearly crashed while the build printed more errors (below).

git clone -b v24.04.00 https://github.com/rapidsai/cugraph.git
cd cugraph/
./build.sh clean
./build.sh libcugraph --without_cugraphops

#NCCL Error 
CMake Error at /home/rajesh/install/cmake-3.28.3-linux-x86_64/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find NCCL (missing: NCCL_LIBRARY NCCL_INCLUDE_DIR)

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install libnccl2 libnccl-dev

./build.sh clean
./build.sh libcugraph --without_cugraphops

[1/632] Building CUDA object CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o
FAILED: CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o 
/usr/local/cuda-12.2/bin/nvcc -forward-unknown-to-host-compiler -DCUDA_API_PER_THREAD_DEFAULT_STREAM -DCUTLASS_NAMESPACE=raft_cutlass -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_COMPILED -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -Dcugraph_EXPORTS -I/home/rajesh/temp/cugraph/cpp/../thirdparty -I/home/rajesh/temp/cugraph/cpp/src -I/home/rajesh/temp/cugraph/cpp/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/rmm-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/thrust/thrust/cmake/../.. -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/cub/cub/cmake/../.. -I/home/rajesh/temp/cugraph/cpp/build/_deps/fmt-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/spdlog-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/raft-src/cpp/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cuco-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/nvidiacutlass-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/nvidiacutlass-build/include -I/usr/local/cuda-12.2/include -isystem /usr/local/cuda-12.2/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Wno-deprecated-declarations -Xptxas=--disable-warnings -Xcompiler=-Wall,-Wno-error=sign-compare,-Wno-error=unused-but-set-variable -Xfatbin=-compress-all -DNO_CUGRAPH_OPS -MD -MT CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o -MF CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o.d -x cu -c /home/rajesh/temp/cugraph/cpp/src/community/detail/refine_mg.cu -o CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o
Killed
[2/632] Building CUDA object CMakeFiles/cugraph.dir/src/community/detail/refine_sg.cu.o
FAILED:

Thank you for your patience and your assistance. Kind regards, Rajesh

ChuckHastings commented 7 months ago

Sorry, I didn't fully read your original input. Let me answer those questions first, then I'll answer your most recent one.

I understand that cugraph-ops is closed source, so I also tried building from a conda env that had libcugraphops installed. However, that gave a different error about the NCCL INCLUDE_DIR variables. Could you please clarify the following?

  1. Is the cpp version usable or buildable at v24.x? or do we have support only for py version?

Yes, each branch should be usable and buildable (C++ or Python). 24.02 and 24.04 are released branches and should work fine. 24.06 is the latest code and is subject to change. However, based on our development/CI process, our latest branch should also be buildable unless one of our dependencies has changed and we haven't yet updated to reflect that change.

  2. Can we build cugraph from source via these steps?

Yes, I skipped to this detail of your question in my first answer.

  3. Can we run sssp_sg.cu version after installing RAPIDS nightly via conda installation?

If you are only interested in calling the functions as is and are on a supported architecture, you could install the conda packages. If you install the conda packages, your environment should contain the necessary headers and libraries, already compiled for your environment, and you wouldn't need to build from source. I would certainly recommend this: building libcugraph takes a while, and unless you're on a system that we don't build for (e.g. using an older GCC, or a Pascal or older GPU) there's not much benefit in building the code yourself.
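
A minimal sketch of how one might check whether a conda environment already provides the pre-built library, so that a source build can be skipped. The layout it looks for (`include/cugraph` and `lib/libcugraph.so` under the environment prefix) is an assumption about how the conda packages are laid out, and `has_libcugraph` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: report whether a prefix (e.g. "$CONDA_PREFIX") appears to
# contain the libcugraph headers and shared library.
# ASSUMPTION: headers live in include/cugraph and the library is
# lib/libcugraph.so; adjust if your environment is laid out differently.
has_libcugraph() {
  local prefix="$1"
  [ -d "$prefix/include/cugraph" ] && [ -e "$prefix/lib/libcugraph.so" ]
}

if has_libcugraph "${CONDA_PREFIX:-/nonexistent}"; then
  echo "libcugraph found under ${CONDA_PREFIX}"
else
  echo "libcugraph not found; install the conda packages or build from source"
fi
```

If the check succeeds, the headers and library can be passed to your own compiler invocation from that prefix instead of from a local build tree.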

ChuckHastings commented 7 months ago

There's not enough information in your error message for me to suggest what's going wrong. I see the Killed message in your output. If I had to guess (pure speculation on my part), you may have run out of memory.

We have seen issues where some of our .cu files require a large amount of host memory for the compiler to run. It's possible that your notebook computer doesn't have sufficient memory to complete compilation. That would be even more motivation to use the pre-built versions.
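
If host memory is the limit, one possible workaround is to run fewer compile jobs in parallel. The sketch below derives a job count from the memory currently available. It assumes Linux (`/proc/meminfo`), an illustrative budget of 8 GB per compile job (not a measured figure), and that `build.sh` reads the `PARALLEL_LEVEL` environment variable; check the script's help text before relying on that. `jobs_for_mem` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: choose a parallel job count that fits in available host memory.
# jobs_for_mem AVAILABLE_KB GB_PER_JOB -> prints a job count (at least 1)
jobs_for_mem() {
  local avail_kb="$1" gb_per_job="$2"
  # Convert the per-job budget from GB to kB and divide (integer division).
  local jobs=$(( avail_kb / (gb_per_job * 1024 * 1024) ))
  # Never go below one job, even on a very small machine.
  if [ "$jobs" -lt 1 ]; then jobs=1; fi
  echo "$jobs"
}

# MemAvailable is reported in kB on Linux; fall back to 0 if it is missing.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || echo 0)
avail_kb=${avail_kb:-0}

# ASSUMPTION: 8 GB per nvcc job is a guess, and PARALLEL_LEVEL is assumed
# to be honored by build.sh.
PARALLEL_LEVEL=$(jobs_for_mem "$avail_kb" 8)
export PARALLEL_LEVEL
echo "PARALLEL_LEVEL=$PARALLEL_LEVEL"
# Then run the build as before:
# ./build.sh libcugraph --without_cugraphops
```

Fewer jobs make the build slower but reduce the chance of the compiler being killed for lack of memory.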

mrprajesh commented 7 months ago

Sorry, I didn't fully read your original input,

Sure, no worries. Thank you for your replies.

you may have run out of memory.

Ah, I see.

We have seen issues where some of our .cu files require a large amount of host memory for the compiler to run. It's possible that your notebook computer doesn't have sufficient memory to complete compilation.

OMG! Thanks.

That would be even more motivation to use the pre-built versions.

Sure. I'll attempt this.

I see there is a lot of development happening in this complex repo and its integrations, partly due to nx-cugraph. All I wished for was to run this BFS example: https://github.com/rapidsai/cugraph/blob/branch-24.06/cpp/examples/users/single_gpu_application/sg_graph_algorithms.cpp. It looked very much like Gunrock's style of programming, so I got interested in checking it out and learning from it.

ChuckHastings commented 7 months ago

I think you should be able to build those examples from a conda install of the software. Please let us know if you have any issues. The C++ examples are a new feature we just added in the 24.04 release. Any feedback on making them easier to use would be wonderful.

ChuckHastings commented 6 months ago

Any luck on either running from conda installation or building things on a system with more memory?

mrprajesh commented 6 months ago

Any luck on either running from conda installation or building things on a system with more memory?

Unfortunately, on a system with more memory, we encountered NCCL errors (fixing them means either compiling NCCL from source or using sudo). We also tried the conda-installed version (back then, before the 24.04 release) but encountered similar roadblocks. I'll have to check again with the release version.

Any feedback on making them easier to use would be wonderful.

It would be nice to have a lightweight build option, for example one that separates single-GPU code from multi-GPU code, i.e. one that depends only on the minimal set of required include (-I) paths rather than building the whole of cugraph.

On build from source

It would be nice if the prerequisites section listed NCCL, cugraph-ops, etc.

Thank you for all your help and patience. Kind regards,

ChuckHastings commented 6 months ago

A thought to try.

We have segregated the SG and MG implementations for many of the algorithms into separate source files. The implementation is generally in a common header file, but the instantiation of the actual functions occurs in separate source files. While we don't have an easy way to skip building the MG code, you could try going into CMakeLists.txt and commenting out the compilation of all of the source files that have an _mg suffix (e.g. src/community/louvain_mg.cu). You'd have to also do that in the tests/CMakeLists.txt.

That might work, or if you combine that with commenting out the references to NCCL in the two CMakeLists.txt files you might get a functioning build.
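
The suggestion above, commenting out the `_mg` source files, could be scripted roughly as follows. This is a sketch under assumptions: it treats every not-yet-commented line that names a `*_mg.cu` file as safe to comment out with `#`, which may not hold if such a file shares a line with other CMake content. `comment_out_mg` is a hypothetical helper, and the file paths in the usage comment are assumed from the build log, so review the resulting diff before building.

```shell
#!/usr/bin/env bash
# Sketch only: comment out lines that reference *_mg.cu sources in a
# CMakeLists.txt file, keeping a .bak copy of the original.
comment_out_mg() {
  local file="$1"
  cp "$file" "$file.bak"
  # Prefix '#' to lines mentioning an _mg.cu file, preserving indentation.
  # Lines that are already commented (first non-blank char is '#') are skipped.
  sed -E -i.tmp 's/^([[:space:]]*)([^#[:space:]].*_mg\.cu.*)$/\1#\2/' "$file"
  rm -f "$file.tmp"
}

# Apply to every file given on the command line, for example
# (ASSUMED paths): cpp/CMakeLists.txt cpp/tests/CMakeLists.txt
for f in "$@"; do
  comment_out_mg "$f"
done
```

Because a backup is kept next to each edited file, the change can be undone by copying the `.bak` file back. The NCCL references mentioned above would still need to be handled by hand.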

ChuckHastings commented 4 months ago

Any luck on this?

If you are using the latest branch (our 24.08 development branch) you will see that we split many of the files into smaller translation units to make the compilation require less memory.