gaowayne opened 1 month ago
Please check the version of /usr/bin/nvcc. Its version should match the nvcc version you used to compile BaM.
yes, I confirmed it matched.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Sep_12_02:18:05_PDT_2024
Cuda compilation tools, release 12.6, V12.6.77
Build cuda_12.6.r12.6/compiler.34841621_0
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# nvidia-smi
Sun Oct 20 04:00:31 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA L40S Off | 00000000:8A:00.0 Off | 0 |
| N/A 29C P8 32W / 350W | 23MiB / 46068MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2499 G /usr/lib/xorg/Xorg 4MiB |
| 0 N/A N/A 3542 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build#
Please execute which nvcc to see if the output is /usr/bin/nvcc. Alternatively, run /usr/bin/nvcc -V to verify the version of nvcc that is actually being called during the cmake process.
man, you are quite correct :) what is the best way to fix this?
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# which nvcc
/usr/local/cuda/bin/nvcc
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# /usr/bin/nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build#
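(A cleaner fix than copying binaries, as a sketch assuming the 12.6 toolkit lives at /usr/local/cuda-12.6, which the /usr/local/cuda/bin/nvcc path above suggests: put that toolkit first on PATH before configuring, e.g.
export PATH=/usr/local/cuda-12.6/bin:$PATH
or pass the compiler to cmake explicitly, as suggested later in this thread with -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc.)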
@WWWzq-01 buddy, I manually copied the 12.6 nvcc into /usr/bin, and now that CUDA error is gone. Now I get a GCC version error. I can build BaM with GCC-9, which is the default on Ubuntu 20.04.3.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# cmake ..
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is unknown
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- broken
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
The CUDA compiler
"/usr/bin/nvcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/make cmTC_740a5/fast && /usr/bin/make -f CMakeFiles/cmTC_740a5.dir/build.make CMakeFiles/cmTC_740a5.dir/build
make[1]: Entering directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_740a5.dir/main.cu.o
/usr/bin/nvcc -x cu -c /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_740a5.dir/main.cu.o
In file included from /usr/include/cuda_runtime.h:83,
from <command-line>:
/usr/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
| ^~~~~
make[1]: *** [CMakeFiles/cmTC_740a5.dir/build.make:66: CMakeFiles/cmTC_740a5.dir/main.cu.o] Error 1
make[1]: Leaving directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
make: *** [Makefile:121: cmTC_740a5/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:2 (PROJECT)
-- Configuring incomplete, errors occurred!
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeOutput.log".
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeError.log".
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build#
@WWWzq-01 I fixed the include headers by copying the CUDA 12.6 headers into /usr/include; that error is gone. I now hit the new error below:
cicc not found.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# cmake .. -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_path('include'))") -DPYTHON_LIBRARY=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
-- The CUDA compiler identification is unknown
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- broken
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
The CUDA compiler
"/usr/bin/nvcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/make cmTC_66f66/fast && /usr/bin/make -f CMakeFiles/cmTC_66f66.dir/build.make CMakeFiles/cmTC_66f66.dir/build
make[1]: Entering directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_66f66.dir/main.cu.o
/usr/bin/nvcc -x cu -c /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_66f66.dir/main.cu.o
sh: 1: cicc: not found
make[1]: *** [CMakeFiles/cmTC_66f66.dir/build.make:66: CMakeFiles/cmTC_66f66.dir/main.cu.o] Error 127
make[1]: Leaving directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
make: *** [Makefile:121: cmTC_66f66/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:2 (PROJECT)
-- Configuring incomplete, errors occurred!
I have fixed the cicc tool path problem. Now I see the link error below.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# cmake .. -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_path('include'))") -DPYTHON_LIBRARY=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is unknown
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- broken
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
The CUDA compiler
"/usr/bin/nvcc"
is not able to compile a simple test program.
It fails with the following output:
Change Dir: /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp
Run Build Command(s):/usr/bin/make cmTC_23d39/fast && /usr/bin/make -f CMakeFiles/cmTC_23d39.dir/build.make CMakeFiles/cmTC_23d39.dir/build
make[1]: Entering directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
Building CUDA object CMakeFiles/cmTC_23d39.dir/main.cu.o
/usr/bin/nvcc -x cu -c /root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_23d39.dir/main.cu.o
Linking CUDA executable cmTC_23d39
/usr/bin/cmake -E cmake_link_script CMakeFiles/cmTC_23d39.dir/link.txt --verbose=1
"" CMakeFiles/cmTC_23d39.dir/main.cu.o -o cmTC_23d39
Error running link command: No such file or directory
make[1]: *** [CMakeFiles/cmTC_23d39.dir/build.make:87: cmTC_23d39] Error 2
make[1]: Leaving directory '/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeTmp'
make: *** [Makefile:121: cmTC_23d39/fast] Error 2
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:2 (PROJECT)
-- Configuring incomplete, errors occurred!
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeOutput.log".
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeError.log".
I fixed the linker problem by copying bin/crt/link.stub. Here is the new error: failed to parse CUDA nvcc implicit link information.
Failed to parsed CUDA nvcc implicit link information:
#$ _THERE_=/usr/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ rm tmp/a_dlink.reg.c
Failed to parsed CUDA nvcc implicit link information:
#$ _THERE_=/usr/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ rm tmp/a_dlink.reg.c
CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
CMakeLists.txt:2 (PROJECT)
Failed to parsed CUDA nvcc implicit link information:
Failed to parsed CUDA nvcc implicit link information:
#$ _THERE_=/usr/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ rm tmp/a_dlink.reg.c
Failed to parsed CUDA nvcc implicit link information:
Failed to parsed CUDA nvcc implicit link information:
#$ _THERE_=/usr/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ rm tmp/a_dlink.reg.c
#$ _CUDART_=cudart
#$ _HERE_=/usr/bin
#$ _THERE_=/usr/bin
#$ _TARGET_SIZE_=
#$ _TARGET_DIR_=
#$ _TARGET_SIZE_=64
#$ rm tmp/a_dlink.reg.c
#$ gcc -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -E -x c++ -D__CUDACC__ -D__NVCC__ -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=6 -D__CUDACC_VER_BUILD__=77 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=6 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp4.ii"
#$ cudafe++ --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "CMakeCUDACompilerId.cu" --orig_src_path_name "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/3.16.3/CompilerIdCUDA/CMakeCUDACompilerId.cu" --allow_managed --m64 --parse_templates --gen_c_file_name "tmp/CMakeCUDACompilerId.cudafe1.cpp" --stub_file_name "CMakeCUDACompilerId.cudafe1.stub.c" --gen_module_id_file --module_id_file_name "tmp/CMakeCUDACompilerId.module_id" "tmp/CMakeCUDACompilerId.cpp4.ii"
#$ gcc -D__CUDA_ARCH__=520 -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -E -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__ -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=6 -D__CUDACC_VER_BUILD__=77 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=6 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -include "cuda_runtime.h" -m64 "CMakeCUDACompilerId.cu" -o "tmp/CMakeCUDACompilerId.cpp1.ii"
#$ cicc --c++14 --gnu_version=90400 --display_error_number --orig_src_file_name "CMakeCUDACompilerId.cu" --orig_src_path_name "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/3.16.3/CompilerIdCUDA/CMakeCUDACompilerId.cu" --allow_managed -arch compute_52 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "CMakeCUDACompilerId.fatbin.c" -tused --module_id_file_name "tmp/CMakeCUDACompilerId.module_id" --gen_c_file_name "tmp/CMakeCUDACompilerId.cudafe1.c" --stub_file_name "tmp/CMakeCUDACompilerId.cudafe1.stub.c" --gen_device_file_name "tmp/CMakeCUDACompilerId.cudafe1.gpu" "tmp/CMakeCUDACompilerId.cpp1.ii" -o "tmp/CMakeCUDACompilerId.ptx"
#$ ptxas -arch=sm_52 -m64 "tmp/CMakeCUDACompilerId.ptx" -o "tmp/CMakeCUDACompilerId.sm_52.cubin"
#$ fatbinary --create="tmp/CMakeCUDACompilerId.fatbin" -64 --cicc-cmdline="-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 " "--image3=kind=elf,sm=52,file=tmp/CMakeCUDACompilerId.sm_52.cubin" "--image3=kind=ptx,sm=52,file=tmp/CMakeCUDACompilerId.ptx" --embedded-fatbin="tmp/CMakeCUDACompilerId.fatbin.c"
#$ gcc -D__CUDA_ARCH__=520 -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -c -x c++ -DCUDA_DOUBLE_MATH_FUNCTIONS -Wno-psabi -m64 "tmp/CMakeCUDACompilerId.cudafe1.cpp" -o "tmp/CMakeCUDACompilerId.o"
#$ nvlink -m64 --arch=sm_52 --register-link-binaries="tmp/a_dlink.reg.c" -cpu-arch=X86_64 "tmp/CMakeCUDACompilerId.o" -lcudadevrt -o "tmp/a_dlink.sm_52.cubin" --host-ccbin "gcc"
#$ fatbinary --create="tmp/a_dlink.fatbin" -64 --cicc-cmdline="-ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 " -link "--image3=kind=elf,sm=52,file=tmp/a_dlink.sm_52.cubin" --embedded-fatbin="tmp/a_dlink.fatbin.c"
#$ gcc -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -c -x c++ -DFATBINFILE="\"tmp/a_dlink.fatbin.c\"" -DREGISTERLINKBINARYFILE="\"tmp/a_dlink.reg.c\"" -I. -D__NV_EXTRA_INITIALIZATION= -D__NV_EXTRA_FINALIZATION= -D__CUDA_INCLUDE_COMPILER_INTERNAL_HEADERS__ -Wno-psabi -D__CUDACC_VER_MAJOR__=12 -D__CUDACC_VER_MINOR__=6 -D__CUDACC_VER_BUILD__=77 -D__CUDA_API_VER_MAJOR__=12 -D__CUDA_API_VER_MINOR__=6 -D__NVCC_DIAG_PRAGMA_SUPPORT__=1 -m64 "/usr/bin/crt/link.stub" -o "tmp/a_dlink.o"
#$ g++ -D__CUDA_ARCH_LIST__=520 -D__NV_LEGACY_LAUNCH -m64 -Wl,--start-group "tmp/a_dlink.o" "tmp/CMakeCUDACompilerId.o" -lcudadevrt -lcudart_static -lrt -lpthread -ldl -Wl,--end-group -o "a.out"
After enabling the cmake trace, we can see this error. Need help!~
/snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake(130): else()
Called from: [3] /snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake
[2] /snap/cmake/1417/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake
[1] /root/wayne/gids/GIDS/gids_module/CMakeLists.txt
/snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake(131): message(CONFIGURE_LOG Failed to parse CUDA nvcc implicit link information:\n${_nvcc_log}\n\n )
Called from: [3] /snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake
[2] /snap/cmake/1417/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake
[1] /root/wayne/gids/GIDS/gids_module/CMakeLists.txt
/snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake(133): message(FATAL_ERROR Failed to extract nvcc implicit link line. )
Called from: [3] /snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake
[2] /snap/cmake/1417/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake
[1] /root/wayne/gids/GIDS/gids_module/CMakeLists.txt
CMake Error at /snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake:133 (message):
Failed to extract nvcc implicit link line.
Call Stack (most recent call first):
/snap/cmake/1417/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake:242 (cmake_nvcc_parse_implicit_info)
CMakeLists.txt:2 (PROJECT)
Called from: [3] /snap/cmake/1417/share/cmake-3.30/Modules/Internal/CMakeNVCCParseImplicitInfo.cmake
[2] /snap/cmake/1417/share/cmake-3.30/Modules/CMakeDetermineCUDACompiler.cmake
[1] /root/wayne/gids/GIDS/gids_module/CMakeLists.txt
-- Configuring incomplete, errors occurred!
You can specify the CUDA version directly during the CMake process with the following command:
cmake -DCUDAToolkit_ROOT=/usr/local/cuda-12.6 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc
This command sets the path to the desired CUDA toolkit and nvcc
compiler, ensuring that CMake uses the specified version during the build process.
If you still encounter GCC errors, you can specify the GCC version directly using the following command:
cmake -DCMAKE_C_COMPILER=/usr/bin/gcc-9 -DCMAKE_CXX_COMPILER=/usr/bin/g++-9 -DCUDAToolkit_ROOT=/usr/local/cuda-12.6 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc
Here, replace /usr/bin/gcc-9
with the desired GCC version. To find the default GCC path, you can use the following command:
ll $(which gcc)
This will display the path of the default gcc
version, which you can then use in your CMake command.
cmake -DCUDAToolkit_ROOT=/usr/local/cuda-12.6 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc
hello buddy, your 1st method works great now; I only see a Python-related error. :)
it is very promising now, can you shed more light on this? :)
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# cmake -DCUDAToolkit_ROOT=/usr/local/cuda-12.6 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc .. \
> -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_path('include'))") \
> -DPYTHON_LIBRARY=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is NVIDIA 12.6.77
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/local/cuda-12.6/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-12.6/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
CMake Error at CMakeLists.txt:29 (FIND_PACKAGE):
By not providing "Findpybind11.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "pybind11",
but CMake did not find one.
Could not find a package configuration file provided by "pybind11" with any
of the following names:
pybind11Config.cmake
pybind11-config.cmake
Add the installation prefix of "pybind11" to CMAKE_PREFIX_PATH or set
"pybind11_DIR" to a directory containing one of the above files. If
"pybind11" provides a separate development package or SDK, be sure it has
been installed.
-- Configuring incomplete, errors occurred!
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMake
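(The thread does not show how the pybind11 error was resolved. One common fix, as a sketch assuming pybind11 is installed into the same Python with pip install pybind11: pass its CMake directory to the configure step, e.g. -Dpybind11_DIR=$(python3 -m pybind11 --cmakedir).)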
this is the latest error:
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build# cmake -DCUDAToolkit_ROOT=/usr/local/cuda-12.6 -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.6/bin/nvcc .. -DPYTHON_INCLUDE_DIR=$(python -c "import sysconfig; print(sysconfig.get_path('include'))") -DPYTHON_LIBRARY=$(python -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
Command 'python' not found, did you mean:
command 'python3' from deb python3
command 'python' from deb python-is-python3
-- Found PythonInterp: /usr/bin/python3.8 (found version "3.8.10")
-- Found PythonInterp: /usr/bin/python3.8 (found suitable version "3.8.10", minimum required is "3")
CMake Error at /usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
Could NOT find PythonLibs (missing: PYTHON_INCLUDE_DIRS) (Required is at
least version "3")
Call Stack (most recent call first):
/usr/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:393 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.16/Modules/FindPythonLibs.cmake:310 (FIND_PACKAGE_HANDLE_STANDARD_ARGS)
CMakeLists.txt:45 (FIND_PACKAGE)
-- Configuring incomplete, errors occurred!
See also "/root/wayne/gids/GIDS/gids_module/build/CMakeFiles/CMakeOutput.log".
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/gids_module/build#
To make the python command point to python3, you can create a symbolic link as follows:
sudo ln -s /usr/bin/python3 /usr/bin/python
Or, you can modify the command to use python3 instead of python.
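Alternatively, the error output above already names an apt package that creates the same link:
sudo apt install python-is-python3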
@WWWzq-01 all are great. Now I am trying to run the unit tests, but I still get some Python dependency errors under Ubuntu 20.04.
any good idea on this libssl.so.3 error?
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./gids_unit_test.sh
/usr/local/lib/python3.8/dist-packages/scipy/__init__.py:143: UserWarning: A NumPy version >=1.19.5 and <1.27.0 is required for this version of SciPy (detected version 1.17.4)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
Traceback (most recent call last):
File "GIDS_unit_test.py", line 2, in <module>
import dgl
File "/usr/local/lib/python3.8/dist-packages/dgl/__init__.py", line 16, in <module>
from . import (
File "/usr/local/lib/python3.8/dist-packages/dgl/dataloading/__init__.py", line 13, in <module>
from .dataloader import *
File "/usr/local/lib/python3.8/dist-packages/dgl/dataloading/dataloader.py", line 27, in <module>
from ..distributed import DistGraph
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/__init__.py", line 5, in <module>
from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/dist_graph.py", line 11, in <module>
from .. import backend as F, graphbolt as gb, heterograph_index
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/__init__.py", line 8, in <module>
from .base import *
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/base.py", line 8, in <module>
from torchdata.datapipes.iter import IterDataPipe
File "/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/__init__.py", line 9, in <module>
from torchdata import _extension # noqa: F401
File "/usr/local/lib/python3.8/dist-packages/torchdata/__init__.py", line 29, in __getattr__
return importlib.import_module("." + name, __name__)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.8/dist-packages/torchdata/_extension.py", line 34, in <module>
_init_extension()
File "/usr/local/lib/python3.8/dist-packages/torchdata/_extension.py", line 31, in _init_extension
from torchdata import _torchdata as _torchdata
File "/usr/local/lib/python3.8/dist-packages/torchdata/__init__.py", line 29, in __getattr__
return importlib.import_module("." + name, __name__)
File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libssl.so.3: cannot open shared object file: No such file or directory
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation#
@WWWzq-01 I already fixed the libssl issue by adding the library path to LD_LIBRARY_PATH. I tried, but this one is really hard. Could you please check?
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./gids_unit_test.sh
/usr/local/lib/python3.8/dist-packages/torchdata/datapipes/__init__.py:18: UserWarning:
################################################################################
WARNING!
The 'datapipes', 'dataloader2' modules are deprecated and will be removed in a
to learn more and leave feedback.
################################################################################
deprecation_warning()
Traceback (most recent call last):
File "GIDS_unit_test.py", line 2, in <module>
import dgl
File "/usr/local/lib/python3.8/dist-packages/dgl/__init__.py", line 16, in <module>
from . import (
File "/usr/local/lib/python3.8/dist-packages/dgl/dataloading/__init__.py", line 13, in <module>
from .dataloader import *
File "/usr/local/lib/python3.8/dist-packages/dgl/dataloading/dataloader.py", line 27, in <module>
from ..distributed import DistGraph
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/__init__.py", line 5, in <module>
from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/dist_graph.py", line 11, in <module>
from .. import backend as F, graphbolt as gb, heterograph_index
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/__init__.py", line 55, in <module>
load_graphbolt()
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/__init__.py", line 45, in load_graphbolt
raise FileNotFoundError(
FileNotFoundError: Cannot find DGL C++ graphbolt library at /usr/local/lib/python3.8/dist-packages/dgl/graphbolt/libgraphbolt_pytorch_2.4.1.so
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./gids_unit_test.sh
Traceback (most recent call last):
File "GIDS_unit_test.py", line 2, in <module>
import dgl
ModuleNotFoundError: No module named 'dgl'
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip show dgl
WARNING: Package(s) not found: dgl
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip install dgl-cu126
ERROR: Could not find a version that satisfies the requirement dgl-cu126 (from versions: none)
ERROR: No matching distribution found for dgl-cu126
There is probably no CUDA 12.6 build of DGL. You can refer to this documentation to install DGL, and try installing the CUDA 12.4 version instead.
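For example, DGL publishes per-CUDA wheel indexes; a sketch of the install (this is the exact command that works later in the thread):
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html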
I installed the CUDA 12.4 DGL python package from the link above. That part is done, but it now gives the error below:
in the path below, there is no file /usr/local/lib/python3.8/dist-packages/dgl/graphbolt/libgraphbolt_pytorch_2.4.0.so.
root@salab-hpedl380g11-01:/usr/local/lib/python3.8/dist-packages/dgl/graphbolt# ls -l
total 537620
-rw-r--r-- 1 root staff 16274 Oct 20 13:05 base.py
-rw-r--r-- 1 root staff 7013 Oct 20 13:05 dataloader.py
drwxr-sr-x 3 root staff 4096 Oct 20 13:05 datapipes
-rw-r--r-- 1 root staff 2751 Oct 20 13:05 dataset.py
-rw-r--r-- 1 root staff 5831 Oct 20 13:05 external_utils.py
-rw-r--r-- 1 root staff 9672 Oct 20 13:05 feature_fetcher.py
-rw-r--r-- 1 root staff 10578 Oct 20 13:05 feature_store.py
drwxr-sr-x 3 root staff 4096 Oct 20 13:05 impl
-rw-r--r-- 1 root staff 4181 Oct 20 13:05 __init__.py
drwxr-sr-x 3 root staff 4096 Oct 20 13:05 internal
-rw-r--r-- 1 root staff 12142 Oct 20 13:05 internal_utils.py
-rw-r--r-- 1 root staff 24537 Oct 20 13:05 item_sampler.py
-rw-r--r-- 1 root staff 16115 Oct 20 13:05 itemset.py
-rwxr-xr-x 1 root staff 275158000 Oct 20 13:05 libgraphbolt_pytorch_2.3.0.so
-rwxr-xr-x 1 root staff 275158000 Oct 20 13:05 libgraphbolt_pytorch_2.3.1.so
-rw-r--r-- 1 root staff 15590 Oct 20 13:05 minibatch.py
-rw-r--r-- 1 root staff 1109 Oct 20 13:05 minibatch_transformer.py
-rw-r--r-- 1 root staff 3292 Oct 20 13:05 negative_sampler.py
drwxr-sr-x 2 root staff 4096 Oct 20 13:05 __pycache__
-rw-r--r-- 1 root staff 16693 Oct 20 13:05 sampled_subgraph.py
-rw-r--r-- 1 root staff 2295 Oct 20 13:05 sampling_graph.py
-rw-r--r-- 1 root staff 10783 Oct 20 13:05 subgraph_sampler.py
root@salab-hpedl380g11-01:/usr/local/lib/python3.8/dist-packages/dgl/graphbolt#
here is the error log:
Requirement already satisfied: pydantic-core==2.23.4 in /usr/local/lib/python3.8/dist-packages (from pydantic>=2.0->dgl) (2.23.4)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from pydantic>=2.0->dgl) (0.7.0)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas->dgl) (1.14.0)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.8/dist-packages (from sympy->torch<=2.4.0->dgl) (1.3.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.8/dist-packages (from nvidia-cusolver-cu12==11.4.5.107; platform_system == "Linux" and platform_machine == "x86_64"->torch<=2.4.0->dgl) (12.4.99)
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./gids_unit_test.sh
Traceback (most recent call last):
File "GIDS_unit_test.py", line 16, in <module>
import GIDS
File "/usr/local/lib/python3.8/dist-packages/GIDS/__init__.py", line 2, in <module>
from .GIDS import GIDS
File "/usr/local/lib/python3.8/dist-packages/GIDS/GIDS.py", line 24, in <module>
from dgl.distributed import DistGraph
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/__init__.py", line 10, in <module>
from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
File "/usr/local/lib/python3.8/dist-packages/dgl/distributed/dist_graph.py", line 12, in <module>
from .. import backend as F, graphbolt as gb, heterograph_index
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/__init__.py", line 81, in <module>
load_graphbolt()
File "/usr/local/lib/python3.8/dist-packages/dgl/graphbolt/__init__.py", line 66, in load_graphbolt
raise FileNotFoundError(
FileNotFoundError: Unable to locate the DGL C++ GraphBolt library at /usr/local/lib/python3.8/dist-packages/dgl/graphbolt/libgraphbolt_pytorch_2.4.0.so. This error typically occurs due to a version mismatch between the installed DGL and the PyTorch version you are currently using. Please ensure that your DGL installation is compatible with your PyTorch version. For more information, refer to the installation guide at https://www.dgl.ai/pages/start.html.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation#
Did you match the corresponding PyTorch version when installing DGL? You can use the following command to display the list of installed packages using pip:
pip list
thank you so much man for your great help!~~
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip list
Package Version
------------------------ --------------------
annotated-types 0.7.0
apturl 0.5.2
attrs 19.3.0
Automat 0.8.0
bcrypt 3.1.7
blinker 1.4
Brlapi 0.7.0
certifi 2019.11.28
chardet 3.0.4
Click 7.0
cloud-init 24.3.1
cmake-cpp-pybind11 0.1.1
colorama 0.4.3
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
cryptography 2.8
cupshelpers 1.0
louis 3.12.0
macaroonbakery 1.3.1
Mako 1.1.0
MarkupSafe 1.1.0
monotonic 1.5
more-itertools 4.2.0
mpmath 1.3.0
netifaces 0.10.4
networkx 3.1
numpy 1.24.4
nvidia-cublas-cu12 12.4.2.65
nvidia-cuda-cupti-cu12 12.4.99
nvidia-cuda-nvrtc-cu12 12.4.99
nvidia-cuda-runtime-cu12 12.4.99
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.0.44
nvidia-curand-cu12 10.3.5.119
nvidia-cusolver-cu12 11.6.0.99
nvidia-cusparse-cu12 12.3.0.142
nvidia-nccl-cu12 2.20.5
nvidia-nvjitlink-cu12 12.4.99
nvidia-nvtx-cu12 12.4.99
nvtx 0.2.10
oauthlib 3.1.0
olefile 0.46
packaging 24.1
pandas 2.0.3
paramiko 2.6.0
pexpect 4.6.0
Pillow 7.0.0
pip 20.0.2
protobuf 3.6.1
psutil 6.1.0
pyasn1 0.4.2
pyasn1-modules 0.2.1
requests 2.22.0
requests-unixsocket 0.2.0
scikit-learn 1.3.2
scipy 1.10.1
screen-resolution-extra 0.0.0
SecretStorage 2.3.1
service-identity 18.1.0
setuptools 45.2.0
simplejson 3.16.0
six 1.14.0
sos 4.5.6
ssh-import-id 5.10
sympy 1.13.3
systemd-python 234
threadpoolctl 3.5.0
torch 2.4.1+cu124
torchaudio 2.4.1+cu124
torchdata 0.8.0
torchvision 0.19.1+cu124
tqdm 4.66.5
triton 3.0.0
Twisted 18.9.0
typing-extensions 4.12.2
tzdata 2024.2
ubuntu-drivers-common 0.0.0
ubuntu-pro-client 8001
ufw 0.36
unattended-upgrades 0.1
urllib3 1.25.8
usb-creator 0.3.7
wadllib 1.3.3
wheel 0.34.2
xkit 0.0.0
zipp 1.0.0
zope.interface 4.7.1
Did you install DGL using pip? It doesn't show the DGL package here.
Did you match the corresponding PyTorch version when installing DGL?
I installed both PyTorch and DGL with pip.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip list | grep dgl
dgl 2.4.0+cu121
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip list | grep pytorch
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# pip list | grep torch
torch 2.4.1+cu124
torchaudio 2.4.1+cu124
torchdata 0.8.0
torchvision 0.19.1+cu124
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation#
Try installing the DGL build for CUDA 12.4?
thank you so much man. I uninstalled dgl and then installed it again with the commands below. Now the unit test can start; there is a runtime error, but the dependencies look fine. :)
pip uninstall dgl
pip install dgl -f https://data.dgl.ai/wheels/torch-2.4/cu124/repo.html
@gaowayne what did I teach you in BaM? Never hack around a dependency. Please follow the standard version-matching mechanism.
@jeongminpark417 can you create a Dockerfile that manages these dependencies automatically? I believe users should not have to do these things manually. This manual approach is broken and will fail.
@msharmavikram @WWWzq-01 @jeongminpark417 thank you all so much. It is working on my side now; I think I just need to change some hard-coded dataset paths to run all the way through. :)
I am downloading the full dataset.
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./test1.sh
GIDS DataLoader Setting
GIDS: True
CPU Feature Buffer: True
Window Buffering: True
Storage Access Accumulator: True
Dataset: IGB
Traceback (most recent call last):
File "heterogeneous_train.py", line 282, in <module>
dataset = IGBHeteroDGLDatasetMassive(args)
File "/root/wayne/gids/GIDS/evaluation/dataloader.py", line 377, in __init__
super().__init__(name='IGB260M')
File "/usr/local/lib/python3.8/dist-packages/dgl/data/dgl_dataset.py", line 112, in __init__
self._load()
File "/usr/local/lib/python3.8/dist-packages/dgl/data/dgl_dataset.py", line 203, in _load
self.process()
File "/root/wayne/gids/GIDS/evaluation/dataloader.py", line 381, in process
paper_paper_edges = torch.from_numpy(np.load(osp.join(self.dir, self.args.dataset_size, 'processed',
File "/usr/local/lib/python3.8/dist-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nvme1n1/full/processed/paper__cites__paper/edge_index.npy'
@WWWzq-01 buddy, where can I get this pr_full.pt file?
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./test1.sh
GIDS DataLoader Setting
GIDS: True
CPU Feature Buffer: True
Window Buffering: True
Storage Access Accumulator: True
Dataset: IGB
SSD are not assigned
ssd list: None
SSD index: 0
SQs: 255 CQs: 255 n_qps: 128
Ctrl sizes: 1
n pages: 1048576
page size: 4096
num elements: 563200000000
n_ranges_bits: 6
n_ranges_mask: 63
pages_dma: 0x7fb238010000 220020410000
HEREN
Cond1
100000 8 1 100000
Finish Making Page Cache
Number of required storage accesses: 854.0499999999993
Traceback (most recent call last):
File "heterogeneous_train.py", line 312, in <module>
track_acc_GIDS(g, category, args, device, labels, key_offset)
File "heterogeneous_train.py", line 68, in track_acc_GIDS
pr_ten = torch.load(args.pin_file)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 1065, in load
with _open_file_like(f, 'rb') as opened_file:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 468, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 449, in __init__
super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/nvme1n1/pr_full.pt'
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation#
hello, I found I can use page_rank_node_list_gen.py to create pr_full.pt for the test. Then I got the result below.
I have two questions:
root@salab-hpedl380g11-01:~/wayne/gids/GIDS/evaluation# ./test1.sh
GIDS DataLoader Setting
GIDS: True
CPU Feature Buffer: True
Window Buffering: True
Storage Access Accumulator: True
Dataset: IGB
SSD are not assigned
ssd list: None
SSD index: 0
SQs: 255 CQs: 255 n_qps: 128
Ctrl sizes: 1
n pages: 1048576
page size: 4096
num elements: 563200000000
n_ranges_bits: 6
n_ranges_mask: 63
pages_dma: 0x7ef768010000 220020410000
HEREN
Cond1
100000 8 1 100000
Finish Making Page Cache
Number of required storage accesses: 854.0499999999993
0%| | 0/1 [00:00<?, ?it/s]
warp up done
GIDS time: 35.0621292591095
WB time: 0.11368942260742188
print stats:
print array reset: #READ IOs: 0 #Accesses:1318947840 #Misses:1024407136 Miss Rate:0.776685 #Hits: 294540704 Hit Rate:0.223315 CLSize:4096 Debug Cnt: 0
*********************************
print ctrl reset 0: ------------------------------------
#SSDAccesses: 32012723
Kernel Time: 28573.6
Total Access: 175142339
Performance for 100 iteration after 1000 iteration
GIDS time: 3.439724922180176
WB time: 0.011293411254882812
print stats:
print array reset: #READ IOs: 0 #Accesses:115118784 #Misses:85275584 Miss Rate:0.740762 #Hits: 29843200 Hit Rate:0.259238 CLSize:4096 Debug Cnt: 0
*********************************
print ctrl reset 0: ------------------------------------
#SSDAccesses: 2664862
Kernel Time: 2847.87
Total Access: 17468327
transfer time: 0.04842567443847656
train time: 0.7668819427490234
e2e time: 4.265716314315796
0%| | 0/1 [00:47<?, ?it/s
hello, also: where can I get the extended dataset files referenced below?
elif self.size == 'large' or self.size == 'full':
    num_nodes = self.num_nodes()
    if self.num_classes == 19:
        path = '/mnt/nvme16/IGB260M_part_2/full/processed/paper/node_label_19_extended.npy'
        if(self.in_memory):
            node_labels = np.memmap(path, dtype='float32', mode='r', shape=(num_nodes)).copy()
        else:
            node_labels = np.memmap(path, dtype='float32', mode='r', shape=(num_nodes))
        # Actual number 227130858
    else:
        path = '/mnt/nvme16/IGB260M_part_2/full/processed/paper/node_label_2K_extended.npy'
        if(self.in_memory):
            node_labels = np.load(path)
        else:
            node_labels = np.memmap(path, dtype='float32', mode='r', shape=(num_nodes))
Hi @gaowayne, sorry for the late response. The dataset can be downloaded from the IGB dataset repo: https://github.com/IllinoisGraphBenchmark/IGB-Datasets. The feature aggregation time is the Kernel Time (ms).
It currently does not directly show BaM bandwidth and IOPS, but you can simply calculate those from the number of accesses and the kernel time.
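For example, a minimal sketch of that calculation in Python, using the 100-iteration stats printed above and assuming each SSD access transfers one 4096-byte page (per the "page size" / CLSize printouts):
ssd_accesses = 2_664_862          # "#SSDAccesses" from the 100-iteration stats above
kernel_time_s = 2847.87 / 1e3     # "Kernel Time" is reported in ms
page_bytes = 4096                 # assumed bytes moved per SSD access

iops = ssd_accesses / kernel_time_s
bandwidth_gb_s = iops * page_bytes / 1e9
print(f"IOPS ~ {iops:,.0f}, effective bandwidth ~ {bandwidth_gb_s:.2f} GB/s")
# prints roughly 935,739 IOPS and 3.83 GB/s for the run above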
for BW and IOPS, I saw there is the BaM fsstat code; maybe we can calculate it there?
thank you so much. how about the sampling time? :) Also, I just downloaded the full package from IGB-Datasets with the bash scripts as mentioned, but it does not contain any of the extended files that our GIDS dataloader.py looks for in the code above (e.g. node_label_19_extended.npy; the current full package only ships node_label_19.npy, with no extended files). Could you please shed some light on why the current code looks for the extended files?
root@salab-hpedl380g11-01:/mnt/nvme1n1/full/processed# tree
.
├── paper
│ ├── node_feat.npy
│ ├── node_label_19.npy
│ ├── node_label_2K.npy
│ └── paper_id_index_mapping.npy
└── paper__cites__paper
└── edge_index.npy
guys, I also do not have the files below.
# self.graph = dgl.graph((node_edges[:, 0],node_edges[:, 1]), num_nodes=node_features.shape[0])
if self.args.dataset_size == 'full':
    edge_row_idx = torch.from_numpy(np.load(cur_path + '/paper__cites__paper/edge_index_csc_row_idx.npy'))
    edge_col_idx = torch.from_numpy(np.load(cur_path + '/paper__cites__paper/edge_index_csc_col_idx.npy'))
    edge_idx = torch.from_numpy(np.load(cur_path + '/paper__cites__paper/edge_index_csc_edge_idx.npy'))
    self.graph = dgl.graph(('csc', (edge_col_idx,edge_row_idx,edge_idx)), num_nodes=node_features.shape[0])
    self.graph = self.graph.formats('csc')
else:
guys, I found another way to try: I will run ./download_igbh600m.sh and then try to run IGBHeteroDGLDatasetMassive; it looks like everything matches. :)
The sampling time should be almost exactly what you get by subtracting the feature aggregation time and the training time from the epoch time. A more accurate sampling time can be measured by timing next(it) in the GIDS.py file.
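A minimal sketch of that measurement (hypothetical variable names; the real loop lives in GIDS.py, where the iterator drives the sampler):
import time

it = iter(train_dataloader)   # hypothetical: the dataloader used in the training loop
sampling_time = 0.0
while True:
    start = time.time()
    try:
        batch = next(it)      # graph sampling work happens inside next()
    except StopIteration:
        break
    sampling_time += time.time() - start
    # ... feature aggregation and training on `batch` would run here ...
print("total sampling time (s):", sampling_time)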
thank you so much man.
Kernel Time: 2848.38 -----------> this is the feature aggregation time, in ms
Total Access: 17512908
transfer time: 0.04809975624084473
train time: 0.7011768817901611
e2e time: 4.2234179973602295
may I know what the epoch time is? :) sampling = epoch - kernel - train; I guess this is correct.
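(As a worked example with the numbers just quoted, taking the e2e time as the epoch time for those 100 iterations: 4.223 s - 2.848 s - 0.701 s ≈ 0.674 s, which would cover sampling plus the remaining overheads such as the 0.048 s transfer time.)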
also, ./download_igbh600m.sh is very, very big; downloading it will take 1-2 days. But ./download_igbh260m.sh is missing the files needed to run the full mode. :( can you shed some light on this too?
100000 8 1 100000
Finish Making Page Cache
Number of required storage accesses: 854.0499999999993
0%| | 0/1 [00:00<?, ?it/s]warp up done
GIDS time: 35.200151681900024
WB time: 0.11409378051757812
print stats:
print array reset: #READ IOs: 0 #Accesses:1319791808 #Misses:1025061280 Miss Rate:0.776684 #Hits: 294730528 Hit Rate:0.223316 CLSize:4096 Debug Cnt: 0
*********************************
print ctrl reset 0: ------------------------------------
#SSDAccesses: 32033165
Kernel Time: 28529.1
Total Access: 175257719
Performance for 100 iteration after 1000 iteration
GIDS time: 3.4651525020599365
WB time: 0.011265754699707031
print stats:
print array reset: #READ IOs: 0 #Accesses:115226464 #Misses:85321312 Miss Rate:0.740466 #Hits: 29905152 Hit Rate:0.259534 CLSize:4096 Debug Cnt: 0
*********************************
print ctrl reset 0: ------------------------------------
#SSDAccesses: 2666291
Kernel Time: 2848.38
Total Access: 17512908
transfer time: 0.04809975624084473
train time: 0.7011768817901611
e2e time: 4.2234179973602295
@jeongminpark417 @WWWzq-01 guys, if I would like to dump the effective bandwidth and IOPS of GIDS training vs. the baseline, how do I do that?
hello experts, I am suffering from this cmake error, could you please take a look? I can build BaM and run its benchmarks well under my Ubuntu 20.04.3 with NVIDIA driver 560 and CUDA 12.6 installed.