utcs-scea / altis

A benchmarking suite for heterogeneous systems. The primary goal of this project is to improve and update aspects of existing benchmarking suites which are either insufficient or outdated.
https://utcs-scea.github.io/altis/
BSD 2-Clause "Simplified" License

Build Failed on Ubuntu20.04 #20

Open alice890308 opened 1 year ago

alice890308 commented 1 year ago

Hi, I'm trying to build Altis on my server and in a Docker container, but both hit the same errors. The descriptions below show only part of the error messages; the complete error messages are at https://gist.github.com/alice890308/e4e6172f7d5c5f1e1b88d97ed1ed35e4

Environment

Ubuntu: 20.04
CUDA version: 11.8
Docker image: nvidia/cuda:11.8.0-devel-ubuntu20.04
cmake: 3.16
GPU: NVIDIA A100, SM number: 80

Error Messages

First I tried to run ./setup.sh and saw the following result

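Screenshot: https://user-images.githubusercontent.com/52403980/208251180-f72b2674-8d31-4575-a0b3-4bc07e32f281.png
Screenshot: https://user-images.githubusercontent.com/52403980/208251201-ddd58489-4817-45d6-8ca0-9da4d3fec04b.png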

Then, to understand the build process, I read the build instructions at https://github.com/utcs-scea/altis/wiki/Build and applied the steps manually. Running cmake -DCMAKE_CUDA_ARCHITECTURES=80 shows the following message, but I'm not sure whether it is important.

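Screenshot: https://user-images.githubusercontent.com/52403980/208251755-5a827a44-ef19-48c4-a6d8-dde7d1396f6d.png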

The fatal error occurred when running the last make command.

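Screenshot: https://user-images.githubusercontent.com/52403980/208251811-47a06f53-6e12-4ce7-9110-628945cdddcf.png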

Reproduce Steps

Run nvidia docker image

sudo nvidia-docker run -it nvidia/cuda:11.8.0-devel-ubuntu20.04 /bin/bash

Install git and cmake

apt-get update
apt-get install git cmake

Clone this repo

git clone https://github.com/utcs-scea/altis.git

Run setup.sh or follow the build steps to build Altis.

Thanks for viewing my issue. Any reply is appreciated.

rossbach commented 1 year ago

Mei,

We will look into this and get back to you shortly.

Chris

From: Mei
Sent: Saturday, December 17, 2022 10:59 AM
To: utcs-scea/altis
Subject: [utcs-scea/altis] Build Failed on Ubuntu20.04 (Issue #20)

BDHU commented 1 year ago

@alice890308 If you set VERBOSE=1 before the cmake command, what does it show? That way we can see the exact build commands and which files make is expecting. Is it possible to get the complete make log? My speculation is that some files are not built because the SM numbers are unspecified.
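For reference, the value passed to CMAKE_CUDA_ARCHITECTURES is the GPU's compute capability without the dot (8.0 becomes 80 for an A100, 6.1 becomes 61). A minimal sketch of that conversion follows. The function name cap_to_sm is hypothetical and is not the repository's get_cuda_sm.sh; the script only prints the cmake command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Altis): convert a compute capability
# such as "8.0" into the form CMAKE_CUDA_ARCHITECTURES expects ("80").
cap_to_sm() {
    # Drop the dot between the major and minor version.
    printf '%s\n' "$1" | tr -d '.'
}

# Example: an A100 has compute capability 8.0.
sm=$(cap_to_sm "8.0")
# Print the configure command instead of running it, so this is safe anywhere.
echo "cmake -DCMAKE_CUDA_ARCHITECTURES=$sm .."
```

On recent drivers the capability can probably be read with nvidia-smi --query-gpu=compute_cap --format=csv,noheader, but treat that as an assumption about your driver version.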

alice890308 commented 1 year ago

@BDHU Hi! It shows the following messages.

root@23cf7bea18ba:/altis/config/cuda_device_attr_gen# make VERBOSE=1
/usr/local/cuda/bin/nvcc -ccbin g++ -I../../Common  -m64    -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o deviceQuery.o -c deviceQuery.cpp
/usr/local/cuda/bin/nvcc -ccbin g++   -m64      -gencode arch=compute_80,code=sm_80 -gencode arch=compute_80,code=compute_80 -o deviceQuery deviceQuery.o
mkdir -p ../../bin/x86_64/linux/release
cp deviceQuery ../../bin/x86_64/linux/release

Is this the complete make log you are looking for? Or would you like me to check anything else?
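The output above comes from config/cuda_device_attr_gen only. One possible way to capture a complete log for any build step is to redirect both stdout and stderr into a file and keep the exit status. The sketch below assumes a plain POSIX shell; run_logged is a hypothetical helper name, not something shipped with Altis.

```shell
#!/bin/sh
# run_logged LOGFILE CMD [ARGS...]
# Hypothetical helper: run CMD, store stdout and stderr together in LOGFILE,
# print the log afterwards, and return CMD's own exit status.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
    status=$?
    cat "$log"
    return "$status"
}

# Intended use from the build directory (not executed here):
#   run_logged make.log make VERBOSE=1
```

Because the log is written first and printed afterwards, the output is not streamed live; the file still contains everything make printed, including the failing command.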

BDHU commented 1 year ago

@alice890308 Apologies for the late reply. Can you go into the build directory and remove everything inside? Then execute these two commands:

cmake -DCMAKE_CUDA_ARCHITECTURES=$($SCRIPTPATH/config/get_cuda_sm.sh) ..

and

make VERBOSE=1

I've tested your setup with the exact same docker version and encountered no problem. However, I've only tested on SM61. Therefore, I suspect something has changed in the SM80 series. The above command allows us to see which specific make command causes the failure.
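A minimal sketch of the "remove everything inside the build directory" step, assuming a POSIX shell and a find that supports -mindepth and -delete (GNU and BSD find both do). The name empty_dir is hypothetical; the cmake and make lines are left as comments because they need the Altis checkout and CUDA.

```shell
#!/bin/sh
# empty_dir DIR
# Hypothetical helper: delete everything inside DIR (including hidden files)
# but keep DIR itself. Refuses an empty or non-directory argument so that a
# typo cannot delete files somewhere else.
empty_dir() {
    dir=$1
    [ -n "$dir" ] && [ -d "$dir" ] || return 1
    find "$dir" -mindepth 1 -delete
}

# Intended use from the repository root (not executed here):
#   empty_dir build && cd build
#   cmake -DCMAKE_CUDA_ARCHITECTURES=80 ..
#   make VERBOSE=1
```

Starting from an empty build directory avoids reusing a stale CMakeCache.txt from an earlier configure run.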

For example, I noticed that you failed to build the maxflops object file. This is the first workload built right after libAltisCommon.a is generated. In my setup, the build command is the following (this is why we need make VERBOSE=1 to show it):

[  5%] Building CUDA object src/cuda/level0/maxflops/CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o
cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc   -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common  -w -gencode arch=compute_61,code=sm_61 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

This specific line:

cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common -w -gencode arch=compute_61,code=sm_61 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

is in charge of generating the MaxFlops.cu.o file. You can simply copy and rerun it to reproduce the same error without going through the whole CMake generation process.
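If the verbose output has been saved to a file, the command for one object can be pulled out with grep. This is only a sketch: find_compile_cmd is a hypothetical helper, and it assumes each nvcc invocation is on a single log line that ends in "-o <object>", as in the output quoted above.

```shell
#!/bin/sh
# find_compile_cmd LOGFILE OBJECT
# Hypothetical helper: print the nvcc lines in a verbose make log that
# write OBJECT. Both patterns are fixed strings, not regular expressions.
find_compile_cmd() {
    grep -F 'nvcc' "$1" | grep -F -e "-o $2"
}

# Intended use (not executed here):
#   find_compile_cmd make.log CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o
```

The printed line can then be pasted into a shell and rerun on its own.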

So in your setup, it might look like this:

[  5%] Building CUDA object src/cuda/level0/maxflops/CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o
cd /workspace/Desktop/altis/build/src/cuda/level0/maxflops && /usr/local/cuda/bin/nvcc   -I/workspace/Desktop/altis/src/cuda/common -I/workspace/Desktop/altis/src/cuda/../common  -w -gencode arch=compute_80,code=sm_80 -x cu -c /workspace/Desktop/altis/src/cuda/level0/maxflops/MaxFlops.cu -o CMakeFiles/maxflopsLib.dir/MaxFlops.cu.o

I would first watch for any missing flags or parameters. It's very likely CMake failed to generate some commands.
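As a rough illustration of checking for a missing flag, the sketch below tests whether a compile command contains the -gencode option for a given SM number. has_gencode is a hypothetical name, and the match expects the exact single-space spelling that appears in the make output above.

```shell
#!/bin/sh
# has_gencode CMDLINE SM
# Hypothetical helper: succeed if CMDLINE contains
# "-gencode arch=compute_SM,code=sm_SM", fail otherwise.
has_gencode() {
    case "$1" in
        *"-gencode arch=compute_$2,code=sm_$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example with a shortened command line in the style of the log above.
cmd='nvcc -w -gencode arch=compute_80,code=sm_80 -x cu -c MaxFlops.cu -o MaxFlops.cu.o'
if has_gencode "$cmd" 80; then
    echo "gencode flag for sm_80 present"
else
    echo "gencode flag for sm_80 missing"
fi
```

A command with no -gencode flag at all, or one for a different SM, would be a first hint that the architecture was not passed through at configure time.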