monadnoc closed this issue 6 years ago.
Toolkit 10 has not been tested yet. You will have to try building the RPMs separately and, depending on how the packaging of toolkit 10 has changed relative to 8, make changes to the Makefile and version.mk in the corresponding src/nvidia-toolkit and src/nvidia-driver directories. Basically: cd src/nvidia-toolkit; make rpm, and see what breaks. Every time there is a major nvidia toolkit release, the build process needs changes.
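A minimal sketch of the "build each RPM separately" step, assuming it is run from the top level of the cloned repo and that src/nvidia-toolkit and src/nvidia-driver each have a make rpm target, as described above. The helper name build_rpm_in and the per-directory log file make_rpm.log are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: run "make rpm" in each sub-package directory in turn and stop at the
# first failure. Assumes the current directory is the top level of the cloned
# cuda roll repo; build_rpm_in and make_rpm.log are hypothetical names.
set -o pipefail   # a pipeline fails if make fails, not only if tee fails

build_rpm_in() {
  local dir="$1"
  echo "=== make rpm in $dir ==="
  # The subshell keeps the cd local; the log is written inside $dir.
  (cd "$dir" && make rpm 2>&1 | tee make_rpm.log)
}

for d in src/nvidia-toolkit src/nvidia-driver; do
  if ! build_rpm_in "$d"; then
    echo "make rpm failed in $d; see $d/make_rpm.log" >&2
    break
  fi
done
```

Whichever directory fails first points at the Makefile or version.mk that needs adjusting for the new toolkit release.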
I changed to toolkit 8.0 (https://developer.nvidia.com/cuda-80-ga2-download-archive).
The same issue described above occurs (first argument to 'word' function must be greater than '0'), though it seems this might just be because 'dump-version' is echoing 7.0. The full output of this error line is actually:
/opt/rocks/share/devel/src/roll/../../etc/Rules.mk:622: *** first argument to `word' function must be greater than 0. Stop.
But it is unclear which part of Rules.mk at line 622 requires the input to 'word' to be greater than 0.
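The message itself comes from GNU make: $(word n,text) stops with this error when n evaluates to 0. The self-contained reproduction below uses a hypothetical throwaway Makefile (not the Rocks Rules.mk) to show one way this can happen: taking the last element of a list with $(word $(words LIST),LIST) when LIST is empty, so $(words) returns 0. The exact quoting around word in the message differs between make versions.

```bash
#!/usr/bin/env bash
# Reproduce "first argument to 'word' function must be greater than 0" with a
# throwaway Makefile. The Makefile is hypothetical and is NOT Rocks' Rules.mk.
demo_dir=$(mktemp -d)

cat > "$demo_dir/Makefile" <<'EOF'
VERSIONS :=
# := expands immediately, so make stops while parsing this line.
LAST := $(word $(words $(VERSIONS)),$(VERSIONS))
all: ; @echo "last version: $(LAST)"
EOF

# Capture stdout and stderr together; make exits non-zero here.
make -C "$demo_dir" > "$demo_dir/out.txt" 2>&1
status=$?
cat "$demo_dir/out.txt"
echo "make exit status: $status"
```

To see which expression the Rocks build trips over, print the lines around the one named in the error, for example sed -n '615,625p' /opt/rocks/share/devel/src/roll/../../etc/Rules.mk, and look for a $(word ...) call whose index can evaluate to 0, such as $(words ...) applied to a variable that came out empty.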
build.log from this new attempt with Toolkit 8.0 is attached
I also tried make rpm as suggested, but that halts while trying to tar a file that isn't there:
tar: /opt/cuda/SOURCES/roll-cuda-7.0.tar: Cannot open: No such file or directory
tar: Error is not recoverable: exiting now
Everything leading up to this error in the output of make --debug rpm is attached in build_rpm.log.
build.log
build_rpm.log
Both the nvidia toolkit and driver builds exited with a successfully remade target file 'rpm' (see attached), but the make 2>&1 command to build the .iso still exits with the same error when trying to remake the dump-name target:
/opt/rocks/share/devel/src/roll/../../etc/Rules.mk:622: *** first argument to 'word' function must be greater than 0. Stop.
Any ideas?
What is the output of:
rocks list roll
make -n 2>&1 > out
make preroll
rocks list roll:
NAME                      VERSION     ARCH    ENABLED
base:                     7.0         x86_64  yes
CentOS:                   7.4.1708    x86_64  yes
core:                     7.0         x86_64  yes
kernel:                   7.0         x86_64  yes
Updates-CentOS-7.4.1708:  2017-12-01  x86_64  yes
sge:                      7.0         x86_64  yes
hpc:                      7.0         x86_64  yes
ganglia:                  7.0         x86_64  yes
make -n 2>&1 > out
/opt/rocks/share/devel/src/roll/../../etc/Rules.mk:622: *** first argument to `word' function must be greater than 0. Stop.
and 'out' reads echo 7.0
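In make -n 2>&1 > out, the shell processes redirections left to right: stderr is first pointed at wherever stdout currently goes (the terminal), and only then is stdout sent to out. That would explain why the Rules.mk error showed up on screen while out contained only echo 7.0. The short demo below uses a hypothetical function emit_both in place of make.

```bash
#!/usr/bin/env bash
# Show why "cmd 2>&1 > file" and "cmd > file 2>&1" capture different things.
# emit_both is a hypothetical stand-in for make.
emit_both() {
  echo "to stdout"
  echo "to stderr" >&2
}

work=$(mktemp -d)

# 2>&1 first: stderr is duplicated onto the current stdout (the terminal),
# then only stdout is sent to the file.
emit_both 2>&1 > "$work/stdout_only.txt"

# > file first: stdout goes to the file, then stderr follows it there.
emit_both > "$work/both.txt" 2>&1

cat "$work/stdout_only.txt"
cat "$work/both.txt"
```

So make -n > out 2>&1 would have put both the echo 7.0 line and the Rules.mk error into out.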
Lastly, make preroll:
for i in `ls nodes/*.xml.in`; do \
export o=`echo $i | sed 's/\.in//'`; \
cp $i $o; \
sed -i -e "s/TOOLKIT_SHORT/80/g" $o; \
done
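The preroll output above shows what that step does: each nodes/*.xml.in template is copied to a .xml file with TOOLKIT_SHORT replaced by the short toolkit version (80 for toolkit 8.0 here). A minimal sketch of the same loop as a standalone function, assuming the short version is passed in as an argument; the name preroll_nodes is hypothetical. It globs instead of parsing ls, and strips only the trailing .in, whereas sed 's/\.in//' removes the first .in found anywhere in the path.

```bash
#!/usr/bin/env bash
# Hypothetical re-implementation of the preroll loop shown above:
# nodes/NAME.xml.in -> nodes/NAME.xml with TOOLKIT_SHORT substituted.
preroll_nodes() {
  local short="$1"   # e.g. 80 for toolkit 8.0, as in the output above
  local tmpl out
  for tmpl in nodes/*.xml.in; do
    [ -e "$tmpl" ] || continue        # no templates: the glob stays literal
    out="${tmpl%.in}"                 # strip only the trailing ".in"
    cp "$tmpl" "$out"
    sed -i -e "s/TOOLKIT_SHORT/${short}/g" "$out"   # GNU sed in-place edit
  done
}
```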
Thank you very much for your time working through this.
You have a slightly older Updates roll, but this should not really matter for make.
What is the output of:
rpm -qf /opt/rocks/share/devel/src/roll/../../etc/Rules.mk
rpm -V rocks-devel
and pwd at the top level of your cloned repo?
rpm -qf /opt/rocks/share/devel/src/roll/../../etc/Rules.mk
rocks-devel-7.0-9.x86_64
rpm -V rocks-devel has no output.
pwd in the cloned repo is
/root/cuda
Everything so far looks OK, so I am not sure what is causing the problem. Have you changed any files after cloning the repo, or made any changes to the system outside of rocks commands or to your root user environment?
The only change made to the cloned repo was to substitute the driver for a Tesla K40c (and changed the version.mk file to correspond), but even with the driver provided with the bootstrap.sh, the same error occurs.
The rocks install is fresh and follows the installation guidelines--nothing has been modified so far.
Since this error seems related to my system rather than the repo, I have modified the title of the issue, and I will close it for now until I become more familiar with the Rocks configuration.
Thank you very much for the help (some of it nearly in real-time!)
Hi, I know this is closed, but 'make rpm' in src/nvidia-toolkit does indeed fail for cuda-linux_10.0.130-linux.run, which is cuda 10 (I know cuda 8 is expected). I am not an expert, but the DISTRO variable in src/nvidia-toolkit/version.mk may need to be changed to the cuda 10 version. There's also the usual headache with how nvidia names their *run packages. The exact error is:
make ROOT=/share/apps/cuda_rocks_roll2/cuda/src/nvidia-toolkit/cuda-toolkit10.buildroot install
make[2]: Entering directory `/share/apps/cuda_rocks_roll2/cuda/BUILD/cuda-toolkit10-10.0.130'
///from nvidia distro install toolkit and samples in /opt
mkdir -p distro
/bin/bash cuda_10.0.130_linux-run -extract=`pwd`/distro
/bin/bash: cuda_10.0.130_linux-run: No such file or directory
whereas the file created is cuda-linux_10.0.130-linux.run
So if you put a 'cuda_10.0.130_linux-run' link to whatever cuda calls their *run file in your src/nvidia-driver (and change the DISTRO variable in version.mk), then 'make rpm' completes successfully.
As always, thank you for this roll Nadya Williams!
You are right about the DISTRO variable in the src/nvidia-toolkit/version.mk file. Looking back, it may have been an accidental commit while playing with one of the driver versions. This should be a variable derived from the toolkit and version numbers recorded in the top-level cuda.mk file (fixed now). As far as nvidia toolkit and driver file naming conventions go, they are never consistent, and one MUST edit the version.mk file and make the proper adjustments. Currently, there are lines there for versions 7 and 8. I have not played with 10 so far, but the approach should be mostly the same.
I'm glad that was helpful. Maybe, given nvidia's penchant for creative naming, just put the necessary logic in bootstrap.sh or tell the user... Anyhow, all is well, thank you very much Nadya!
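A sketch of the kind of logic bootstrap.sh could carry, following that suggestion: find the single *.run file in a directory whose name contains a given version, then symlink it to the name the build invokes, as in the manual link workaround above. The functions find_run_file and link_run_file are hypothetical and not part of the roll; the glob assumes the installer file name contains the full version string and ends in .run.

```bash
#!/usr/bin/env bash
# Hypothetical bootstrap.sh helper: find the NVIDIA installer for a version
# whatever its exact name, then symlink it to the name the build invokes.
shopt -s nullglob   # an unmatched glob expands to nothing

find_run_file() {
  local dir="$1" version="$2"
  local matches=( "$dir"/*"$version"*.run )
  if [ "${#matches[@]}" -ne 1 ]; then
    echo "expected one *${version}*.run in $dir, found ${#matches[@]}" >&2
    return 1
  fi
  printf '%s\n' "${matches[0]}"
}

link_run_file() {
  local dir="$1" version="$2" expected="$3"
  local found
  found=$(find_run_file "$dir" "$version") || return 1
  # Relative link target; -f replaces a stale link from an earlier run.
  ln -sf "$(basename "$found")" "$dir/$expected"
}
```

For the cuda 10 case in this thread that would be link_run_file src/nvidia-driver 10.0.130 cuda_10.0.130_linux-run. Because the expected name ends in -run rather than .run, the link itself does not match the glob, so running the helper twice still finds exactly one file.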
@nadyawilliams
For a fresh install of Rocks 7.0,
make 2>&1 | tee build.log
yields: first argument to 'word' function must be greater than '0'.
Running make --debug 2>&1 | tee build.log
identified that make is trying to remake the target 'dump-version'
and is invoking Rules.mk:611 to do so. This is a pretty early break in the make process. Is Rocks 7 (CentOS 7.4) not supported? Is Nvidia Toolkit 10.0 the problem?
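With make --debug, GNU make prints a "Must remake target ..." line for each target it decides to rebuild, so the last such line in build.log usually names the target it was working on when it stopped (dump-version here). A small helper to pull that line out with its line number; the name last_remade_target is hypothetical.

```bash
#!/usr/bin/env bash
# Print the last "Must remake target" line (with its line number) from a log
# produced by: make --debug 2>&1 | tee build.log
last_remade_target() {
  local log="$1"
  grep -n "Must remake target" "$log" | tail -n 1
}
```

Usage: last_remade_target build.log, then open build.log at that line to read the Rules.mk context around it.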