Closed yushilingco closed 2 years ago
Hi @yushilingco, we are looking at a better build system to catch errors like this (some of the packet capture code needs mellanox libvma, and it isn't able to find this on your system).
The quickest way to get it to compile (for now) would be to delete or comment out these lines in https://github.com/ledatelescope/bifrost/blob/master/src/Makefile#L16
udp_socket.o \
udp_capture.o \
udp_transmit.o \
Thank you for your reply. But what should I do if I want to use the UDP capture function? Do I need to set up anything in my environment before I try again? Thank you very much!
For the missing mellanox/vma_extra.h you could try modifying the user.mk file and commenting out the line that says VMA = 1 # Enable use of Mellanox... so that Bifrost doesn't try to use it. Otherwise you should be able to install vma_extra.h from the libvma-dev package that is part of the Mellanox OFED bundle.
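A quick way to decide between the two options is to check whether the header is already installed. The following is a minimal sketch, not part of Bifrost: the helper name has_vma_header, the VMA_INCLUDE_ROOT variable, and the default include root /usr/include are all assumptions, and an OFED install may place the header somewhere else.

```shell
#!/bin/bash
# Sketch: report whether the libvma header exists under an include root.
# has_vma_header and VMA_INCLUDE_ROOT are hypothetical names for illustration.
has_vma_header() {
    local include_root="${1:-/usr/include}"
    [ -f "${include_root}/mellanox/vma_extra.h" ]
}

if has_vma_header "${VMA_INCLUDE_ROOT:-/usr/include}"; then
    echo "vma_extra.h found: the VMA line in user.mk can stay enabled"
else
    echo "vma_extra.h not found: comment out the VMA line in user.mk or install libvma-dev"
fi
```

If the header lives in a non-default location, pass that include root as the first argument.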
Thank you for your reply, I get it. That means I can only use NVIDIA's network card, right? And if I use an Intel network card, is it feasible? If you have any suggestions, please let me know. Thank you!
I have successfully installed libvma, so the previous problem has been solved. But for the latter question, do you have any suggestions:
ptxas /tmp/tmpxft_000031be_00000000-7_fft.compute_35.ptx, line 2379; error : Illegal operand type to instruction 'ld'
ptxas /tmp/tmpxft_000031be_00000000-7_fft.compute_35.ptx, line 2379; error : Unknown symbol '__unnamed_1_param_0'
ptxas /tmp/tmpxft_000031be_00000000-7_fft.compute_35.ptx, line 2747; error : Illegal operand type to instruction 'ld'
ptxas /tmp/tmpxft_000031be_00000000-7_fft.compute_35.ptx, line 2747; error : Unknown symbol '__unnamed_2_param_0'
ptxas fatal : Ptx assembly aborted due to errors
autodep.mk:56: recipe for target 'fft.o' failed
make[1]: [fft.o] Error 255
make[1]: Leaving directory '/home/yushiling/software/bifrost/bifrost-master/src'
Makefile:15: recipe for target 'libbifrost' failed
make: [libbifrost] Error 2
To your first question, no, you don't have to have a Mellanox card to use Bifrost (I've used it with Intel cards before). However, you will get better packet capture performance with a Mellanox card and libvma enabled.
For your second question, it looks like a problem building the CUDA portions of the library. What GPU and version of CUDA are you using?
Thank you very much for your reply. That's great. Here is the output of nvidia-smi:
NVIDIA-SMI 470.57.02 Driver Version: 470.57.02 CUDA Version: 11.4
I also suspected that the CUDA version might be causing the problem, so I tried installing CUDA 8.0, 9.0, 10.0 and 11.1, but each attempt failed in a different way. The earlier errors are still there, and a new message appeared:
nvlink fatal : Input file 'fft_kernels.o' newer than toolkit (111 vs 91) (target: sm_35)
Ok, and what model of GPU are you using?
Nvidia RTX 3080
In that case try changing the GPU_ARCHS line in the user.mk file to GPU_ARCHS ?= 86 to target your RTX 3080.
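The edit can also be scripted. The sketch below assumes the file contains a line of the form "GPU_ARCHS ?= ..." as described above; set_gpu_archs is a hypothetical helper, not a Bifrost tool, and the starting values in the scratch file are made up for the demonstration.

```shell
#!/bin/bash
# Sketch: rewrite the GPU_ARCHS line of a user.mk-style file.
# set_gpu_archs is a hypothetical helper, not part of Bifrost.
set_gpu_archs() {
    local file="$1" archs="$2" tmp
    tmp=$(mktemp)
    # In a basic regex '?' is a literal character, so '?=' matches the make operator
    sed "s/^GPU_ARCHS[[:space:]]*?=.*/GPU_ARCHS ?= ${archs}/" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demonstrate on a scratch file so a real user.mk is left untouched
demo=$(mktemp)
printf 'NVCC ?= nvcc\nGPU_ARCHS ?= 35 61\n' > "$demo"
set_gpu_archs "$demo" 86
grep '^GPU_ARCHS' "$demo"
rm -f "$demo"
```

Because the assignment uses ?=, which only takes effect when the variable is not already set, passing the value on the command line (make GPU_ARCHS=86) should have the same effect without editing the file.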
Yes, great. I feel there are fewer and fewer mistakes. But there are still several errors in .o files.
nvcc fatal : Unsupported gpu architecture 'compute_'
Makefile:162: recipe for target 'fft_kernels.o' failed
Is it the problem with my CUDA version?
No, CUDA 11.4 should be fine for that architecture/compute capability. You can check with a:
nvcc -h | grep -Po "compute_[0-9]{2}" | sort | uniq
to see if "compute_86" is listed. However, that error would indicate otherwise. What you might try is removing the .SILENT: line from src/Makefile and re-running make to see what the full compiler call is for fft_kernels.cu.
Yes, "compute_86" is listed.
$ sudo make
make -C src all
make[1]: Entering directory '/home/yushiling/software/bifrost/bifrost-master/src'
nvcc -O3 -Xcompiler "-Wall" -std=c++11 -Xcompiler "-fPIC" -gencode arch=compute,\"code=compute_\" -DBF_GPU_SHAREDMEM=49152 -g -G -Xcompiler "-march=native" -DBF_DEBUG=1 -DBF_TRACE_ENABLED=1 -DBF_NUMA_ENABLED=1 -DBF_HWLOC_ENABLED=1 -DBF_VMA_ENABLED=1 -DBF_ALIGNMENT=4096 -DBF_CUDAENABLED=1 -I. -I. -I/usr/local/cuda/include -Xcompiler "-fmessage-length=80 " -c -o transpose.o transpose.cu
nvcc fatal : Unsupported gpu architecture 'compute'
autodep.mk:56: recipe for target 'transpose.o' failed
make[1]: *** [transpose.o] Error 1
make[1]: Leaving directory '/home/yushiling/software/bifrost/bifrost-master/src'
Ok, yeah, it looks like there is a problem in src/Makefile when it tries to set the values of GPU_ARCHS_SUPPORTED, GPU_ARCHS_VALID, and GPU_ARCH_LATEST. I would try manually running the shell commands that are used to define those variables. It's almost like GPU_ARCHS_VALID and GPU_ARCH_LATEST are empty.
Thank you very much for your patience. I tried to install Bifrost once before and it failed, so this is my second installation. Could the earlier attempt be affecting this one?
That should be fine.
I see. My system is Ubuntu 18.04 with kernel 5.4.0-89-generic and gcc 5.5.0. Could the problem be caused by an incompatibility between these versions?
It shouldn't be. I run on Ubuntu 18.04 albeit with a slightly older 4.15 kernel and a newer version of gcc (7.5).
What did you find from looking at how the values of GPU_ARCHS_VALID and GPU_ARCH_LATEST are set?
You could also try building from the autoconf branch of Bifrost to see if the new build system works for you. Once on that branch you would run configure before you run make.
I'm sorry, I can't find GPU_ARCHS_VALID and GPU_ARCH_LATEST. Also, can you provide a configure script for testing? Thanks a lot.
How GPU_ARCHS_VALID and GPU_ARCH_LATEST are defined is in src/Makefile. All you should need to do is convert the make shell calls into normal shell calls. Something like the following bash script:
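#!/bin/bash
# Architecture requested in user.mk
GPU_ARCHS=86
# Compute capabilities that the installed nvcc can target
GPU_ARCHS_SUPPORTED=`nvcc -h | grep -Po "compute_[0-9]{2}" | cut -d_ -f2 | sort | uniq`
# Requested architectures that nvcc also supports (entries present in both lists)
GPU_ARCHS_VALID=`echo "${GPU_ARCHS} ${GPU_ARCHS_SUPPORTED}" | xargs -n1 | sort | uniq -d | xargs`
# Last entry of the valid list
GPU_ARCH_LATEST=`echo "${GPU_ARCHS_VALID}" | rev | cut -d' ' -f1 | rev`
echo "Supported: ${GPU_ARCHS_SUPPORTED}"
echo "Valid: ${GPU_ARCHS_VALID}"
echo "Latest: ${GPU_ARCH_LATEST}"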
For the configure test you'll need to check out the repository with git and then switch from the master branch to autoconf to access it.
The results are as follows:
Supported: 30 32 35 37 50 52 53 60 61 62 70 72 75 80 86
Valid: 86
Latest:
The only other thing I can suggest is that you modify src/Makefile and explicitly set GPU_ARCHS_VALID and GPU_ARCH_LATEST to 86 instead of the shell calls that are in there now.
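To see what those two variables are expected to hold, the same intersection logic can be run on fixed input, without nvcc. This is only a sketch: the function names valid_archs and latest_arch are illustrative, the supported list is copied from the output reported in this thread, and awk is swapped in for the rev/cut/rev idiom to pick the last word.

```shell
#!/bin/bash
# Sketch of the GPU_ARCHS_VALID / GPU_ARCH_LATEST logic on fixed input.
# Function names are illustrative and do not appear in src/Makefile.

# Print the values present in both space-separated lists.
# Assumes neither list repeats a value on its own.
valid_archs() {
    echo "$1 $2" | xargs -n1 | sort | uniq -d | xargs
}

# Print the last word of a space-separated list.
latest_arch() {
    echo "$1" | awk '{print $NF}'
}

supported="30 32 35 37 50 52 53 60 61 62 70 72 75 80 86"
valid=$(valid_archs "86" "$supported")
echo "Valid: ${valid}"
echo "Latest: $(latest_arch "$valid")"
```

With a requested architecture of 86 and that supported list, both lines should print 86, which is the value suggested above for hard-coding.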
Thank you for your advice. I want to check whether the contents of my user.mk file are correct. Can you tell me what the empty entries mean? Thank you!
CXX ?= g++
NVCC ?= nvcc
LINKER ?= g++
CPPFLAGS ?=
CXXFLAGS ?= -O3 -Wall -pedantic
NVCCFLAGS ?= -O3 -Xcompiler "-Wall" #-Xptxas -v
LDFLAGS ?=
DOXYGEN ?= doxygen
PYBUILDFLAGS ?=
Nothing, i.e., no additional flags are specified.
It means it doesn't need to be modified and there's no problem, right? I'm sorry I didn't make it clear. What I want to ask is what I need to fill in to match my machine?
No, these shouldn't need to be modified.
I am eager to use this framework for real-time processing, but compilation has failed many times, so I have not been able to install it or go any further. May I ask whether you could provide an ISO image with Bifrost already installed? Thank you very much!
You can always try our Docker containers or AWS images. The instructions for them can be found here: https://github.com/ledatelescope/bifrost_tutorial
Did you try building from the autoconf branch?
Thank you. I would like to, but I don't know exactly what to do. If it's convenient for you, could you tell me how to do it?
Sure, try a:
git clone https://github.com/ledatelescope/bifrost.git bifrost_autoconf
cd bifrost_autoconf
git checkout autoconf
./configure
That's great, but I can't import successfully:
import bifrost
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/yushiling/anaconda3/lib/python3.8/site-packages/bifrost/__init__.py", line 37, in <module>
    from bifrost import core, memory, affinity, ring, block, address, udp_socket
  File "/home/yushiling/anaconda3/lib/python3.8/site-packages/bifrost/core.py", line 32, in <module>
    from bifrost.libbifrost import _bf
  File "/home/yushiling/anaconda3/lib/python3.8/site-packages/bifrost/libbifrost.py", line 41, in <module>
    import bifrost.libbifrost_generated as _bf
  File "/home/yushiling/anaconda3/lib/python3.8/site-packages/bifrost/libbifrost_generated.py", line 816, in <module>
    _libs["bifrost"] = load_library("bifrost")
  File "/home/yushiling/anaconda3/lib/python3.8/site-packages/bifrost/libbifrost_generated.py", line 541, in __call__
    raise ImportError("Could not load %s." % libname)
ImportError: Could not load bifrost.
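An ImportError raised from load_library generally means the Python wrapper could not load the compiled shared library. One thing worth ruling out is whether libbifrost.so is on the loader's search path at all; the thread does not establish that this was the cause here. The sketch below assumes the library file is named libbifrost.so, and both the helper find_shared_lib and the two fallback directories are hypothetical choices for illustration.

```shell
#!/bin/bash
# Sketch: look for lib<name>.so in a list of directories.
# find_shared_lib is a hypothetical helper; the directory list is an assumption.
find_shared_lib() {
    local name="$1"
    shift
    local dir
    for dir in "$@"; do
        if [ -e "${dir}/lib${name}.so" ]; then
            echo "${dir}/lib${name}.so"
            return 0
        fi
    done
    return 1
}

# Split LD_LIBRARY_PATH on ':' (relies on word splitting, so paths with spaces are not handled)
search_dirs=$(echo "${LD_LIBRARY_PATH:-}" | tr ':' ' ')
find_shared_lib bifrost $search_dirs /usr/local/lib /usr/lib \
    || echo "libbifrost.so not found in the searched directories"
```

If the file turns up in a directory that is not on the loader path, adding that directory to LD_LIBRARY_PATH is one thing to try before importing again.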
I see. Switching to Python 2.7 makes the import succeed. Is Python 3.8 not supported?
When I run a simple pipeline example, an error is reported:
  File "/home/yushiling/anaconda3/envs/learn/lib/python2.7/site-packages/bifrost/blocks/transpose.py", line 73, in on_data
    bf.transpose.transpose(ospan.data, ispan.data, self.axes)
AttributeError: 'module' object has no attribute 'transpose'
It's good to hear that you were able to build and install using the autoconf branch. I think the most recent bf.transpose.transpose error is a bug in the Python high level blocks.
Is there any solution?
Try applying the changes in this commit: https://github.com/ledatelescope/bifrost/pull/153/commits/8c035685985a5a419cda851831df8fba0364130d
Thank you very much. I've looked for it, but I seem to have missed this information. I'll try it.
Can I ask you how to receive UDP packets in real time and copy them to the GPU for processing?
At this point you should take a look at the Bifrost tutorials: https://github.com/ledatelescope/bifrost_tutorial
Any additional questions or problems you have that are not related to installation should be submitted as new issues.
OK, I see. Thank you again for your patience. I have no problem with the installation now!
Closing with the release of v0.10.0.
$ make -j
udp_capture.cpp:87:32: fatal error: mellanox/vma_extra.h: No such file or directory
compilation terminated.
autodep.mk:32: recipe for target 'udp_capture.o' failed
make[1]: [udp_capture.o] Error 1
...
ptxas /tmp/tmpxft_000029f6_00000000-7_fft.compute_35.ptx, line 2379; error : Illegal operand type to instruction 'ld'
ptxas /tmp/tmpxft_000029f6_00000000-7_fft.compute_35.ptx, line 2379; error : Unknown symbol '__unnamed_1_param_0'
ptxas /tmp/tmpxft_000029f6_00000000-7_fft.compute_35.ptx, line 2747; error : Illegal operand type to instruction 'ld'
ptxas /tmp/tmpxft_000029f6_00000000-7_fft.compute_35.ptx, line 2747; error : Unknown symbol '__unnamed_2_param_0'
ptxas fatal : Ptx assembly aborted due to errors
autodep.mk:56: recipe for target 'fft.o' failed
make[1]: [fft.o] Error 255
make[1]: Leaving directory '/home/yushiling/software/bifrost/bifrost-master/src'
Makefile:15: recipe for target 'libbifrost' failed
make: *** [libbifrost] Error 2
I encountered the above problems during installation. I tried to install the missing package, but the installation seems incomplete. What is causing these errors? Thank you very much.