Open rkabrick opened 3 years ago
Ryan, I ran into the same issue during development, but there were many issues and I have forgotten the cause of this one. It may have to do with the cache. Try removing the cache directory in the optx folder; its name is OptixCache_$USER. One thing I remember is that the cache needs clearing after changing the file structure (rename, split, combine) or the content of the .cu files. If you run the program from make, use make tidy all, which clears the cache as well. A minute ago I tried the repo myself, and it worked (see log below).
What is your system configuration? I use an AWS EC2 instance of type g4dn.xlarge with an Amazon Linux AMI, which is similar to Red Hat Linux. It is the cheapest instance with an NVIDIA T4 GPU, which has the Turing microarchitecture and thus RT Cores.
I really appreciate you taking note of my work and if you have any questions don't hesitate to ask.
Best regards, Jürgen
nvcc -c rtwo.cxx -o rtwo.o \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix/include -I/usr/local/optix/SDK -I/usr/local/optix/SDK/support \
nvcc -ptx camera.cu -o camera.ptx \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix/include -I/usr/local/optix/SDK -I/usr/local/optix/SDK/support \
bin2c -c -p 0 -n camera_ptx camera.ptx > camera.c
gcc -c camera.c -o camera.o
nvcc -ptx optics.cu -o optics.ptx \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix/include -I/usr/local/optix/SDK -I/usr/local/optix/SDK/support \
bin2c -c -p 0 -n optics_ptx optics.ptx > optics.c
gcc -c optics.c -o optics.o
nvcc -c sphere.cxx -o sphere.o \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix/include -I/usr/local/optix/SDK -I/usr/local/optix/SDK/support \
g++ -o rtwo rtwo.o camera.o optics.o sphere.o \
-L ~/optix-samples/lib -lsutil_7_sdk -lglad \
-L /usr/local/cuda/lib64 -lcudart -lpthread -lrt -ldl \
-lm \
OPTIX_CACHE_PATH=${OPTIX_CACHE_PATH:-./OptixCache_$USER} ./rtwo | magick ppm:- rtwo.png
OptiX API message : 4 : KNOBS : All knobs on default.
OptiX API message : 4 : DISK CACHE : Opened database: "/home/ec2-user/lab/RTXplay/optx/./OptixCache_ec2-user/cache7.db"
OptiX API message : 4 : DISK CACHE : Cache data size: "0 Bytes"
OptiX API message : 4 : DISKCACHE : Cache miss for key: ptx-344769-keye389950cd53b9bf77e4b501b11fda77e-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : DISKCACHE : Inserted module in cache with key: ptx-344769-keye389950cd53b9bf77e4b501b11fda77e-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : COMPILE FEEDBACK : Info: Pipeline parameter "lp_general" size is 32 bytes
Info: Module uses 6 payload values. Pipeline configuration: 6.
Info: Module uses 0 attribute values. Pipeline configuration: 2 (default).
Info: Entry function "__raygen__camera" with semantic type RAYGEN has 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 218 basic block(s), 3783 instruction(s)
Info: Entry function "__miss__ambient" with semantic type MISS has 0 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 2 basic block(s), 31 instruction(s)
Info: 7 non-entry function(s) have 42 basic block(s), 665 instruction(s)
OptiX API message : 4 : DISKCACHE : Cache miss for key: ptx-304456-keya7a416ca7c81e496204ff15b1c2d051d-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : DISKCACHE : Inserted module in cache with key: ptx-304456-keya7a416ca7c81e496204ff15b1c2d051d-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : COMPILE FEEDBACK : Info: Pipeline parameter "lp_general" size is 32 bytes
Info: Module uses 6 payload values. Pipeline configuration: 6.
Info: Module uses 0 attribute values. Pipeline configuration: 2 (default).
Info: Entry function "__closesthit__diffuse" with semantic type CLOSESTHIT has 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 8 basic block(s), 234 instruction(s)
Info: Entry function "__closesthit__reflect" with semantic type CLOSESTHIT has 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 9 basic block(s), 259 instruction(s)
Info: Entry function "__closesthit__refract" with semantic type CLOSESTHIT has 1 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 13 basic block(s), 254 instruction(s)
Info: 7 non-entry function(s) have 42 basic block(s), 665 instruction(s)
OptiX API message : 4 : DISKCACHE : Cache miss for key: ptx-2056-keyd0464b889758230fd557fdcb6fab4be0-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : DISKCACHE : Inserted module in cache with key: ptx-2056-keyd0464b889758230fd557fdcb6fab4be0-sm_75-rtc1-drv455.32.00
OptiX API message : 4 : COMPILE FEEDBACK : Info: Pipeline has 2 module(s), 5 entry function(s), 4 trace call(s), 0 continuation callable call(s), 0 direct callable call(s), 250 basic block(s) in entry functions, 4561 instruction(s) in entry functions, 14 non-entry function(s), 84 basic block(s) in non-entry functions, 1330 instruction(s) in non-entry functions
OptiX pipeline for RTWO ran 7084 milliseconds
OptiX API message : 4 : DISK CACHE : Closed database: "/home/ec2-user/lab/RTXplay/optx/./OptixCache_ec2-user/cache7.db"
OptiX API message : 4 : DISK CACHE : Cache data size: "1.1 MiB"
rm camera.ptx optics.ptx camera.c optics.c
[ec2-user@ip-172-31-23-10 optx]$ ls -lrt
total 6552
-rwxrwxr-x 1 ec2-user ec2-user 2969 Feb 21 15:42 Makefile
-rw-rw-r-- 1 ec2-user ec2-user 4581 Feb 21 15:42 v.h
-rw-rw-r-- 1 ec2-user ec2-user 450 Feb 21 15:42 util.h
-rw-rw-r-- 1 ec2-user ec2-user 777 Feb 21 15:42 util_gpu.h
-rw-rw-r-- 1 ec2-user ec2-user 619 Feb 21 15:42 util_cpu.h
-rw-rw-r-- 1 ec2-user ec2-user 165 Feb 21 15:42 things.h
-rw-rw-r-- 1 ec2-user ec2-user 890 Feb 21 15:42 thing.h
-rw-rw-r-- 1 ec2-user ec2-user 587 Feb 21 15:42 sphere.h
-rw-rw-r-- 1 ec2-user ec2-user 5033 Feb 21 15:42 sphere.cxx
-rw-rw-r-- 1 ec2-user ec2-user 727 Feb 21 15:42 rtwo.h
-rw-rw-r-- 1 ec2-user ec2-user 21732 Feb 21 15:42 rtwo.cxx
-rw-rw-r-- 1 ec2-user ec2-user 2429 Feb 21 15:42 reduce.cxx
-rw-rw-r-- 1 ec2-user ec2-user 430 Feb 21 15:42 optics.h
-rw-rw-r-- 1 ec2-user ec2-user 10140 Feb 21 15:42 optics.cu
-rw-rw-r-- 1 ec2-user ec2-user 1131 Feb 21 15:42 camera.h
-rw-rw-r-- 1 ec2-user ec2-user 3555 Feb 21 15:42 camera.cu
-rw-rw-r-- 1 ec2-user ec2-user 1881584 Feb 21 15:42 rtwo.o
-rw-rw-r-- 1 ec2-user ec2-user 345840 Feb 21 15:42 camera.o
-rw-rw-r-- 1 ec2-user ec2-user 305528 Feb 21 15:42 optics.o
-rw-rw-r-- 1 ec2-user ec2-user 1164360 Feb 21 15:43 sphere.o
-rwxrwxr-x 1 ec2-user ec2-user 2058976 Feb 21 15:43 rtwo
drwxrwxr-- 2 ec2-user ec2-user 23 Feb 21 15:43 OptixCache_ec2-user
-rw-rw-r-- 1 ec2-user ec2-user 839641 Feb 21 15:43 rtwo.png
My config right now is my own desktop. I am running Pop!_OS 20.10 w/ an Intel i9-9900k and an RTX 2080Ti. Unfortunately, the make tidy all did not work; it resulted in the same error. I will try the process from the start once again and let you know what happens.
As an aside, I see in my error output it starts with an OptiX exception and then goes on to be an issue with Magick... Would not using Magick be an option? Like in Peter Shirley's original books
Magick reads stdout from rtwo via a pipe. The error occurs in rtwo and should thus still be there if you replace magick with a redirection of stdout to /dev/null.
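To illustrate why the downstream error is only a symptom, here is a minimal sketch, not the actual rtwo code. It assumes a renderer that writes an ASCII PPM (the P3 format used in Peter Shirley's books) to an output stream and reports failures on stderr. The names emit_ppm and render_to_string are hypothetical. When setup fails before the header is written, the consumer at the other end of the pipe receives an empty stream, and a PNM reader would then plausibly complain about the image header.

```cpp
#include <cassert>
#include <iostream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical renderer: writes a tiny gray ASCII PPM (P3) to `out`.
// When `setup_ok` is false it throws before emitting anything, mimicking a
// failure during scene or acceleration structure setup.
bool emit_ppm(std::ostream& out, int width, int height, bool setup_ok) {
    try {
        if (!setup_ok) {
            throw std::runtime_error("setup failed before any pixel was produced");
        }
        out << "P3\n" << width << ' ' << height << "\n255\n";
        for (int i = 0; i < width * height; ++i) {
            out << "128 128 128\n";
        }
        return true;
    } catch (const std::exception& e) {
        // Diagnostics go to stderr, so they never reach the image stream.
        std::cerr << "exception: " << e.what() << '\n';
        return false;
    }
}

// Captures what a consumer of the image stream would receive.
std::string render_to_string(bool setup_ok) {
    std::ostringstream stream;
    emit_ppm(stream, 2, 1, setup_ok);
    return stream.str();
}
```

In this sketch render_to_string(false) yields an empty string, so swapping the image converter for /dev/null would change nothing about the underlying failure.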
Ahh okay, that makes sense. Alright, I just completely uninstalled cuda, nvidia-drivers, OptiX, ImageMagick, etc. and unfortunately, it still gives the same magick "improper image header" error:
nvcc -c rtwo.cxx -o rtwo.o \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix-7.2/include -I/usr/local/optix-7.2/SDK -I/usr/local/optix-7.2/SDK/support \
nvcc -ptx camera.cu -o camera.ptx \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix-7.2/include -I/usr/local/optix-7.2/SDK -I/usr/local/optix-7.2/SDK/support \
nvcc -ptx optics.cu -o optics.ptx \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix-7.2/include -I/usr/local/optix-7.2/SDK -I/usr/local/optix-7.2/SDK/support \
nvcc -c sphere.cxx -o sphere.o \
-rdc true -std c++11 -ccbin g++ -Xcompiler -Wall,-Wsign-compare,-Wno-multichar,-funroll-loops,-fPIC,-msse,-msse2,-msse3,-mfpmath=sse,-O3,-g3,-DNDEBUG -arch sm_75 -use_fast_math \
-I/usr/local/cuda/include -I/usr/local/optix-7.2/include -I/usr/local/optix-7.2/SDK -I/usr/local/optix-7.2/SDK/support \
bin2c -c -p 0 -n optics_ptx optics.ptx > optics.c
gcc -c optics.c -o optics.o
bin2c -c -p 0 -n camera_ptx camera.ptx > camera.c
gcc -c camera.c -o camera.o
g++ -o rtwo rtwo.o camera.o optics.o sphere.o \
-L ~/optix-samples/lib -lsutil_7_sdk -lglad \
-L /usr/local/cuda/lib64 -lcudart -lpthread -lrt -ldl \
-lm \
OPTIX_CACHE_PATH=${OPTIX_CACHE_PATH:-./OptixCache_$USER} ./rtwo | magick ppm:- rtwo.png
OptiX API message : 4 : KNOBS : All knobs on default.
OptiX API message : 4 : DISK CACHE : Opened database: "/home/brick/dev-brick/RTXplay/optx/./OptixCache_brick/cache7.db"
OptiX API message : 4 : DISK CACHE : Cache data size: "0 Bytes"
OptiX API message : 2 : ERROR : Invalid value (872603844) for "buildInputs[0].triangleArray.flags[0]"
exception: OPTIX_ERROR_INVALID_VALUE: Optix call 'optixAccelComputeMemoryUsage( optx_context, &oas_options, obi_things.data(), static_cast<unsigned int>( obi_things.size() ), &as_buffer_sizes )' failed: rtwo.cxx:168)
magick: improper image header `/tmp/magick-5eyXCq6fsK5-fCjwXYmfjO2gcfGjkx5e' @ error/pnm.c/ReadPNMImage/334.
make: *** [Makefile:141: rtwo.png] Error 1
rm optics.ptx optics.c camera.c camera.ptx
Weird. Can you build and run the OptiX Samples?
It is very strange because I can build and run all the examples
optixAccelComputeMemoryUsage is actually the first API call after init. The number in brackets might be a pointer to device memory, which is allocated by Sphere, which in turn is called multiple times in scene(). Just to rule out any memory limitation, you could reduce the scene to set up just one sphere.
Still no luck. Tried with 1 sphere... and no spheres. I have 64 GB of RAM and the 2080Ti is an 11GB card, so a mem limitation doesn't sound likely (to me at least). You are without a doubt the expert on this compared to me, so I could be very wrong in what I'm saying. What was the nvidia driver in your AWS instance? Also, what kernel did the instance use?
EDIT: Also I really am grateful for your help. I apologize that this was not an easy fix.
Agree. Memory is no problem. My kernel is 4.14, NVIDIA driver is 455.32, CUDA is 11.1. Yes, this is a hard one but maybe we'll cope.
Ahhh we may have found the problem. My kernel is 5.8.
Also, for clarification's sake, you say in the README to install cuda-11.1... I know the cuda versions displayed by nvcc -v and nvidia-smi are separate entities, but I'm not sure if they are supposed to be identical. Let me know.
My fault. I corrected my comment. CUDA is 11.1. What are your plans regarding the kernel version? A quick Google search gave no hints stating problems with newer Linux kernels. Maybe you should give strace a try? You could frame the call to optixAccelComputeMemoryUsage by two open("file", ...) calls, to quickly point at the relevant section in strace's output.
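The framing idea can be sketched as follows. This is only an illustration under assumptions: the marker paths and the helper names strace_marker and traced are hypothetical, and the code relies on the POSIX open and close functions. Each marker shows up in strace's output as an open or openat line naming the marker path, even if the file does not exist, so everything between the two markers belongs to the wrapped call.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Issues an open() on `path` so that a recognizable line appears in the
// strace output. The file does not have to exist; a failed open is traced
// as well. Returns true if the file could be opened.
bool strace_marker(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        close(fd);
        return true;
    }
    return false;
}

// Runs `call` between two markers. The paths are hypothetical; any
// distinctive names work.
template <typename F>
void traced(F&& call) {
    strace_marker("/tmp/STRACE_MARK_BEGIN");
    call();
    strace_marker("/tmp/STRACE_MARK_END");
}
```

A call site would wrap the suspicious API call in a lambda, for example traced([&] { /* call under investigation */ });, and the program would then be run under strace.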
Forget strace. The error message is misleading. When I worked on an extension for multiple frames (to make a clip), I got the same error, but the cause was a memory violation in a shader program. So, unfortunately, the newer Linux kernel version is the only reference point you have. You might check out my recent commit as well: I fixed the recursion depth configuration. It was set to 4 or 6, but RTOW goes deeper. That caused no error on my AWS instance, but it might on yours.
To clarify, when you run nvidia-smi, the cuda version that comes up is 11.1 and not 11.2? For me, nvcc --version returns 11.1 but nvidia-smi returns 11.2, and I cannot seem to get that to change. At this point that is the only thing I can imagine is the issue. I messed with strace for a little, but to no avail. Glad you followed up and explained.
My CUDA is 11.1. See output of nvcc and nvidia-smi below.
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00 Driver Version: 455.32.00 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 43C P0 27W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Hello again, sorry got overtaken by work these past few days.
Anyways, my nvidia-smi output was showing 11.2 and nvcc -v was showing 11.1, so I was hopeful this was the problem. However, I still get the same output. Will keep trying.
Unfortunately, no matter what I change, I cannot seem to get it to compile. Going to try to write the code myself and in the process, perhaps narrow down on what the issue is. Thank you again for all of your time and help.
Compile? I thought it was a runtime problem. Anyway, starting an OptiX project by yourself is for sure a great idea. You'll gain lots of experience. I myself started out from the optixTriangle sample in the SDK and developed it step by step to finally have RTOW. Have fun and let me know if I can help.
I finally came upon that very same error you reported months ago. It was caused by an automatic variable inside a for-loop that went out of scope when the loop finished. The loop set up the acceleration structure, hence the error in the optixAccelComputeMemoryUsage API call. If you want to give it a try, I suggest checking out the manastra branch (latest commit). Regards, Jürgen.
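The kind of bug described here can be sketched as follows. This is a simplified illustration and not the actual RTXplay code: BuildInput is a hypothetical stand-in that, like OptixBuildInputTriangleArray, stores a pointer to the flags rather than a copy, and make_build_inputs is a hypothetical helper. If the flags live in a variable declared inside the loop body, the stored pointer dangles once that iteration ends, which would be consistent with a garbage value being reported for flags[0]. The sketch shows the fix of keeping the flags in storage that outlives the loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a build input record. It keeps a POINTER to the
// flags, so the pointed-to storage must stay alive until the input is used.
struct BuildInput {
    const unsigned int* flags;
    unsigned int        numSbtRecords;
};

// Buggy pattern (do not use), shown as a comment only:
//
//   for (...) {
//       unsigned int flags = 0;   // automatic variable, dies each iteration
//       BuildInput in{};
//       in.flags = &flags;        // dangling after the closing brace
//       inputs.push_back(in);
//   }
//
// Fixed pattern: the caller owns `flags_storage`, which must outlive every
// BuildInput returned here.
std::vector<BuildInput> make_build_inputs(const std::vector<unsigned int>& flags_storage) {
    std::vector<BuildInput> inputs;
    inputs.reserve(flags_storage.size());
    for (const unsigned int& flag : flags_storage) {
        BuildInput in{};
        in.flags         = &flag;  // points into caller-owned storage
        in.numSbtRecords = 1;
        inputs.push_back(in);
    }
    return inputs;
}
```

The design choice is simply to move ownership of the flags out of the loop, so that every pointer handed to the later API call is still valid at that point.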
Just stumbled upon your project and loved the idea because I'm trying to learn Optix 7. So little documentation/examples. Anyways, I was following the instructions you put for Linux and everything seemed to be working however the make command (inside the optx directory) leads to this:
I apologize if this is a simple fix that has nothing to do with this repo but any guidance you can offer would be helpful