Closed webmaster128 closed 5 years ago
It's very unlikely that you're running as root on Ubuntu, unless you run with `sudo`, which you don't. GPUVerify needs write access to the directories in which the tests occur, so it's best to just put GPUVerify somewhere in your home directory. It's not clear to me why you're even trying to put it in a shared location.
> It's very unlikely that you're running as root on Ubuntu, unless you run with `sudo`, which you don't.
It is a server and I'm root by default.
> It's not clear to me why you're even trying to put it in a shared location.
Basically because I'm root and every location is the same to the root user. This is a single-use virtual machine. To clean up I delete it and create a new one.
Anyways, the same error occurs when I log in as user `bob` and install to `/home/bob/gpuverify/2018-03-22`.
Is it normal that `[Unknown]` is part of the file path? E.g. `/opt/gpuverify/2018-03-22/testsuite/CUDA/align/[Unknown]`.
I don't know, because I'm not able to reproduce the problem on a clean Ubuntu 18.04.1 install (both Desktop and Server). You need to provide more information on how you set up your system.
> I'm not able to reproduce the problem on a clean Ubuntu 18.04.1 install (both Desktop and Server)
Oh, interesting, thanks for sharing. So this might even be file system-specific. I'll try a couple of different systems and let you know what I find.
This error seems to occur when the system runs out of memory. A 2 GB VM is not enough to run the tests, I guess.
It should be enough. I can successfully run the tests you mention above on a VM configured with 1GB of internal memory.
In fact, I can run the whole test suite and it behaves as expected.
Here's what I did: configure a VM in VirtualBox with 1GB of memory and 20GB disk space. Install Ubuntu Server 18.04.1 choosing all the default options along the way. Install the mono-complete, python, python-pip, and unzip packages. Get the GPUVerify release, unzip, and run the test suite.
This issue is indeed not triggered only by running out of memory, but OOM does trigger it as well. On my local Ubuntu 18.04 machine, I can run the test suite with no problem. But when I verify a larger kernel (one with 2000 lines, one with 20000 lines), after some time the memory usage increases dramatically, fills up the 8 GB of memory, fills up swap, and then ends with the above error.
This is what I do to trigger the issue in the test suite:
apt update && apt upgrade -y && apt autoremove -y && apt install -y python python-psutil unzip mono-complete && reboot
wget https://github.com/mc-imperial/gpuverify/releases/download/2018-03-22/GPUVerifyLinux64.zip
unzip GPUVerifyLinux64.zip
cd 2018-03-22
./gvtester.py --write-pickle run.pickle testsuite/
Mono version
$ mono --version
Mono JIT compiler version 4.6.2 (Debian 4.6.2.7+dfsg-1ubuntu1)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: altstack
Notifications: epoll
Architecture: amd64
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen
Ahh, there is `-l DEBUG`, which reveals a "No regular expressions." issue:
INFO:Using 8 threads
INFO:Running tests...
INFO:[Thread-1] Running test /root/2018-03-22/testsuite/CUDA/align/kernel.cu
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/align/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=2', '--gridDim=2', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
INFO:[Thread-2] Running test /root/2018-03-22/testsuite/CUDA/always_inline/kernel.cu
INFO:[Thread-3] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_all/kernel.cu
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/always_inline/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=64', '--gridDim=64', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
INFO:[Thread-4] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_assert/kernel.cu
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/annotation_tests/test_all/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=64', '--gridDim=64', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/annotation_tests/test_assert/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=64', '--gridDim=64', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
INFO:[Thread-5] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_assume/kernel.cu
INFO:[Thread-6] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_at_most_one/kernel.cu
INFO:[Thread-7] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_axiom/kernel.cu
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/annotation_tests/test_assume/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=64', '--gridDim=64', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
INFO:[Thread-8] Running test /root/2018-03-22/testsuite/CUDA/annotation_tests/test_contract/kernel.cu
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/annotation_tests/test_at_most_one/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=64', '--gridDim=1', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
DEBUG:GPUVerifyTestKernel:
Full Path:/root/2018-03-22/testsuite/CUDA/annotation_tests/test_axiom/kernel.cu
Expected exit code:SUCCESS
CmdArgs: ['--blockDim=[64,64]', '--gridDim=64', '--no-inline']
No regular expressions.
Kernel has not yet been executed.
Any idea why this can happen?
Those kernels do not come with a regular expression to test the output of GPUVerify against. They just test for pass or fail. Nothing to worry about, this is as expected.
GPUVerify can consume lots of memory, because the problems sent to the underlying solver can be quite substantial. When 8GB is being used, it's likely the solver that is running out of memory. I would suggest running top at the same time to see what is actually consuming the memory (likely z3 or cvc4). This can, for example, happen when non-standard loop idioms are used in the kernels and no loop invariants have been specified by hand. This I can only diagnose by having access to the kernels. However, this is an issue totally different from the one we're looking at here.
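To act on the `top` suggestion above without watching a live display, one can snapshot the process table once and pick out the biggest consumer. A minimal sketch in Python, assuming `ps -eo rss,comm --sort=-rss` style output (the `top_memory_consumer` helper is hypothetical, not part of GPUVerify):

```python
def top_memory_consumer(ps_output):
    """Given the output of `ps -eo rss,comm --sort=-rss` (a header line
    followed by one row per process), return (command, rss_in_kib)
    for the process using the most resident memory."""
    rows = ps_output.strip().splitlines()[1:]  # drop the "RSS COMMAND" header
    biggest = max(rows, key=lambda row: int(row.split(None, 1)[0]))
    rss, command = biggest.split(None, 1)
    return command.strip(), int(rss)
```

On the affected machine one would feed it the decoded output of `subprocess.check_output(["ps", "-eo", "rss,comm", "--sort=-rss"])` while the verification runs; if the top entry is `z3.exe` or `cvc4`, it's the solver, not GPUVerify itself, that is eating the memory.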
In any case the above is all expected and doesn't bring us any closer to me reproducing the original problem you reported here. Note that I'm not familiar with the VMs you mention, and it's not clear how they are configured. Being able to reproduce this in a bog-standard Ubuntu installation instead of some funky VM configured for the cloud would be much preferable to me.
Thanks! First of all I'd like to get the tests running before I continue with my own kernel.
Those hosted VMs are the only machines where I can reproduce the error (tests pass on my local machine and a second VM provider). Anyways, there must be some kind of error, even if it is not so easy to reproduce. I have some ideas to debug:
`__gvdump.txt` is not the stack trace of the original exception. Is that because `e` is re-thrown? We could store the original stack trace before re-throwing the exception.

`System.IO.IOException: Write fault on path` (plus `[Unknown]` filename): One likely explanation for this error is that the process exited before you're attempting to write to it.

I assume the error also occurs when you run `gvtester.py` with `-j1`? If so, are you willing to recompile the C# bits of GPUVerify yourself? If so, the basic steps are:

- `nuget restore GPUVerify.sln`
- `xbuild /m /p:Configuration=Release GPUVerify.sln` (you might need mono-develop for this, I'm not sure whether everything is in mono-complete)
- copy the binaries from the `Binaries` directory over the ones you got in the release zip

Hopefully that gives the same error when you run the test suite. If so, open each file containing a call to `DumpExceptionInformation`, and rip out the `catch` block containing the call (also remove the `try` higher up, otherwise things won't compile). Run the `xbuild` command again, copy, and rerun the test suite (pass `-j1` and `--gvopt=--verbose`). Post the output here if you see a failure (should now be written to the console).
I would like to improve `DumpExceptionInformation`, but it has never been quite clear to me why it behaves in the way it behaves. Your guess might be correct though.
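The capture-before-rethrow idea suggested above can be illustrated in Python (GPUVerify's `DumpExceptionInformation` itself is C#, so this is only an analogy, and all the function names below are made up): snapshot the traceback inside the innermost handler, before any re-throw has a chance to obscure the original throw site.

```python
import traceback

def original_failure():
    # Stands in for the deep solver-pipe write that actually fails.
    raise ValueError("boom")

def run_and_capture():
    try:
        original_failure()
    except ValueError:
        # Snapshot the full traceback *now*, at the original throw site...
        saved = traceback.format_exc()
        try:
            raise  # ...then re-throw to an outer handler, as before.
        except ValueError:
            # The outer handler dumps the saved snapshot, which still
            # names original_failure(), instead of only the re-throw site.
            return saved

trace = run_and_capture()
assert "original_failure" in trace and "ValueError: boom" in trace
```

The saved snapshot retains the innermost frames, which is exactly the information a `__gvdump.txt` written from an outer handler would otherwise lose.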
I believe I've fixed the issue with `DumpExceptionInformation` in 300e0b90d11c655c02d2c6f1bbaaf2f. So instead of ripping out the `catch` block, performing the bullet points above and looking in `__gvdump.txt` should suffice.
You put me on the right track with the re-throw. Thanks!
Very nice! Building worked perfectly, except for the `/m` parameter, which seems not to be supported by xbuild. Thanks for the getting-started guide!

The error remains the same with the custom build. But now, after applying your patch, I get something much more interesting in the `__gvdump.txt`s:
An exception occured:
System.IO.IOException: Write fault on path /root/2018-03-22/testsuite/CUDA/annotation_tests/test_all/[Unknown]
at System.IO.FileStream.WriteInternal (System.Byte[] src, System.Int32 offset, System.Int32 count) [0x00086] in <8f2c484307284b51944a1a13a14c0266>:0
at System.IO.FileStream.Write (System.Byte[] array, System.Int32 offset, System.Int32 count) [0x000a5] in <8f2c484307284b51944a1a13a14c0266>:0
at System.IO.StreamWriter.Flush (System.Boolean flushStream, System.Boolean flushEncoder) [0x00094] in <8f2c484307284b51944a1a13a14c0266>:0
at System.IO.StreamWriter.Write (System.Char[] buffer, System.Int32 index, System.Int32 count) [0x000ee] in <8f2c484307284b51944a1a13a14c0266>:0
at System.IO.TextWriter.WriteLine (System.String value) [0x00083] in <8f2c484307284b51944a1a13a14c0266>:0
at Microsoft.Boogie.SMTLib.SMTLibProcess.Send (System.String cmd) [0x00064] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.SMTLibProcessTheoremProver.Send (System.String s, System.Boolean isCommon) [0x0002b] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.SMTLibProcessTheoremProver.SendCommon (System.String s) [0x00000] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.SMTLibProcessTheoremProver.PrepareCommon () [0x00010] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.SMTLibProcessTheoremProver..ctor (Microsoft.Boogie.ProverOptions options, Microsoft.Boogie.VCExpressionGenerator gen, Microsoft.Boogie.SMTLib.SMTLibProverContext ctx) [0x0011e] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.Factory.SpawnProver (Microsoft.Boogie.ProverOptions options, Microsoft.Boogie.VCExpressionGenerator gen, Microsoft.Boogie.SMTLib.SMTLibProverContext ctx) [0x00010] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.SMTLib.Factory.SpawnProver (Microsoft.Boogie.ProverOptions options, System.Object ctxt) [0x0001d] in <3bba1f071128458086b54011b55bf8a6>:0
at Microsoft.Boogie.ProverInterface.CreateProver (Microsoft.Boogie.Program prog, System.String logFilePath, System.Boolean appendLogFile, System.Int32 timeout, System.Int32 taskID) [0x00194] in <3608100f74c74016a9f15d6bd81477c0>:0
at Microsoft.Boogie.Houdini.Houdini.Initialize (Microsoft.Boogie.Program program, Microsoft.Boogie.Houdini.HoudiniSession+HoudiniStatistics stats) [0x00100] in <d4dc3bfc5a674b99b6f23327f9a0cf02>:0
at Microsoft.Boogie.Houdini.ConcurrentHoudini..ctor (System.Int32 taskId, Microsoft.Boogie.Program program, Microsoft.Boogie.Houdini.HoudiniSession+HoudiniStatistics stats, System.String cexTraceFile) [0x0001c] in <d4dc3bfc5a674b99b6f23327f9a0cf02>:0
at Microsoft.Boogie.SMTEngine.Start (Microsoft.Boogie.Program program, Microsoft.Boogie.Houdini.HoudiniOutcome& outcome) [0x0013c] in <35efa05dac4843eb92817f41adfd3b44>:0
at Microsoft.Boogie.Scheduler.ScheduleEnginesInSequence (Microsoft.Boogie.Pipeline pipeline) [0x00035] in <35efa05dac4843eb92817f41adfd3b44>:0
at Microsoft.Boogie.Scheduler..ctor (System.Collections.Generic.List`1[T] fileNames) [0x00039] in <35efa05dac4843eb92817f41adfd3b44>:0
at GPUVerify.GPUVerifyCruncher.Main (System.String[] args) [0x00151] in <35efa05dac4843eb92817f41adfd3b44>:0
I checked the first 4 files and all have the exact same stack trace except for the test folder in the second line.
Ok, so `GPUVerifyCruncher` spawns the solver as a child process and writes to it via a pipe, so `[Unknown]` is correct.
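The "Write fault on path .../[Unknown]" then fits a dying child process: if the spawned solver exits before the parent writes to its stdin pipe, the write fails. A small Python sketch of the same failure mode (the names here are illustrative, not GPUVerify code):

```python
import subprocess
import sys

def write_to_dead_child():
    """Spawn a child that exits immediately, then write to its stdin pipe.

    This mimics the failure in the stack trace above: if the solver
    child process has already exited, the parent's next pipe write
    (SMTLibProcess.Send on the Mono side) fails with an IOException.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"],
                             stdin=subprocess.PIPE)
    child.wait()  # the child is gone; the read end of the pipe is closed
    try:
        child.stdin.write(b"(set-option :print-success false)\n")
        child.stdin.flush()
        return "write succeeded"
    except OSError:  # BrokenPipeError on POSIX
        return "broken pipe"
```

Calling `write_to_dead_child()` returns "broken pipe", which is the Python-level analogue of the `Write fault on path` IOException in the dump.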
Could you run gvtester with the following additional arguments: `--stop-on-fail -j1 --gvopt=--cruncher-opt=/proverOpt:VERBOSITY=100 --gvopt=--verbose` and provide me with the output?
Sorry didn't mean to close the issue.
From the original, unmodified release zip:
$ ./gvtester.py --stop-on-fail -j1 --gvopt=--cruncher-opt=/proverOpt:VERBOSITY=100 --gvopt=--verbose --write-pickle run.pickle testsuite/
INFO:Found 470 OpenCL kernels, 166 CUDA kernels and 2 miscellaneous tests
INFO:Running 470 OpenCL kernels, 166 CUDA kernels and 2 miscellaneous tests
INFO:Using 1 threads
INFO:Running tests...
INFO:[Thread-1] Running test /root/2018-03-22/testsuite/CUDA/align/kernel.cu
ERROR:[Thread-1] /root/2018-03-22/testsuite/CUDA/align/kernel.cu FAILED with BOOGIE_INTERNAL_ERROR expected SUCCESS
GPUVerify: an internal error has occurred, details written to __gvdump.txt.
Please consult the troubleshooting guide in the GPUVerify documentation
for common problems, and if this does not help, raise an issue via the
GPUVerify issue tracker:
https://github.com/mc-imperial/gpuverify
Got 2 groups of size 2
Running clang
/root/2018-03-22/bin/clang -Wall -g -gcolumn-info -emit-llvm -c -Xclang -disable-O0-optnone -target i386-- --cuda-device-only -nocudainc -nocudalib --cuda-gpu-arch=sm_35 -x cuda -Xclang -fcuda-is-device -include cuda.h -o /root/2018-03-22/testsuite/CUDA/align/kernel.bc /root/2018-03-22/testsuite/CUDA/align/kernel.cu -I/root/2018-03-22/bugle/include-blang -D__BUGLE_32__ -D__1D_THREAD_BLOCK -D__1D_GRID -D__BLOCK_DIM_0=2 -D__GRID_DIM_0=2 -D__WARP_SIZE=32
Running opt
/root/2018-03-22/bin/opt -mem2reg -globaldce -o /root/2018-03-22/testsuite/CUDA/align/kernel.opt.bc /root/2018-03-22/testsuite/CUDA/align/kernel.bc
Running bugle
/root/2018-03-22/bin/bugle -l cu -race-instrumentation=watchdog-single -s /root/2018-03-22/testsuite/CUDA/align/kernel.loc -o /root/2018-03-22/testsuite/CUDA/align/kernel.gbpl /root/2018-03-22/testsuite/CUDA/align/kernel.opt.bc
Running gpuverifyvcgen
mono /root/2018-03-22/bin/GPUVerifyVCGen.exe /noPruneInfeasibleEdges /raceChecking:SINGLE /print:/root/2018-03-22/testsuite/CUDA/align/kernel /root/2018-03-22/testsuite/CUDA/align/kernel.gbpl
Running gpuverifycruncher
mono /root/2018-03-22/bin/GPUVerifyCruncher.exe /nologo /typeEncoding:m /mv:- /doModSetAnalysis /useArrayTheory /doNotUseLabels /enhancedErrorMessages:1 /sourceLanguage:cu /blockHighestDim:0 /gridHighestDim:0 /proverOpt:OPTIMIZE_FOR_BV=true /z3opt:smt.relevancy=0 /z3exe:/root/2018-03-22/bin/z3.exe /raceChecking:SINGLE /noinfer /contractInfer /concurrentHoudini /proverOpt:VERBOSITY=100 /root/2018-03-22/testsuite/CUDA/align/kernel.bpl
[SMT-0] Starting /root/2018-03-22/bin/z3.exe AUTO_CONFIG=false -smt2 -in smt.relevancy=0
[SMT-INP-0] (set-option :print-success false)
[SMT-ERR-0] /root/2018-03-22/bin/z3.exe: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.bc',), {})
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.opt.bc',), {})
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.gbpl',), {})
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.loc',), {})
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.cbpl',), {})
Clean up handler Calling DeleteFile(('/root/2018-03-22/testsuite/CUDA/align/kernel.bpl',), {})
Clean up handler Calling handleTiming((11,), {})
INFO:Trying to stop on first failure
INFO:Finished running tests.
################################################################################
Summary:
# of tests:638
# OpenCL kernels:470
# CUDA kernels:168
# of passes:0
# of expected failures (xfail):0
# of unexpected failures:1
# of tests skipped:637
################################################################################
INFO:Writing run information to pickle file "run.pickle"
Time taken to run tests: 1.50296592712
And after installing `libgomp1` (no idea what that is), I see no more errors 🎉
The Z3 we're using was built against OpenMP (it can do some parallel stuff, which we're not using). It's looking for the GNU OpenMP library.
You might want to do the same thing, but with CVC4 as solver. In that case you're probably also missing some libraries.
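A quick way to check a solver binary for missing shared libraries is to run `ldd` on it (e.g. `ldd bin/z3.exe`, and similarly for the CVC4 binary) and look for "not found" lines. As a sketch, parsing that output in Python (the helper name is made up):

```python
def missing_shared_libs(ldd_output):
    """Return the shared libraries that `ldd` reports as missing.

    `ldd` prints one line per dependency; unresolved ones look like
    "libgomp.so.1 => not found".
    """
    missing = []
    for line in ldd_output.splitlines():
        if "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing
```

On the affected VM, feeding it the decoded output of `subprocess.check_output(["ldd", "bin/z3.exe"])` would have flagged `libgomp.so.1` before the test suite ever ran.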
$ ./gvtester.py --stop-on-fail -j1 --gvopt=--cruncher-opt=/proverOpt:VERBOSITY=100 --gvopt="--solver=cvc4" --gvopt=--verbose --write-pickle run.pickle testsuite/
runs without complaints now.
In the meantime I found the type of Ubuntu installation my VM is using: Ubuntu 18.04.1 LTS (Bionic Beaver) Minimal and Nextcloud (64-Bit). So this might be reproducible with a default minimal installation.
Ok, glad we've figured it out and everything works now. I've added a todo to our troubleshooting documentation regarding this. I've actually run into similar issues before with CVC4 on other people's machines, I just had no idea anymore what the actual symptoms were.
Since everything works, I'll close this issue now. For any further problems you run into, please open new issues.
After resolving https://github.com/mc-imperial/gpuverify/issues/34, I now get internal errors when running the tests (same for verifying my own OpenCL kernel):

The `__gvdump.txt` written until aborting are:

All files contain the same dump, with the same addresses `0x00147` and `3350e27eb9a0457a80eecaac7cdc77c4`.

Since I run the tests as root, I don't see how this could be a permission error.