moyix / fpsmt_gpu

Solving floating point SMT constraints on a GPU

Solution found with all-zero vector #1

Open moyix opened 3 years ago

moyix commented 3 years ago

On some SMT files, the fuzzer reports success with an all-zero input vector. This may be correct (I haven't checked), but for inputs this large it should be astronomically unlikely to ever see an all-zero vector. The fuzzer should be initializing each block to its index i and then AES-encrypting it with a random key before ever running LLVMFuzzerTestOneInput, which should (almost) never produce all zeroes. So... what's going on here? (A sketch of the expected initialization is below.)
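For reference, a minimal sketch of the initialization described above; `init_inputs`, `gbuf`, and `padded` are my placeholders rather than the repo's actual identifiers, and the AES pass itself is elided:

```cuda
#include <cstdint>
#include <cstring>

// Hypothetical sketch: each thread seeds its own input buffer with a
// distinct counter, block by block, before the (elided) AES pass.
__global__ void init_inputs(uint8_t *gbuf, size_t padded) {
    uint64_t tid = blockIdx.x * (uint64_t)blockDim.x + threadIdx.x;
    uint8_t *mybuf = gbuf + tid * padded;
    // Write a unique (thread, block) counter into each 16-byte AES block.
    // After encrypting each block with the expanded round keys (the 176
    // bytes copied to drkey in the log = 11 AES-128 round keys), the
    // buffer is effectively pseudorandom and should never be all zeroes.
    for (size_t off = 0; off + 16 <= padded; off += 16) {
        uint64_t ctr[2] = { tid, (uint64_t)(off / 16) };
        memcpy(mybuf + off, ctr, 16);
        // aes128_encrypt_block(mybuf + off, drkey);  // elided
    }
}
```

If that's really what runs before LLVMFuzzerTestOneInput, an all-zero candidate would require AES to map some counter block to zero under the session key, which shouldn't happen in any feasible number of executions.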

Test file: https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks/QF_FP/-/blob/master/griggio/fmcad12/div.c.30.smt2

Output:

moyix@isabella:~/git/fpsmt$ bin/smt-div-c-30-smt2 
DEV[0] Executing: cudaGetDeviceCount(&NUM_GPU)
Padding varsize from 132 to 144
DEV[0] Executing: cudaMalloc(&drkey, 176)
DEV[0] Executing: cudaMemcpy((uint8_t *)drkey, rkey, sizeof(uint8_t) * 176, cudaMemcpyHostToDevice)
DEV[0] Executing: cudaMalloc(&gbuf, padded*N*M)
DEV[0] Executing: cudaMalloc(&gobuf, sizeof(uint64_t))
DEV[0] Executing: cudaMalloc(&gexecs, sizeof(unsigned long long))
DEV[0] Executing: cudaStreamCreate(&stream)
Launching kernel on GPU0...
DEV[0] Executing: cudaLaunchHostFunc(stream, finishedCB, dev)
Padding varsize from 132 to 144
DEV[1] Executing: cudaMalloc(&drkey, 176)
DEV[1] Executing: cudaMemcpy((uint8_t *)drkey, rkey, sizeof(uint8_t) * 176, cudaMemcpyHostToDevice)
DEV[1] Executing: cudaMalloc(&gbuf, padded*N*M)
DEV[1] Executing: cudaMalloc(&gobuf, sizeof(uint64_t))
DEV[1] Executing: cudaMalloc(&gexecs, sizeof(unsigned long long))
DEV[1] Executing: cudaStreamCreate(&stream)
Launching kernel on GPU1...
DEV[1] Executing: cudaLaunchHostFunc(stream, finishedCB, dev)
Waiting on GPUs...
Search completed on device 0
DEV[0] Executing: cudaMemcpy(&hexecs, goexecs[i], sizeof(unsigned long long), cudaMemcpyDeviceToHost)
Did 0 execs in 0.380715 seconds, 0.000000 execs/s
DEV[0] Executing: cudaMemcpy(&oindex, gobuf[i], sizeof(uint64_t), cudaMemcpyDeviceToHost)
DEV[0] Executing: cudaMemcpy(buf, gbuf[i]+(oindex*padded), padded, cudaMemcpyDeviceToHost)
Found a satisfying assignment on device 0 thread 0:
000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
moyix commented 3 years ago

Turns out this was because we weren't checking for kernel launch failures, and the launch was actually failing with `cudaErrorLaunchOutOfResources` ("too many resources requested for launch"). Since no kernel ever ran (note the "Did 0 execs" line above), the buffer copied back to the host was just untouched device memory, which happened to be all zeroes. Reducing the number of threads per block fixes this example, but how can we detect the right number of threads dynamically? A sketch of both fixes is below.
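For what it's worth, a minimal sketch of both fixes, assuming a hypothetical `fuzz_kernel` (the kernel name and parameters are placeholders, not the repo's actual code): check every launch with `cudaGetLastError()`, and ask the occupancy API for a block size that fits the kernel's register/shared-memory footprint instead of hardcoding one:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstdint>

#define CUDA_CHECK(call) do {                                         \
    cudaError_t err__ = (call);                                       \
    if (err__ != cudaSuccess) {                                       \
        fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__, __LINE__, \
                #call, cudaGetErrorString(err__));                    \
        exit(1);                                                      \
    }                                                                 \
} while (0)

__global__ void fuzz_kernel(uint8_t *gbuf) { /* placeholder body */ }

int main() {
    int minGridSize = 0, blockSize = 0;
    // Ask the runtime for the largest block size at which fuzz_kernel
    // can still launch (and reach maximum occupancy), rather than
    // hardcoding a threads-per-block count that may exceed its limits.
    CUDA_CHECK(cudaOccupancyMaxPotentialBlockSize(&minGridSize, &blockSize,
                                                  fuzz_kernel, 0, 0));
    printf("Using %d threads per block\n", blockSize);

    uint8_t *gbuf = nullptr;
    CUDA_CHECK(cudaMalloc(&gbuf, 1 << 20));
    fuzz_kernel<<<minGridSize, blockSize>>>(gbuf);
    // Kernel launches are asynchronous and don't return an error
    // directly; cudaGetLastError catches launch-time failures like
    // cudaErrorLaunchOutOfResources.
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaDeviceSynchronize());
    return 0;
}
```

Alternatively, `cudaFuncGetAttributes` reports the compiled kernel's `maxThreadsPerBlock`, which is the hard per-kernel cap imposed by its register usage and could serve as an upper bound here.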