utwente-fmt / vercors

The VerCors verification toolset for verifying parallel and concurrent software
https://utwente.nl/vercors
Mozilla Public License 2.0
51 stars 24 forks source link

GPU programs do not give fault for to much permission, which leads to incomplete verification #1184

Open sakehl opened 3 months ago

sakehl commented 3 months ago

So this file:

//:: cases BasicCuda
//:: tool silicon
//:: verdict Pass

#include <cuda.h>

/*@
    context \pointer_index(a, threadIdx.x, write);
@*/
__global__ void example(int a[], int len) {
    int tid = threadIdx.x;
    a[tid] = tid;
    /*@
        context \pointer_index(a, threadIdx.x, write);
    @*/
    __syncthreads();
    a[tid] = a[tid] * 2;
}

from examples/concepts/gpgpu/cuda.cu

is actually wrong. Because we threadIdx.x is not unique, each block in x, y and z direction all have write permission.

(so we have gridDim.x*gridDim.y*gridDim.z*blockDim.y*blockDim.z*writepermission actually)

But the method that checks if the pre-condition is satisfiable does not work here:

method example(blockdimX: Int, blockdimY: Int, blockdimZ: Int, griddimX: Int,
  griddimY: Int, griddimZ: Int, a: Option[Pointer])
  requires 0 < blockdimX
  requires blockdimY == 1
  requires blockdimZ == 1
  requires griddimX == 2
  requires griddimY == 1
  requires griddimZ == 1
  requires 0 < blockdimX
  requires 0 < blockdimY
  requires 0 < blockdimZ
  requires 0 < griddimX
  requires 0 < griddimY
  requires 0 < griddimZ
  requires a !=
    ((none1(): Option[Nothing]) == (none1(): Option[Nothing]) ?
      (none1(): Option[Pointer]) :
      (some(nothingAs1(optGet1((none1(): Option[Nothing])))): Option[Pointer]))
  requires blockdimX <
    block_length(pointer_block(optGet2(a))) - pointer_offset(optGet2(a))
  requires (forall threadidxx: Int ::
      { ptrDeref(ptrAdd(optGet2(a), threadidxx)).int }
      0 <= threadidxx && threadidxx < blockdimX ==>
      acc(ptrDeref(optGet2((some(ptrAdd(optGet2(a), threadidxx)): Option[Pointer]))).int, scale1(griddimZ *
      write) *
      scale1(griddimY * write) *
      scale1(griddimX * write) *
      scale1(blockdimZ * write) *
      scale1(blockdimY * write) *
      write))
{scale1
scale1
scale1
scale1
scale1
  assert false
}

if we remove the scale1 function from the write, it actually does realize this is not satisfiable.

This is the correct version:

//:: cases BasicCuda
//:: tool silicon
//:: verdict Pass

#include <cuda.h>

/*@
    context_everywhere blockDim.x > 0 && blockDim.y == 1 && blockDim.z == 1;
    context_everywhere gridDim.x == 1 && gridDim.y == 1 && gridDim.z == 1;
    context a != NULL && \pointer_length(a)>blockDim.x;
    context Perm(&a[threadIdx.x], write);
@*/
__global__ void example(int a[], int len) {
    int tid = threadIdx.x;
    a[tid] = tid;
    /*@
        context Perm(&a[threadIdx.x], write);
    @*/
    __syncthreads();
    a[tid] = a[tid] * 2;
}`

Already discussed this with @pieter-bos before, but scale breaks more things it seems.

pieter-bos commented 2 months ago

I've always imagined that this was implicit in the contract. Like you say, this contract is not satisfiable unless all dimensions other than blockDim.x are 1. I'm curious why there is an == 2 in the satcheck and why the assert doesn't error, maybe the first explains the second.

sakehl commented 2 months ago

I see I never responded to this.

I'm pretty sure the 'griddimX == 2' is from a viper file that comes from a different cuda file, where I made the dimensions more specific, but is still wrong. Thus the == 2 is there from a different reason.

So this version corresponds with the viper code:

//:: cases BasicCuda
//:: tool silicon
//:: verdict Pass

#include <cuda.h>

/*@
    context_everywhere blockDim.x > 0 && blockDim.y == 1 && blockDim.z == 1;
    context_everywhere gridDim.x == 2 && gridDim.y == 1 && gridDim.z == 1;
    context a != NULL && \pointer_length(a)>blockDim.x;
    context Perm(&a[threadIdx.x], write);
@*/
__global__ void example(int a[], int len) {
    int tid = threadIdx.x;
    a[tid] = tid;
    /*@
        context Perm(&a[threadIdx.x], write);
    @*/
    __syncthreads();
    a[tid] = a[tid] * 2;
}`