vellamike closed this issue 10 years ago
In
__kernel void computeInteractionWithMembranes
calls
pos_p = calculateProjectionOfPointToPlane(...
calls
denominator = calcDeterminant3x3(a_1,a_2,a_3);
which runs the math
return c1[1]*c2[2]*c3[3] + c1[2]*c2[3]*c3[1] + c1[3]*c2[1]*c3[2]
- c1[3]*c2[2]*c3[1] - c1[1]*c2[3]*c3[2] - c1[2]*c2[1]*c3[3];
and if that result equals 0 you get the error message. Taking a stab in the dark: computeInteractionWithMembranes both reads from and writes to various parts of __global float4 *position, which is the ultimate source of the data in the above calculation. I'm still trying to understand how memory synchronization works, but I think we need a barrier. Try adding one in computeInteractionWithMembranes here:
barrier(CLK_GLOBAL_MEM_FENCE);
if(membrane_jd_counter>0)
{
That isolates the parts that read position from the parts that write it, and should pause every work-item in a work-group at the barrier until all work-items in that group have reached it.
Great idea; unfortunately it didn't do the trick. Still getting the same error.
CORRECTION: with the barrier I actually get a segfault at the end with AMD (and the original error with Intel). Here's the full output:
Pyramidal simulation class loaded!
CL_PLATFORM_VERSION [0]: OpenCL 1.2 AMD-APP (1214.3)
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_NAME [0]: AMD Opteron(TM) Processor 6272
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_MAX_WORK_GROUP_SIZE [0]: 1024
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_MAX_COMPUTE_UNITS [0]: 64
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_GLOBAL_MEM_SIZE [0]: 2032017408
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_GLOBAL_MEM_CACHE_SIZE [0]: 16384
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_LOCAL_MEM_SIZE [0]: 32768
[[ Step 0 ]]
_runClearBuffers: 8.165 ms
_runHashParticles: 4.037 ms
_runSort: 16.651 ms
_runSortPostPass: 3.325 ms
_runIndexx: 11.989 ms
_runIndexPostPass: 1.676 ms
_runFindNeighbors: 32.445 ms
_runPCISPH: 69.302 ms 3 iteration(s)
calculateProjectionOfPointToPlane() returned errorcalculateProjectionOfPointToPlane() returned errorcalculateProjectionOfPointToPlane() returned errorSegmentation fault (core dumped)
I added the following print statements to help understand what is going on:
float4 b = (float4)(0, b_1, b_2, b_3);
denominator = calcDeterminant3x3(a_1,a_2,a_3);
printf("\na_1 = %2.2v4hlf", a_1);
printf("\na_2 = %2.2v4hlf", a_2);
printf("\na_3 = %2.2v4hlf", a_3);
printf("\ndenominator = %2.2v4hlf", denominator);
And this is the output (not always the same, this is just an example):
a_1 = 0.00,-0.25,-1.71,-1.71
a_1 = 0.00,1.23,-1.75,-0.03
a_1 = 0.00,1.64,1.71,1.22
a_1 = 0.00,-0.27,-1.80,-1.80
a_2 = 0.00,-3.00,0.16,0.18
a_3 = 0.00,-0.02,0.00,-1.67
denominator = 0.00,-0.02,0.00,-1.67calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,0.27,-1.78,-1.78
a_2 = 0.00,2.98,0.16,0.18
a_3 = 0.00,-0.03,0.00,1.67
denominator = 0.00,-0.03,0.00,1.67calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,2.76,-0.20,-0.10
a_2 = 0.00,0.33,1.66,0.00
a_3 = 0.00,0.17,-0.00,1.67
denominator = 0.00,0.17,-0.00,1.67calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,-0.28,-1.82,-1.82
a_2 = 0.00,-3.05,0.17,0.17
a_3 = 0.00,-0.01,0.00,-1.67
denominator = 0.00,-0.01,0.00,-1.67calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,1.88,1.45,1.39
a_2 = 0.00,-2.42,1.13,1.16
a_3 = 0.00,0.13,0.00,1.67
denominator = 0.00,0.13,0.00,1.67calculateProjectionOfPointToPlane() returned errorSegmentation fault (core dumped)
I'm a bit perplexed now since none of those denominators are equal to zero.
But, if I add the following:
else {
printf("\ndenominator equal to zero\n");
pm.w = -1;//indicates error
}
This is the (inconsistent) output:
a_1 = 0.00,-2.51,0.72,0.04
a_2 = 0.00,-1.21,-1.51,0.00
a_1 = 0.00,1.70,1.53,1.49
a_1 = 0.00,2.76,-0.20,-0.10
a_1 = 0.00,-0.28,-1.83,-0.00
a_2 = 0.00,3.06,-0.17,0.00
a_3 = 0.00,-0.00,0.00,1.67
denominator = 0.00,-0.00,0.00,1.67
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,2.68,-0.46,-0.41
a_2 = 0.00,0.77,1.60,1.60
a_3 = 0.00,-0.08,0.00,1.67
a_1 = 0.00,-0.27,-1.68,-1.68
a_1 = 0.00,0.53,-1.69,-0.02
a_2 = 0.00,2.81,0.32,-0.04
a_3 = 0.00,0.08,0.00,1.67
denominator = 0.00,0.08,0.00,1.67
a_1 = 0.00,1.88,1.45,1.39
a_2 = 0.00,-2.42,1.13,1.16
a_3 = 0.00,0.13,0.00,1.67
denominator = 0.00,0.13,0.00,1.67
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,0.31,-1.81,-1.81
a_2 = 0.00,3.02,0.18,0.24
a_3 = 0.00,-0.09,-0.00,1.67
denominator = 0.00,-0.09,-0.00,1.67
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,1.53,-1.45,-1.42
a_2 = 0.00,2.43,0.91,0.94
a_3 = 0.00,-0.08,0.00,1.67
denominator = 0.00,-0.08,0.00,1.67
denominator equal to zero
calculateProjectionOfPointToPlane() returned errorSegmentation fault (core dumped)
Now I'm extremely confused, because all of the denominator values printed above should satisfy the != 0 condition in the if statement.
There was a mistake in the print statement above: the scalar denominator was printed with the float4 format specifier. The actual output is:
CL_PLATFORM_VERSION [0]: OpenCL 1.2 AMD-APP (1214.3)
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_NAME [0]: AMD Opteron(TM) Processor 6272
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_MAX_WORK_GROUP_SIZE [0]: 1024
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_MAX_COMPUTE_UNITS [0]: 64
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_GLOBAL_MEM_SIZE [0]: 2032017408
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_GLOBAL_MEM_CACHE_SIZE [0]: 16384
CL_CONTEXT_PLATFORM [0]: CL_DEVICE_LOCAL_MEM_SIZE [0]: 32768
[[ Step 0 ]]
_runClearBuffers: 5.495 ms
_runHashParticles: 3.001 ms
_runSort: 29.882 ms
_runSortPostPass: 1.959 ms
_runIndexx: 6.443 ms
_runIndexPostPass: 1.783 ms
_runFindNeighbors: 43.711 ms
_runPCISPH: 81.370 ms 3 iteration(s)
a_1 = 0.00,-0.28,-1.82,-1.82
a_2 = 0.00,-3.04,0.17,0.18
a_3 = 0.00,-0.02,0.00,-1.67
denominator = 0.000000
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,-0.28,-1.81,-1.81
a_1 = 0.00,-0.28,-1.82,-1.82
a_2 = 0.00,-3.05,0.17,0.17
a_2 = 0.00,-3.02,0.17,0.18
a_3 = 0.00,-0.01,0.00,-1.67
a_3 = 0.00,-0.02,0.00,-1.67
denominator = 0.000000
denominator = 0.000000
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,-0.25,-1.71,-1.71
a_1 = 0.00,-0.27,-1.77,-0.01
a_1 = 0.00,2.76,-0.20,-0.10
a_1 = 0.00,1.69,-1.82,-1.16
a_1 = 0.00,-2.51,0.72,0.04
a_1 = 0.00,1.70,1.53,1.49
a_2 = 0.00,-2.56,1.02,1.04
a_1 = 0.00,1.88,1.45,1.39
a_1 = 0.00,1.59,-1.47,-0.11
a_1 = 0.00,-2.55,-0.68,-0.67
a_1 = 0.00,-1.51,1.41,0.19
a_2 = 0.00,-1.21,-1.51,0.00
a_3 = 0.00,0.07,0.00,1.67
denominator = 0.000000
a_2 = 0.00,0.33,1.66,0.00
a_1 = 0.00,0.26,-1.74,-1.74
denominator equal to zero
calculateProjectionOfPointToPlane() returned error
a_1 = 0.00,-2.78,-0.14,-0.03Segmentation fault (core dumped)
Which of course makes sense: for some reason, the determinant calculation code is returning 0.0.
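As a cross-check, the same expansion can be evaluated on the host. The following is a minimal sketch in plain C, not code from sphFluid.cl: it emulates float4 as a four-element float array (an assumption made only so that the [1]..[3] indexing compiles in C), and the names vec4, calc_determinant_3x3 and sample_denominator are hypothetical. The inputs are the rounded a_1, a_2, a_3 values from the first block of the output above.

```c
#include <stdio.h>

/* Host-side stand-in for an OpenCL float4 (assumption for illustration):
   index 0 is unused padding, indices 1..3 hold the vector components,
   matching the leading 0.00 in every printed a_1..a_3. */
typedef float vec4[4];

/* Same six-term expansion as the kernel's determinant, on plain C arrays. */
static float calc_determinant_3x3(const vec4 c1, const vec4 c2, const vec4 c3)
{
    return c1[1]*c2[2]*c3[3] + c1[2]*c2[3]*c3[1] + c1[3]*c2[1]*c3[2]
         - c1[3]*c2[2]*c3[1] - c1[1]*c2[3]*c3[2] - c1[2]*c2[1]*c3[3];
}

/* Evaluates the determinant for one set of values copied from the log.
   The log prints two decimals only, so the result is approximate. */
static float sample_denominator(void)
{
    const vec4 a_1 = {0.00f, -0.28f, -1.82f, -1.82f};
    const vec4 a_2 = {0.00f, -3.04f,  0.17f,  0.18f};
    const vec4 a_3 = {0.00f, -0.02f,  0.00f, -1.67f};
    return calc_determinant_3x3(a_1, a_2, a_3);
}
```

With these rounded inputs, sample_denominator() comes out at roughly 9.32, nowhere near zero. That would be consistent with the problem lying in how the kernel evaluates the expression rather than in the input data.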
And I think it's working. I will need to check it tomorrow since I don't have a working OpenGL install on this machine. I'll report back once I've confirmed that I have indeed fixed it.
OK - so it's working now with both AMD OpenCL 1.2 and Intel OpenCL 1.1 for Linux! Thanks to @Neurophile, @slarson and @skhayrulin in particular for helping me get this to work.
It required quite a few changes to sphFluid.cl to remove things which were not legal OpenCL syntax.
Some observations:
When I run on Linux I often get the error message:
> Error!: r_ij: 0.000000
> sortedPosition[85112] : 0.000000 , 66.799995 , 835.000000
> sortedPosition[85103] : 0.000000 , 66.799995 , 835.000000
> Error!: r_ij: 0.000000
> sortedPosition[85112] : 0.000000 , 66.799995 , 835.000000
> sortedPosition[85104] : 0.000000 , 66.799995 , 835.000000
> Error!: r_ij: 0.000000
> sortedPosition[85112] : 0.000000 , 66.799995 , 835.000000
calculateProjectionOfPointToPlane()
And so on, hundreds of times. Occasionally it doesn't happen, which suggests some kind of race condition?
Any idea why this might be happening @a-palyanov ?