For same time step size GPU explicit solver diverges but CPU version does not

luchete80 commented 2 years ago

Same thing with finer mesh

luchete80 commented 2 years ago

For 1320 particles

Max disp 2.66227e-05, 2.66227e-05, 0
Step 81, Min step size 7.27353e-07
--------------------------- END STEP, Time8.2e-05, --------------------------
Step 82, Min step size 9.34472e-07
--------------------------- END STEP, Time8.3e-05, --------------------------
Step 83, Min step size 9.69382e-07
--------------------------- END STEP, Time8.4e-05, --------------------------
Step 84, Min step size 9.3563e-07
--------------------------- END STEP, Time8.5e-05, --------------------------
Step 85, Min step size 8.96999e-07
--------------------------- END STEP, Time8.6e-05, --------------------------
Step 86, Min step size 9.88032e-07
--------------------------- END STEP, Time8.7e-05, --------------------------
Step 87, Min step size 9.53742e-07
--------------------------- END STEP, Time8.8e-05, --------------------------
Step 88, Min step size 4.8974e-07
--------------------------- END STEP, Time8.9e-05, --------------------------
Step 89, Min step size 9.74918e-07
--------------------------- END STEP, Time9e-05, --------------------------
Step 90, Min step size 9.15505e-07
--------------------------- END STEP, Time9.1e-05, --------------------------
Time 9.1e-05, GPU time 0.239

luchete80 commented 2 years ago

https://stackoverflow.com/questions/14406364/different-results-for-cuda-addition-on-host-and-on-gpu

https://forums.developer.nvidia.com/t/cpu-and-gpu-floating-point-anomaly/31100

https://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf

The nvcc compiler switch, --fmad (short name: -fmad), to control the contraction of floating-point multiplies and add/subtracts into floating-point multiply-add operations (FMAD, FFMA, or DFMA) has been added: --fmad=true and --fmad=false enable and disable the contraction respectively. This switch is supported only when the --gpu-architecture option is set with compute_20, sm_20, or higher. For other architecture classes, the contraction is always enabled. The --use_fast_math option implies --fmad=true, and enables the contraction.
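
To make the effect of contraction concrete, here is a minimal host-side C++ sketch (not code from WeldFormGPU; the function names are made up for illustration). It evaluates a*b+c once as a separate multiply and add, with two roundings, and once as a fused multiply-add via std::fma, with one rounding. The input values are hand-picked so that the two results differ.

```cpp
#include <cmath>

// Multiply, then add: the product is rounded to double before the addition.
// The volatile store keeps the host compiler from contracting this into an FMA.
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // first rounding happens here
    return product + c;               // second rounding happens here
}

// Fused multiply-add: a * b + c is evaluated with a single rounding.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Difference between the fused and unfused results for one hand-picked input.
double contraction_gap() {
    const double eps = std::ldexp(1.0, -30);  // 2^-30
    const double a = 1.0 + eps;
    const double b = 1.0 - eps;
    const double c = -1.0;
    // The exact product is 1 - 2^-60, which rounds to 1.0 in double precision,
    // so the unfused result is 0 while the fused result keeps the -2^-60 term.
    return fused_mul_add(a, b, c) - mul_then_add(a, b, c);
}
```

A gap this small is harmless in one operation, but in an explicit solver it is fed back into the next step thousands of times, so CPU and GPU trajectories can drift apart. That is one possible source of the difference reported here, not a confirmed diagnosis.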

luchete80 commented 2 years ago

New KickDrift Algorithm: GPU step size 0.2:

Time 0.0101001, GPU time 221.382 Current time step: 7.18303e-07 Forces calc: 0 Stresses calc: 2.733 Max disp: 0.0147366, 0.0147429, 0.00209899 Max pl_strain: 0.262628 Total steps: 14061, time spent 221.388000 Program ended.

luchete80 commented 2 years ago

Diverges with the step size set by CFL 0.4 (the CPU version converges)

luchete80 commented 2 years ago

After recompiling, it works:

Max pl_strain: 0.0844634

Time 0.00500011, GPU time 105.279 Current time step: 7.18303e-07 Forces calc: 0 Stresses calc: 1.36 Max disp: 0.0048783, 0.00488042, 0.000351309 Max pl_strain: 0.0880185

luchete80 commented 2 years ago

Maybe it is related to the flag mentioned earlier (--fmad); it now works at CFL 0.4:

Time 0.0100002, GPU time 218.864 Current time step: 7.18303e-07 Forces calc: 0 Stresses calc: 2.897 Max disp: 0.0145451, 0.0145513, 0.00206436 Max pl_strain: 0.259194

Time 0.0101001, GPU time 221.082 Current time step: 7.18303e-07 Forces calc: 0 Stresses calc: 2.929 Max disp: 0.0147366, 0.0147429, 0.00209899 Max pl_strain: 0.262628 Total steps: 14061, time spent 221.086000 Program ended.

luchete80 commented 2 years ago

ERROR

luchete80 / WeldFormGPU

For same time step size GPU explicit solver diverges but CPU version does not #77
