mmbell / samurai

Spline Analysis at Mesoscale Utilizing Radar and Aircraft Instrumentation
GNU General Public License v3.0

Bug in Casper_a100_submit.sh #55

Closed: markonders closed this issue 1 month ago

markonders commented 2 months ago

Error when running the Casper A100 submit script.

Error message:

GPU run; using 1 thread and 1 GPU
./ncar_run.sh: line 38: 207981 Illegal instruction (core dumped) ${EXE} -params $*
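"Illegal instruction (core dumped)" means the executable was terminated by signal SIGILL (signal 4 on Linux), which bash reports as exit status 128 + 4 = 132. The sketch below shows a hypothetical helper, `run_and_check`, which is not part of ncar_run.sh. It runs a command and prints the name of any signal that killed it, which can help separate a crash like this one from an ordinary error exit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ncar_run.sh): run a command and, if it was
# terminated by a signal, print the signal name. Bash reports such a command's
# exit status as 128 + the signal number (SIGILL is 4 on Linux, so 132).
# Caveat: a program that deliberately exits with a code above 128 would be
# misreported as a signal.
run_and_check() {
  local rc=0
  "$@" || rc=$?
  if [ "$rc" -gt 128 ]; then
    # `kill -l N` maps a signal number to its name without the SIG prefix.
    echo "killed by signal SIG$(kill -l "$((rc - 128))") (exit status $rc)"
  fi
  return "$rc"
}

# Example use inside a run script, assuming EXE is set as in ncar_run.sh:
# run_and_check "${EXE}" -params "$@"
```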

I tried an interactive session and confirmed that these resource parameters were available, using this command:

execcasper -l walltime=00:30:00 -l gpu_type=a100 -l select=1:ncpus=36:ompthreads=1:mem=700GB:ngpus=1 -A NEOL0013

I can confirm that the run folder is created, but there are no run/timing or run/samurai logs because the program never starts.

sjsprecious commented 2 months ago

Thanks @markonders for reporting this issue. I will take a look at it.

sjsprecious commented 2 months ago

I was able to reproduce the error reported here on Casper's A100 GPU. However, I was able to run the same code on Casper's V100 GPU.

Since I was able to run SAMURAI on Derecho's A100 GPU, I think the issue here is specific to Casper's A100 GPU.

cenamiller commented 2 months ago

John and Jian can run on Casper V100 and Derecho A100 but are still having issues with Casper A100. We suspect a system issue; Jian reported it to CSG.

sjsprecious commented 1 month ago

I just checked out Mark's fork (https://github.com/markonders/samurai/tree/main, commit: 7d4a308) and am now able to run the Beltrami test on Casper's A100. Thanks @markonders for fixing this issue.