lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

hisq_paths_force_test --gauge-order milc crashes with Segmentation fault #163

Closed mathiaswagner closed 9 years ago

mathiaswagner commented 9 years ago

I wanted to see whether the issues in #158 can be reproduced with the QUDA tests, and noticed the following with a single-GPU QUDA build:

[mwagner@cream tests]$ ./hisq_paths_force_test --gauge-order milc --prec double
running the following fermion force computation test:
link_precision           link_reconstruct           space_dim(x/y/z)         T_dimension       Gauge_order
double                       18                         24/24/24                  24                milc
[...]
Using device 0: Tesla K40c
[...]
Segmentation fault

The same thing happens for single precision.

maddyscientist commented 9 years ago

Quick test: can you run this with valgrind / gdb to locate where the crash is occurring?



mathiaswagner commented 9 years ago

gdb output so far

0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
(gdb) bt
#0  0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
#1  0x00002aaaaaad1c82 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#2  0x00002aaaaaac4e1e in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#3  0x00002aaaaaab9ee8 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#4  0x00002aaaaaae354c in cudaMalloc () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#5  0x0000000001684b81 in quda::device_malloc_ (func=0x17528e0 "cudaGaugeField", file=0x17523ef "cuda_gauge_field.cu", line=42, size=99532800)
    at malloc.cpp:153
#6  0x0000000000450349 in quda::cudaGaugeField::cudaGaugeField (this=0x1a2bbc60, param=...) at cuda_gauge_field.cu:42
#7  0x0000000000405953 in hisq_force_init () at hisq_paths_force_test.cpp:362
#8  0x0000000000406055 in hisq_force_test () at hisq_paths_force_test.cpp:553
#9  0x00000000004069e0 in main (argc=3, argv=0x7fffffffe7e8) at hisq_paths_force_test.cpp:764

maddyscientist commented 9 years ago

The bug only happens when milc ordering is used; it seems to be fine with qdp gauge field ordering.

maddyscientist commented 9 years ago

This is a trivial bug: the CPU field ordering is hard-coded to qdp, so when you set a different field order from the command line there is a mismatch.
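
To illustrate the kind of mismatch described above, here is a minimal, self-contained sketch. It does not use QUDA's actual API: `GaugeOrder`, `FieldParam`, `parse_gauge_order`, `num_host_buffers`, `milc_offset`, `qdp_offset` and `make_cpu_param` are all hypothetical names made up for this example. It assumes the usual layouts, where qdp order means four separate arrays (one per direction) and milc order means a single contiguous array with the direction index inside the site index. How exactly the mismatch leads to the crash in the backtrace is not spelled out in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT QUDA types.
enum class GaugeOrder { QDP, MILC };

constexpr std::size_t kDirs = 4;        // directions mu = 0..3
constexpr std::size_t kLinkReals = 18;  // 3x3 complex link, no reconstruction

// Map the --gauge-order command-line string to an order value.
GaugeOrder parse_gauge_order(const std::string& s) {
    if (s == "qdp") return GaugeOrder::QDP;
    if (s == "milc") return GaugeOrder::MILC;
    throw std::invalid_argument("unknown gauge order: " + s);
}

// Assumed layouts: qdp uses one host array per direction,
// milc uses a single contiguous host array.
std::size_t num_host_buffers(GaugeOrder order) {
    return order == GaugeOrder::QDP ? kDirs : 1;
}

// Offset (in reals) of link (site, mu) inside the single milc buffer.
std::size_t milc_offset(std::size_t site, std::size_t mu) {
    assert(mu < kDirs);
    return (site * kDirs + mu) * kLinkReals;
}

// Offset (in reals) of a site's link inside one per-direction qdp buffer.
std::size_t qdp_offset(std::size_t site) {
    return site * kLinkReals;
}

struct FieldParam {
    GaugeOrder order;
};

// The shape of the fix: the CPU-side field description takes the order
// that was requested on the command line instead of a hard-coded QDP.
FieldParam make_cpu_param(GaugeOrder requested) {
    FieldParam p;
    p.order = requested;  // previously (in spirit): p.order = GaugeOrder::QDP;
    return p;
}
```

With these definitions, a host buffer filled in milc order but described as qdp would be interpreted as four per-direction pointers rather than one contiguous array, and the same `(site, mu)` pair would resolve to a different location. Passing the parsed order through, as in `make_cpu_param(parse_gauge_order("milc"))`, keeps the description and the data consistent.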