mathiaswagner closed this issue 9 years ago.
Quick test: can you run this with valgrind / gdb to locate where the crash is occurring?
gdb output so far:

```
0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
(gdb) bt
#0 0x00002aaaab02eea9 in ?? () from /usr/lib64/libcuda.so
#1 0x00002aaaaaad1c82 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#2 0x00002aaaaaac4e1e in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#3 0x00002aaaaaab9ee8 in ?? () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#4 0x00002aaaaaae354c in cudaMalloc () from /home/mwagner/cuda-6.0/lib64/libcudart.so.6.0
#5 0x0000000001684b81 in quda::device_malloc_ (func=0x17528e0 "cudaGaugeField", file=0x17523ef "cuda_gauge_field.cu", line=42, size=99532800) at malloc.cpp:153
#6 0x0000000000450349 in quda::cudaGaugeField::cudaGaugeField (this=0x1a2bbc60, param=...) at cuda_gauge_field.cu:42
#7 0x0000000000405953 in hisq_force_init () at hisq_paths_force_test.cpp:362
#8 0x0000000000406055 in hisq_force_test () at hisq_paths_force_test.cpp:553
#9 0x00000000004069e0 in main (argc=3, argv=0x7fffffffe7e8) at hisq_paths_force_test.cpp:764
```
The bug only happens when milc ordering is used; with qdp gauge field ordering it seems to be fine.
This is a trivial bug: the CPU field ordering is hard-coded to qdp, so when you set a different field order from the command line, the order the test assumes no longer matches the one you requested.
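To illustrate why a hard-coded order breaks things, here is a minimal C++ sketch. Every name in it (`CpuGaugeOrder`, `linkOffset`, `parseGaugeOrder`, `HostGaugeParam`, `makeParamBuggy`, `makeParamFixed`) is hypothetical and is not QUDA's API. The sketch assumes a QDP-style host field stores one array per direction (modelled here as if the four arrays were concatenated in direction order) and a MILC-style field stores a single site-major array with the four directions interleaved per site. Under those assumptions, the same (site, direction) link lands at different offsets, so code that reads a MILC buffer as if it were QDP-ordered reads the wrong data.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical sketch only: these names are illustrative, not QUDA's API.
enum class CpuGaugeOrder { QDP, MILC };

constexpr std::size_t kRealsPerLink = 18;  // 3x3 complex matrix = 18 reals
constexpr std::size_t kNDim = 4;           // four link directions per site

// Flat offset (in reals) of link (site, dir) under each assumed layout.
std::size_t linkOffset(CpuGaugeOrder order, std::size_t site, std::size_t dir,
                       std::size_t volume) {
  switch (order) {
    case CpuGaugeOrder::QDP:
      // Assumed QDP-like: direction outermost, then site.
      return (dir * volume + site) * kRealsPerLink;
    case CpuGaugeOrder::MILC:
      // Assumed MILC-like: site outermost, direction inside.
      return (site * kNDim + dir) * kRealsPerLink;
  }
  throw std::logic_error("unknown gauge order");
}

// Map a command-line string to an order (hypothetical option values).
CpuGaugeOrder parseGaugeOrder(const std::string& s) {
  if (s == "qdp") return CpuGaugeOrder::QDP;
  if (s == "milc") return CpuGaugeOrder::MILC;
  throw std::invalid_argument("unsupported gauge order: " + s);
}

struct HostGaugeParam {
  CpuGaugeOrder order;
  std::size_t volume;
};

// Buggy pattern: ignores the requested order and always reports QDP.
HostGaugeParam makeParamBuggy(CpuGaugeOrder /*fromCmdLine*/, std::size_t volume) {
  return {CpuGaugeOrder::QDP, volume};
}

// Fixed pattern: propagate the order chosen on the command line.
HostGaugeParam makeParamFixed(CpuGaugeOrder fromCmdLine, std::size_t volume) {
  return {fromCmdLine, volume};
}
```

For example, with a volume of 16, link (site 1, direction 2) sits at offset 594 under the assumed QDP layout but at offset 108 under the assumed MILC layout. The fix is to pass the command-line order through to the host-field parameters instead of fixing it to qdp.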
I just wanted to see whether the issues in #158 can be reproduced with the QUDA tests, and noted the following for a single-GPU QUDA build:
The same thing happens for single precision.