We just got a machine with P100 cards, and I was going to run some long silicon carbide simulations (1 μs+) over the summer with vashishta/kk. I run every simulation on a single P100 with the 20June17 version of LAMMPS, GCC 5.4.0, and CUDA 8.0.
After a long time (1 million+ timesteps, though the exact number is not deterministic), the simulations crash with this error:
terminate called after throwing an instance of 'std::runtime_error'
what(): cudaCreateTextureObject( & tex_obj , & resDesc, & texDesc, NULL ) error( cudaErrorMemoryAllocation): out of memory ../../lib/kokkos/core/src/Cuda/Kokkos_CudaSpace.cpp:296
Traceback functionality not available
[bigfacet:07706] *** Process received signal ***
[bigfacet:07706] Signal: Aborted (6)
[bigfacet:07706] Signal code: (-6)
[bigfacet:07706] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f18b685f390]
[bigfacet:07706] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f18b59f6428]
[bigfacet:07706] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f18b59f802a]
[bigfacet:07706] [ 3] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x16d)[0x7f18b655b84d]
[bigfacet:07706] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d6b6)[0x7f18b65596b6]
[bigfacet:07706] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d701)[0x7f18b6559701]
[bigfacet:07706] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8d919)[0x7f18b6559919]
[bigfacet:07706] [ 7] lmp_kokkos_cuda_openmpi[0xf791f3]
[bigfacet:07706] [ 8] lmp_kokkos_cuda_openmpi[0xf81750]
[bigfacet:07706] [ 9] lmp_kokkos_cuda_openmpi[0xf7bf9b]
[bigfacet:07706] [10] lmp_kokkos_cuda_openmpi[0x4e3d5c]
[bigfacet:07706] [11] lmp_kokkos_cuda_openmpi[0xf23356]
[bigfacet:07706] [12] lmp_kokkos_cuda_openmpi[0xf03c9d]
[bigfacet:07706] [13] lmp_kokkos_cuda_openmpi[0x929761]
[bigfacet:07706] [14] lmp_kokkos_cuda_openmpi[0x57ce79]
[bigfacet:07706] [15] lmp_kokkos_cuda_openmpi[0x57b13f]
[bigfacet:07706] [16] lmp_kokkos_cuda_openmpi[0x57bca7]
[bigfacet:07706] [17] lmp_kokkos_cuda_openmpi[0x422106]
[bigfacet:07706] [18] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f18b59e1830]
[bigfacet:07706] [19] lmp_kokkos_cuda_openmpi[0x425b69]
[bigfacet:07706] *** End of error message ***
I suspected this could be a problem in the vashishta implementation, so I tried the sw benchmark (modified version attached), which also crashed, after 21 million timesteps.
I'm not sure how to debug this, but since it crashes in cudaCreateTextureObject, I suspect it happens while allocating the short neighbor list (or the regular neighbor list), which is backed by textures because of its random access pattern.
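For reference, the failing call in isolation looks roughly like the sketch below: a linear device buffer bound as a texture object. bind_texture and the int element type are my own illustrative choices, not the actual Kokkos_CudaSpace.cpp internals.

// Minimal sketch of the failing call path: binding a linear device
// buffer as a texture object (hypothetical helper, not Kokkos code).
#include <cuda_runtime.h>
#include <cstdio>

cudaTextureObject_t bind_texture(void *dev_ptr, size_t bytes) {
  cudaResourceDesc resDesc = {};
  resDesc.resType = cudaResourceTypeLinear;
  resDesc.res.linear.devPtr = dev_ptr;
  resDesc.res.linear.desc = cudaCreateChannelDesc<int>();
  resDesc.res.linear.sizeInBytes = bytes;

  cudaTextureDesc texDesc = {};
  texDesc.readMode = cudaReadModeElementType;

  cudaTextureObject_t tex_obj = 0;
  // This is the call that dies with cudaErrorMemoryAllocation above.
  cudaError_t err = cudaCreateTextureObject(&tex_obj, &resDesc, &texDesc, NULL);
  if (err != cudaSuccess)
    fprintf(stderr, "cudaCreateTextureObject: %s\n", cudaGetErrorString(err));
  return tex_obj;
}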
I then noticed that we actually reallocate the short neighbor list on every timestep where nlocal+nghosts changes (which is quite often, after every neighbor list build I suppose), which probably isn't needed. I'm currently running a simulation where I only reallocate the short neighbor list when it is smaller than what is needed. If that run does not crash, I'm closer to understanding the problem.
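The change I'm testing amounts to grow-only reallocation, roughly as in this sketch (assuming the short neighbor list is a Kokkos View; ensure_capacity and d_neighbors_short are made-up names, the real members live in the pair style):

// Grow-only reallocation sketch for the short neighbor list.
#include <Kokkos_Core.hpp>

using ShortNeighView = Kokkos::View<int **, Kokkos::CudaSpace>;

void ensure_capacity(ShortNeighView &d_neighbors_short,
                     int ntotal,      // nlocal + nghosts
                     int maxneighs) {
  // Old behavior: reallocate whenever ntotal changes.
  // New behavior: reallocate only when the current view is too small,
  // so repeated neighbor list builds don't churn device allocations.
  if (d_neighbors_short.extent(0) < (size_t) ntotal ||
      d_neighbors_short.extent(1) < (size_t) maxneighs) {
    d_neighbors_short = ShortNeighView("neighbors_short", ntotal, maxneighs);
  }
}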
nvidia-smi does not show any noticeable increase in memory usage during the simulation. Could it be a CUDA bug where heavy reallocation somehow fragments memory?
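If reallocation churn is the culprit, it should be reproducible outside LAMMPS with something like this standalone harness (my own sketch, not derived from the LAMMPS sources): it frees and reallocates a buffer whose size jitters like nlocal+nghosts and rebinds a texture object every iteration. Alloc and free are balanced, so if it still eventually hits cudaErrorMemoryAllocation, that would point at the driver/runtime rather than our code.

// Standalone fragmentation test with balanced alloc/free per step.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
  for (long step = 0;; ++step) {
    size_t n = 1000000 + (size_t)(step % 1000); // jittering "atom count"
    int *buf = NULL;
    if (cudaMalloc(&buf, n * sizeof(int)) != cudaSuccess) {
      printf("cudaMalloc failed at step %ld\n", step);
      return 1;
    }
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = buf;
    resDesc.res.linear.desc = cudaCreateChannelDesc<int>();
    resDesc.res.linear.sizeInBytes = n * sizeof(int);
    cudaTextureDesc texDesc = {};
    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &resDesc, &texDesc, NULL) != cudaSuccess) {
      printf("cudaCreateTextureObject failed at step %ld\n", step);
      return 1;
    }
    cudaDestroyTextureObject(tex);
    cudaFree(buf);
  }
}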
Any ideas?
kokkos_memory_bug.zip