kokkos / kokkos-tools

Kokkos C++ Performance Portability Programming Ecosystem: Profiling and Debugging Tools

Fix memory HWM to also show how much memory the profiler is taking #59

Open bathmatt opened 5 years ago

bathmatt commented 5 years ago

I've been tracking down a memory issue and found that the memory usage tool itself takes most of the memory in a real run. It would be nice to track that and output it, or subtract it from the RSS.

nmhamster commented 5 years ago

Sounds like you have many small allocations? We can look at that.

bathmatt commented 5 years ago

Yes, I use Trilinos, and most of them are in that library. There are tons of these in Tpetra, MueLu, and Ifpack:

871.121941   0x2aaaf6aef500              8             Host Allocate   DualView::modified_flags
871.121956   0x2aaaf6aef640              8             Host Allocate   DualView::modified_flags
871.121970   0x2aaaf6aef780              8             Host Allocate   DualView::modified_flags
871.121986   0x2aaaf407dbc0             -8             Host DeAllocate DualView::modified_flags
871.121993   0x2aaaf407e3c0             -8             Host DeAllocate DualView::modified_flags
871.122003   0x2aaaf3b52500             -8             Host DeAllocate DualView::modified_flags

In the last 100k lines of the log there are about 13k deallocations, and about 11k of them are DualView deallocations, mostly modified_flags:

[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep DeAll | wc
  13118   78920 1255978
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep DeAll | grep DualView |wc
  11331   67998 1081020

bathmatt commented 5 years ago

And the bulk of what's left is:

Host DeAllocate MV::normImpl lcl

bathmatt commented 5 years ago

I realized that deallocations are the wrong thing to look at, since this is the end of the run. Looking at allocations instead:

[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | wc
  10434   62706  990837
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | grep modified_f |wc
   9505   57030  912480
[mbetten@serrano-login3 Bdot]$ tail -100000 ./TestResults.CTS1_MemEvent/BDot.Pressure=0.01.mpi_ranks_per_socket=1.nnodes=8.np=288.refine=0.0.use_np=256/ser7-255931.mem_events |grep \ Allo | grep MV |wc
    826    5074   69856
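
The per-label counts above were produced with repeated `tail | grep | wc` pipelines. As a rough alternative, the same tally could be done in one pass with a short script. This is only a sketch: the line layout (time, pointer, signed size, single-word memory space, operation, label) is inferred from the excerpt earlier in this thread, and `EVENT_RE` and `tally_events` are hypothetical names, not part of kokkos-tools.

```python
import re
from collections import Counter

# Assumed layout, inferred from the log excerpt in this thread:
#   <time> <pointer> <signed size> <space> <Allocate|DeAllocate> <label...>
# The memory space is assumed to be a single word (e.g. "Host").
EVENT_RE = re.compile(
    r"^\s*(?P<time>\d+\.\d+)\s+(?P<ptr>0x[0-9a-fA-F]+)\s+(?P<size>-?\d+)\s+"
    r"(?P<space>\S+)\s+(?P<op>Allocate|DeAllocate)\s+(?P<label>.*\S)\s*$"
)


def tally_events(lines):
    """Count events per (operation, label); lines that do not match are skipped."""
    counts = Counter()
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue  # header or a line that does not fit the assumed layout
        counts[(m.group("op"), m.group("label"))] += 1
    return counts


# The six log lines quoted earlier in this thread.
sample = [
    "871.121941   0x2aaaf6aef500              8             Host Allocate   DualView::modified_flags",
    "871.121956   0x2aaaf6aef640              8             Host Allocate   DualView::modified_flags",
    "871.121970   0x2aaaf6aef780              8             Host Allocate   DualView::modified_flags",
    "871.121986   0x2aaaf407dbc0             -8             Host DeAllocate DualView::modified_flags",
    "871.121993   0x2aaaf407e3c0             -8             Host DeAllocate DualView::modified_flags",
    "871.122003   0x2aaaf3b52500             -8             Host DeAllocate DualView::modified_flags",
]

counts = tally_events(sample)
for (op, label), n in counts.most_common():
    print(f"{n:8d}  {op:<10}  {label}")
```

To run this over the end of a real log, the `sample` list could be replaced by the lines of `tail -100000 <file>` read from standard input.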

stanmoore1 commented 5 years ago

Possible duplicate of #9.