yt-project / yt

Main yt repository
http://yt-project.org

`covering_grid` consumes too much memory #3958

Open cindytsai opened 2 years ago

cindytsai commented 2 years ago


Summary

covering_grid takes an unreasonably large amount of memory to produce a uniform grid.

Code for reproduction

Download Data_000000

curl -JO http://use.yt/upload/1a4e8be7     # Download Data_000000

Put Data_000000 and the script below in the same folder.

import yt
import numpy as np

lv = 0
dim_x = 512
dim_y = 512
dim_z = 512
box_range = 1.58230113e-01
lv_max_resolution = 0.000309043189453125

center = [ box_range/2., box_range/2., box_range/2. ]

left_edge_shift_x = -dim_x/2.
left_edge_shift_y = -dim_y/2.
left_edge_shift_z = -dim_z/2.

def extract_data(data):
    ds = yt.load(data)
    # Covering grid spanning the full domain at level lv
    ad = ds.covering_grid(
        level=lv,
        left_edge=[center[0] + left_edge_shift_x * lv_max_resolution,
                   center[1] + left_edge_shift_y * lv_max_resolution,
                   center[2] + left_edge_shift_z * lv_max_resolution],
        dims=[dim_x, dim_y, dim_z],
    )
    real_part = ad["Real"]
    real_part.tofile("covering_grid.raw")

if __name__ == "__main__":
    extract_data("Data_000000")

Actual outcome

Checking the memory consumption of this script with the valgrind massif tool shows that it takes 6 GB in total. npy_alloc_cache accounts for 5 GB of that, which is unexpected.

export PYTHONMALLOC=malloc
valgrind -v --tool=massif --time-unit=B --detailed-freq=1 python3.8 issue.py

issueReport

Expected outcome

It shouldn't consume that much memory. Because the dataset contains only uniform grids with no AMR structure (all grids are at level 0), and the covering_grid output dimensions match the dimensions of the data domain, covering_grid only needs to read the data from file and fill it into a newly allocated buffer (512 x 512 x 512 x sizeof(double) ~ 1 GB).
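As a sanity check (just arithmetic, not yt code), the expected size of the output buffer can be computed directly:

```python
import numpy as np

# Expected allocation for a 512^3 grid of float64 ("Real") values
dims = (512, 512, 512)
expected_bytes = int(np.prod(dims)) * np.dtype(np.float64).itemsize
print(expected_bytes / 1024**3)  # 1.0 (GiB), far below the ~6 GB observed
```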

Version Information

matthewturk commented 2 years ago

Oh my goodness, thank you for this detailed bug report. I will look at it. Off the top of my head, I think it's likely doing one of the following, many or all of which could be short-circuited with improvements to yt's understanding of the data:

Let me look further, and this is a very helpful bug report.

cindytsai commented 2 years ago

Thanks for the reply! There is another minor issue about how covering_grid works in parallel with MPI. I'm not sure whether this should be an independent issue, because I think the two might be related, so I'll just put it here.

Even though each MPI rank is only responsible for part of the output uniform grid, it requests grid data residing in the whole target output region, which means each MPI rank asks for grid data that will never be used. I think this happens because yt identifies which grids intersect the region (and builds the other index arrays) for the region as a whole, rather than per MPI rank's responsible sub-region. It might be better if this line fetched only the proper grids for each MPI rank, instead of just chunking by io.

matthewturk commented 2 years ago

@cindytsai and I dug into this a bit and found this from memray, which makes sense:

0x7fbf1f460740  2.9 GiB malloc  1   icoords at /home/mjturk/yt/yt/yt/geometry/geometry_handler.py:339
0x7fbf1f460740  1.0 GiB calloc  1   <listcomp> at /home/mjturk/yt/yt/yt/data_objects/construction_data_containers.py:1074
0x7fbf1f460740  992.0 MiB   malloc  1   _read_fluid_selection at /home/mjturk/yt/yt/yt/frontends/gamer/io.py:102
0x7fbf1f460740  992.0 MiB   malloc  1   ires at /home/mjturk/yt/yt/yt/geometry/geometry_handler.py:376

The basic problem is that the fundamental unit that yt is considering when generating coordinates is the grid, and the root grid is 512^3. So it will always generate the icoords for that root grid.
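To put a number on that (assuming yt materializes one int64 (i, j, k) triple per cell, which is what the memray figure above suggests):

```python
import numpy as np

ncells = 512**3
# One integer (i, j, k) coordinate triple per cell, assuming int64 storage
icoords_bytes = ncells * 3 * np.dtype(np.int64).itemsize
print(icoords_bytes / 1024**3)  # 3.0 (GiB), consistent with the ~2.9 GiB memray reports
```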

neutrinoceros commented 2 years ago

@matthewturk do you think we could, in principle, save memory (and time) by storing icoords in a functional form rather than as brute arrays?
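A minimal sketch of what a functional/lazy form could look like; `iter_icoords` and `chunk` are hypothetical names for illustration, not yt's actual API:

```python
import numpy as np

def iter_icoords(start, dims, chunk=2**20):
    """Yield integer cell coordinates for a grid lazily, in fixed-size
    chunks, instead of materializing one (ncells, 3) array up front."""
    ncells = int(np.prod(dims))
    for lo in range(0, ncells, chunk):
        flat = np.arange(lo, min(lo + chunk, ncells))
        # Unravel flat indices into (i, j, k) triples on the fly,
        # offset by the grid's integer start index
        yield np.stack(np.unravel_index(flat, dims), axis=-1) + start

# Peak memory is then bounded by the chunk size, not the grid size
first = next(iter_icoords(np.array([0, 0, 0]), (4, 4, 4), chunk=8))
print(first.shape)  # (8, 3)
```

The leaky part Matt mentions would be making every consumer of icoords accept a generator (or callable) instead of a single array.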

matthewturk commented 2 years ago

So, I think so. I'm not entirely sure of the exact best approach. There are three big ones I see:

  1. Automatically subdivide large grids into small ones at an object level. This would be the one that I'd try if I were limited in ambition, but it's also the most likely to present immediate results -- even if those results would prevent us from doing other things in the future.
  2. Use a functional form, like you are suggesting. I think this might end up being the fiddliest, but also potentially the best. The downsides to this would be that there are probably a lot of places we anticipate having actual icoords, and it might be leaky to make this work with all of them.
  3. Convert the grid frontends into exclusively non-pre-allocated frontends, like we did with particles. This might end up being the best, long-term.

I think there's a fourth option ... it's on the tip of my tongue but I can't come up with it. I will think.
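For reference, option 1 amounts to pure index-space bookkeeping; `subdivide` and `max_size` here are hypothetical names sketching the idea, not yt code:

```python
import numpy as np

def subdivide(left_index, dims, max_size=64):
    """Split one grid's index space into sub-blocks no larger than
    max_size cells per axis, so per-grid allocations stay bounded."""
    blocks = []
    for ox in range(0, dims[0], max_size):
        for oy in range(0, dims[1], max_size):
            for oz in range(0, dims[2], max_size):
                offset = np.array([ox, oy, oz])
                size = np.minimum(dims - offset, max_size)
                blocks.append((left_index + offset, size))
    return blocks

# A 512^3 root grid becomes 8^3 = 512 sub-blocks of 64^3 cells each;
# icoords for one sub-block is 64^3 * 3 * 8 bytes = 6 MiB instead of 3 GiB
blocks = subdivide(np.zeros(3, dtype=int), np.array([512, 512, 512]))
print(len(blocks))  # 512
```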