ornladios / ADIOS2

Next generation of ADIOS developed in the Exascale Computing Program
https://adios2.readthedocs.io/en/latest/index.html
Apache License 2.0

Python Out of Memory when using BP4 engine #3400

Open Haoju-Leng opened 1 year ago

Haoju-Leng commented 1 year ago

I wrote a Python script that stores a series of images in a single bp file, with each image stored as a variable. The code works fine when I test it with a few images. However, when I test it with 20000 images, the program runs out of memory and quits. It seems that all the data is held in memory until the program has defined all the variables to write. Is there a parameter I can set so that ADIOS writes some of the variables to disk before all of them have been defined? Thanks!

The code is below:

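import glob
import os

import adios2
import matplotlib.pyplot as plt
import numpy as np
from mpi4py import MPI

data_dir = '/home/lengh2/Desktop/Haoju_Leng/IO_test/test_output'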

sections = glob.glob(os.path.join(data_dir, '*'))
sections.sort()

# MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# ADIOS portion
adios = adios2.ADIOS(comm)
ioWrite = adios.DeclareIO("ioWriter")

ioWrite.SetEngine('BP4')
bpIOParams = {}
#bpIOParams['Threads'] = '2'
#bpIOParams['ProfileUnits'] = 'Microseconds'
bpIOParams['NumAggregators'] = '4'
bpIOParams['InitialBufferSize'] = '20Mb'
bpIOParams['MaxBufferSize'] = '20Mb'
ioWrite.SetParameters(bpIOParams)
obpStream = ioWrite.Open('clinical_patches/npArray.bp', adios2.Mode.Write)
print(rank)

# for each image
for si in range(len(sections)):
    name = os.path.basename(sections[si])
    img = plt.imread(sections[si])[:, :, :3]

    cnt = 0
    # using for-loop to get small patches
    for x in range(0, img.shape[0], 512):
        for y in range(0, img.shape[1], 512):
            obpStream.BeginStep()
            var_name = str(cnt) + '_' +name
            #arr = np.load(filename, allow_pickle=True)

            arr = np.array(img[x:x + 512, y:y + 512, :])

            # integer rows per rank; with float division, start + count can overrun the array
            rowDecomp = arr.shape[0] // size
            if arr.shape[0] % size > 0:
                lastRowDecomp = rowDecomp + arr.shape[0] % size
            else:
                lastRowDecomp = rowDecomp

            if rank == size -1:
                buffer = arr[int(rank * rowDecomp):int(rank * rowDecomp + lastRowDecomp), :, :]
                if ioWrite.InquireVariable(var_name):
                    var = ioWrite.InquireVariable(var_name)
                else:
                    var = ioWrite.DefineVariable(var_name, arr, arr.shape, [int(rank * rowDecomp), 0, 0],
                                             [int(lastRowDecomp), arr.shape[1], 3], adios2.ConstantDims)
            else:

                buffer = arr[int(rank * rowDecomp):int(rank * rowDecomp + rowDecomp), :, :]
                if ioWrite.InquireVariable(var_name):
                    var = ioWrite.InquireVariable(var_name)
                else:
                    var = ioWrite.DefineVariable(var_name, arr, arr.shape, [int(rank * rowDecomp), 0, 0],
                                             [int(rowDecomp), arr.shape[1], 3], adios2.ConstantDims)

            obpStream.Put(var, buffer, adios2.Mode.Sync)
            obpStream.EndStep()
            cnt += 1
obpStream.Close()
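
Splitting rows across ranks with true division (`/`) and truncating with `int()` afterwards can produce a start + count that runs past the end of the array. The sketch below compares that with integer division. The helper names `float_split` and `split_rows` are hypothetical and only illustrate the arithmetic; the last rank takes the remainder rows, as in the loop above.

```python
def float_split(n_rows, size, rank):
    """Row split using true division, truncated with int() afterwards."""
    row_decomp = n_rows / size
    start = int(rank * row_decomp)
    if rank == size - 1:
        count = int(row_decomp + n_rows % size)
    else:
        count = int(row_decomp)
    return start, count


def split_rows(n_rows, size, rank):
    """Row split using integer division; the last rank takes the remainder."""
    base = n_rows // size
    start = rank * base
    count = base + n_rows % size if rank == size - 1 else base
    return start, count


if __name__ == "__main__":
    n_rows, size = 512, 3
    for rank in range(size):
        f_start, f_count = float_split(n_rows, size, rank)
        i_start, i_count = split_rows(n_rows, size, rank)
        print(rank, "float end:", f_start + f_count, "int end:", i_start + i_count)
```

For 512 rows on 3 ranks, the float version gives the last rank start 341 and count 172, which ends at row 513, while the integer version ends exactly at 512.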

eisenhauer commented 1 year ago

One of the properties of the BP4 engine is that, one way or another, it holds all data for a timestep in memory until EndStep(). However, the BP5 engine supports PerformDataWrite(), which causes pending Put() operations to actually transfer data to disk, freeing up internal resources. If you can try BP5, that might offer a way around this problem.

However, your code seems to show writing only a single image before EndStep(). If that's what's happening, then you've got a different problem and PerformDataWrite() won't help. That you're generating a new variable name on each timestep is a little concerning. As an HPC I/O system, ADIOS is more optimized around the approach of defining some set of variables once and then outputting them on different iterations (think timestep-based scientific simulations). On some engines, generating unique variable names for each timestep might have unpleasant consequences, just because it's not the approach taken by the applications which we typically target. But it's hard to tell if that's necessary without knowing some more about what you're trying to accomplish.
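
The define-once approach could look roughly like this for the patch use case. This is a minimal sketch, assuming a single process and the same adios2 Python API as the code in the original post. The names `pad_patch` and `write_patches`, the variable name "patch", and the choice to zero-pad edge patches to a fixed 512 x 512 x 3 shape are illustrative assumptions, not something this thread prescribes.

```python
import numpy as np

PATCH = 512  # patch edge length used in the original loop


def pad_patch(patch, size=PATCH):
    """Zero-pad an (h, w, c) patch to (size, size, c) so every step has one shape."""
    out = np.zeros((size, size, patch.shape[2]), dtype=patch.dtype)
    out[:patch.shape[0], :patch.shape[1], :] = patch
    return out


def write_patches(io, patches, path):
    """Write each patch as one step of a single, reused variable.

    `io` is an IO object from adios.DeclareIO(...), as in the original post.
    Only calls that appear in that post are used here.
    """
    import adios2  # imported here so pad_patch can be used without adios2

    stream = io.Open(path, adios2.Mode.Write)
    var = None
    for patch in patches:
        block = pad_patch(patch)
        if var is None:
            # Defined once; every later step reuses the same variable.
            shape = list(block.shape)
            var = io.DefineVariable("patch", block, shape, [0, 0, 0], shape,
                                    adios2.ConstantDims)
        stream.BeginStep()
        stream.Put(var, block, adios2.Mode.Sync)
        stream.EndStep()
    stream.Close()
```

With this layout the step index identifies the patch, so the mapping from step to image name would need to be kept separately. Under MPI, each rank would write its own row range as in the original code instead of the full block.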

Haoju-Leng commented 1 year ago

@eisenhauer Thanks for your answers! Is there a limit on how many variables we can define in a single bp file (using BP4)? Could this be the reason the program crashes? I am trying to store image data in a single bp file, with each variable holding the data for one image.

eisenhauer commented 1 year ago

There are no explicit limits for number of variables in a file, but using a lot of variables once, rather than a few variables many times, will be much less memory efficient. define_variable() in the writer creates an internal data structure that is designed to be reused over many timesteps (because ADIOS applications are typically iterative). However, in your code you're only using each variable once and from then on that internal data structure sits unused. Because ADIOS is designed around reuse like this, there actually isn't an "undefine_variable()" or any other way to release that memory. I wouldn't think that 20K variables would be enough to run out of memory though...
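
Note that the posted loop defines one variable per 512 x 512 patch rather than one per image, so the number of variables may be well above 20K. A rough sketch of the count follows. The 4096 x 4096 image size is a hypothetical placeholder, since the thread does not state the real dimensions, and the function names are illustrative.

```python
import math

PATCH = 512  # patch edge length from the loop in the original post


def patches_per_image(height, width, patch=PATCH):
    """Number of patch variables the nested range() loops define for one image."""
    return math.ceil(height / patch) * math.ceil(width / patch)


def total_variables(n_images, height, width, patch=PATCH):
    """Total number of variables defined across all images."""
    return n_images * patches_per_image(height, width, patch)


if __name__ == "__main__":
    # Hypothetical image size; substitute the real dimensions.
    print(patches_per_image(4096, 4096))       # 64
    print(total_variables(20000, 4096, 4096))  # 1280000
```

Under that assumption, 20000 images would produce 1,280,000 variable definitions, each with its own internal data structure that is never released.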