openPMD / openPMD-api

:floppy_disk: C++ & Python API for Scientific I/O
https://openpmd-api.readthedocs.io
GNU Lesser General Public License v3.0
142 stars 51 forks source link

Get Extents in Python #1443

Closed stevenrbrandt closed 1 year ago

stevenrbrandt commented 1 year ago

What do you want to achieve, please describe.

I have a dataset with mesh refinement. It's just a 2D wave equation test. The grid goes from 0 to 40 in both x and y.

Even though the refined part of the grid (lev01) is a small box around (x,y)=(20,20), the offset and spacing in the chunk describe the entire grid from 0 to 40 in x and y. The valid data (the small box) is clearly visible in the middle of the screen, and I see random noise everywhere else. I'd like to get the data from just the small box. Thanks.

Python:

import openpmd_api
...
            if gf not in [f"{thorn}_{gf_name}_lev01"]: 
                continue
            uu = iters.meshes[gf]
            print("uu:",dir(iters.meshes))
            data = uu[f"{thorn}_{gf_name}"].load_chunk()
            data_index = data.shape[0]//2
            series.flush()
            x_index = None
            y_index = None
            for ii in range(len(uu.axis_labels)):
                if uu.axis_labels[ii] == 'x':
                    x_index = ii
                elif uu.axis_labels[ii] == 'y':
                    y_index = ii
            nx = data[data_index].shape[0]
            ny = data[data_index].shape[1]
            dx = uu.grid_spacing[x_index]
            dy = uu.grid_spacing[y_index]
            x0 = uu.grid_global_offset[x_index]
            y0 = uu.grid_global_offset[y_index]
            xv = np.linspace(x0,(nx+1)*dx+x0,nx)
            yv = np.linspace(y0,(ny+1)*dy+y0,ny)
            assert xv.shape[0] == nx
            assert yv.shape[0] == ny
            x = np.zeros(data[data_index].shape)
            y = np.zeros(data[data_index].shape)
            print("geom:",nx,x0,dx,ny,y0,dy)
            for i in range(xv.shape[0]):
                x[i,:] = xv[i]
            for j in range(yv.shape[0]):
                y[:,j] = yv[j]
            plt.pcolor(x,y,data[data_index],vmin=vmin,vmax=vmax)
# ...

or C++

#include <openPMD/openPMD.hpp>

// ...

You can also add images such as screenshots :-)

Software Environment: Have you already installed openPMD-api? yes If so, please tell us which version of openPMD-api your question is about:

franzpoeschel commented 1 year ago

Hey @stevenrbrandt, do I understand you correctly that you intend not to load the entire dataset with load_chunk(), but just a small part of it?

For this, load_chunk takes two optional parameters, the offset and the extent of the specific block to load, e.g. data = uu[f"{thorn}_{gf_name}"].load_chunk([64,64], [32,32]) for loading a block of size [32, 32] starting at [64, 64].

The examples in our documentation might also help, in there you can find a description of how to use Python slices as an alternative to load_chunk():

    chunk_data = E_x[1:3, 1:3, 1:2]
    # print("Queued the loading of a single chunk from disk, "
    #       "ready to execute")
    series.flush()

For more detailed questions on mesh refinement, I will have to defer to @ax3l

stevenrbrandt commented 1 year ago

@franzpoeschel thanks, that gets me halfway there. Now how do I find the portion of the dataset that's actual data? I mean, I can see it after it's graphed, but I'd like to get it programmatically.

franzpoeschel commented 1 year ago

Do I assume correctly that you are working with .bp files? (They are from the ADIOS2 backend which is the only one to properly support sparsely written datasets).

If yes, then E_x.available_chunks() should give you the chunks that were actually written.

@ax3l Is this correct? Is that the only way to retrieve the written chunks? I don't see anything further in https://github.com/openPMD/openPMD-standard/pull/252/files.

franzpoeschel commented 1 year ago

Caution: I don't know which version of the openPMD-api you are using (you can find out via openpmd-ls --version). If you use 0.15, then everything is fine, before that, there was a bug that might make E_x.available_chunks() return no results, depending on your workflow.

stevenrbrandt commented 1 year ago

Okay @franzpoeschel and @ax3l , I have put together a small tarball with a bp5 data file, a simple python3 script and a cmd.sh to run the script. It generates a png file. For some reason, openpmd-api thinks that the domain of the problem is 0-40 in x and y, when it's really 10-30 or so. I'm trying to figure out if (a) I'm doing something wrong (b) there's some functionality missing in the python version or (c) the file is generated incorrectly... or something else. https://cct.lsu.edu/~sbrandt/prob.tgz

franzpoeschel commented 1 year ago

@stevenrbrandt I am not able to run this script which you use for data processing.

Traceback (most recent call last):
  File "/home/franz/Downloads/prob/plot-one.py", line 49, in <module>
    vmin = np.min(data[data_index])
NameError: name 'data' is not defined

However I notice that the variable chunks defined as chunks = data_raw.available_chunks() is used nowhere in the script, but instead you still load the entire datasets by data = data_raw.load_chunk(), including undefined regions.

The chunks seem to be properly available in the dataset that you uploaded:

> bpls -D wave2dx.it00000004.bp5/
  double   /data/4/meshes/carpetx_regrid_error_lev00/carpetx_regrid_error              {2, 83, 83}
        step 0: 
          block 0: [0:0,  1:80,  1:80]
  double   /data/4/meshes/carpetx_regrid_error_lev01/carpetx_regrid_error              {3, 163, 163}
        step 0: 
          block 0: [0:1,  33:128,  33:128]
  double   /data/4/meshes/wavetoynrpy_anagf_lev00/wavetoynrpy_anagf                    {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:80]
  double   /data/4/meshes/wavetoynrpy_anagf_lev01/wavetoynrpy_anagf                    {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:128]
  double   /data/4/meshes/wavetoynrpy_regrid_errorgf_lev00/wavetoynrpy_regrid_errorgf  {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:81]
  double   /data/4/meshes/wavetoynrpy_regrid_errorgf_lev01/wavetoynrpy_regrid_errorgf  {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:129]
  double   /data/4/meshes/wavetoynrpy_rhs_uugf_lev00/wavetoynrpy_rhs_uugf              {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:80]
  double   /data/4/meshes/wavetoynrpy_rhs_uugf_lev01/wavetoynrpy_rhs_uugf              {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:128]
  double   /data/4/meshes/wavetoynrpy_rhs_vvgf_lev00/wavetoynrpy_rhs_vvgf              {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:80]
  double   /data/4/meshes/wavetoynrpy_rhs_vvgf_lev01/wavetoynrpy_rhs_vvgf              {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:128]
  double   /data/4/meshes/wavetoynrpy_uugf_lev00/wavetoynrpy_uugf                      {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:80]
  double   /data/4/meshes/wavetoynrpy_uugf_lev01/wavetoynrpy_uugf                      {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:128]
  double   /data/4/meshes/wavetoynrpy_vvgf_lev00/wavetoynrpy_vvgf                      {2, 83, 83}
        step 0: 
          block 0: [0:1,  1:81,  1:80]
  double   /data/4/meshes/wavetoynrpy_vvgf_lev01/wavetoynrpy_vvgf                      {3, 163, 163}
        step 0: 
          block 0: [0:2,  33:129,  33:128]

I am able to retrieve this information with the openPMD-api:

>>> import openpmd_api as io
>>> s = io.Series("wave2dx.it00000004.bp5", io.Access.read_only)
>>> component = s.iterations[4].meshes['wavetoynrpy_uugf_lev01']['wavetoynrpy_uugf']
>>> component.available_chunks()
[<openPMD.WrittenChunkInfo of dimensionality 3'>]
>>> first_chunk = component.available_chunks()[0]
>>> first_chunk.offset
[0, 33, 33]
>>> first_chunk.extent
[3, 97, 96]
>>> data = component.load_chunk(first_chunk.offset, first_chunk.extent)
>>> s.flush()
>>> print(data)
[[[ 0.92511896  0.9339786   0.94348059 ... -0.94348059 -0.9339786
   -0.92511896]
  [ 0.93284628  0.94377486  0.95335251 ... -0.95335251 -0.94377486
   -0.93284628]
  [ 0.94010273  0.95221255  0.96128479 ... -0.96128479 -0.95221255
   -0.94010273]
  ...
  [-0.94010273 -0.95221255 -0.96128479 ...  0.96128479  0.95221255
    0.94010273]
  [-0.93284628 -0.94377486 -0.95335251 ...  0.95335251  0.94377486
    0.93284628]
  [-0.92511896 -0.9339786  -0.94348059 ...  0.94348059  0.9339786
    0.92511896]]

 [[ 0.92511896  0.9339786   0.94348059 ... -0.94348059 -0.9339786
   -0.92511896]
  [ 0.93284628  0.94377486  0.95335251 ... -0.95335251 -0.94377486
   -0.93284628]
  [ 0.94010273  0.95221255  0.96128479 ... -0.96128479 -0.95221255
   -0.94010273]
  ...
  [-0.94010273 -0.95221255 -0.96128479 ...  0.96128479  0.95221255
    0.94010273]
  [-0.93284628 -0.94377486 -0.95335251 ...  0.95335251  0.94377486
    0.93284628]
  [-0.92511896 -0.9339786  -0.94348059 ...  0.94348059  0.9339786
    0.92511896]]

 [[ 0.92511896  0.9339786   0.94348059 ... -0.94348059 -0.9339786
   -0.92511896]
  [ 0.93284628  0.94377486  0.95335251 ... -0.95335251 -0.94377486
   -0.93284628]
  [ 0.94010273  0.95221255  0.96128479 ... -0.96128479 -0.95221255
   -0.94010273]
  ...
  [-0.94010273 -0.95221255 -0.96128479 ...  0.96128479  0.95221255
    0.94010273]
  [-0.93284628 -0.94377486 -0.95335251 ...  0.95335251  0.94377486
    0.93284628]
  [-0.92511896 -0.9339786  -0.94348059 ...  0.94348059  0.9339786
    0.92511896]]]
stevenrbrandt commented 1 year ago

@franzpoeschel my code would have run if you'd used the cmd.sh script to do it... as it was, you tripped over an unimportant bug. In any event, your answer helped me figure out what I needed to do, although it took a little experimentation. If you want to see the updated version of the script, it's here: https://gist.githubusercontent.com/stevenrbrandt/ef3718d52680d8a4e6accbad198a0b42/raw/757aebb76ea49df472abaa1aac11f5ec49a5c916/plot-zslice.py