Cutout storage ordering?

lukascodispoti commented 1 year ago

Original post: Hi all.

First of all, thanks to all contributors who make this project possible, I highly appreciate it.

I have a short question: I use getbigCutout() to create 1024*1024*1024*3 cutouts of the velocity filed from the isotropic1024coarse dataset in the form of hfd5 files. I then read the hdf5 files in C, either the full field or by selecting a hyperslab in the hdf5 api. It appears to me that the velocity field is stored in (z,y,x,i) oder rather than (x,y,z,i). That means that when reading the full field, I need to transpose the array before accessing element [i,j,k,l] as [((i*M + j)*M + k)*3 + l]. Selecting a hyperslab starting at (x,y,z) also requires transposition beforehand.

Now, is that expected behavior? If so, maybe it should be documented somewhere or exposed to the user?

I hope what I am saying makes sense.

Cheers, Lukas

Edit: Looking into libJHTDB.py, it does seem to be the case that the result is stored as (z,y,x,dim). While I would prefer the ordering to be (x,y,z,dim)(C-style) or (dim,z,y,x) (Fortran), I assume there was reason to do it like this. Adding dimension scales to the hdf5 dataset could clarify any possible confusion of the user.

rmldj commented 1 year ago

Indeed I was also puzzled by this. I have a further question to confirm what is the ordering of the velocity fields.

If the numpy array returned by getbigCutout is indexed as (z, y, x, i) (please confirm), does i=0 correspond to x-component of the velocity, i=1 to y-component and i=2 to the z-component? Or is there also some reshuffling for the components of velocity?

chichilalescu commented 1 year ago

Hi all,

The results of "getbigcutout" are 4D arrays: fastest index: field component, C indexing 0 through 2, with (0,1,2) = (x, y, z) 2nd fastest index: x direction 3rd fastest index: y direction 4th fastest (slowest) index: z direction.

If you are accesing the data from fortran, the indexing will be (i, x, y, z). If you are accessing the data from Python, the (numpy) indexing will be [z, y, x, i] If you are accessing the data from C, I would recommend casting to a 1D array, and using the index (i + 3(x + nx(y + ny*z))) (careful to have all variables as size_t or ptrdiff_t).

Finally: I would sincerely recommend that you do not systematically copy the database by using "get cutout" functionality. At a minimum, you should contact the group directly @.***) and tell them what you need (I myself haven't been part of the group for a while).

Best, Chichi Lalescu

On Tue, Apr 25, 2023 at 9:47 AM rmldj @.***> wrote:

Indeed I was also puzzled by this. I have a further question to confirm what is the ordering of the velocity fields.

If the numpy array returned by getbigCutout is indexed as (z, y, x, i) (please confirm), does i=0 correspond to x-component of the velocity, i=1 to y-component and i=2 to the z-component? Or is there also some reshuffling for the components of velocity?

— Reply to this email directly, view it on GitHub https://github.com/idies/pyJHTDB/issues/43#issuecomment-1521316584, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6KA746MSI7PAPC4FMP5ULXC56SPANCNFSM6AAAAAAWUH3LIA . You are receiving this because you are subscribed to this thread.Message ID: @.***>

rmldj commented 1 year ago

@chichilalescu Thanks for the information!

I just need a number of appropriate planar velocity slices (512x512). In total 1920 of these. So this would be less than a single timestep of the isotropic1024coarse dataset. I hope that this is not too much using the API?

chichilalescu commented 1 year ago

I think that should work. But maybe one of the current members of the group will jump in to comment.

Best, Chichi Lalescu

On Tue, Apr 25, 2023 at 1:23 PM rmldj @.***> wrote:

@chichilalescu https://github.com/chichilalescu Thanks for the information!

I just need a number of appropriate planar velocity slices (512x512). In total 1920 of these. So this would be less than a single timestep of the isotropic1024coarse dataset. I hope that this is not too much using the API?

— Reply to this email directly, view it on GitHub https://github.com/idies/pyJHTDB/issues/43#issuecomment-1521623856, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB6KA76SB2F4A7RHIZO34MTXC6X4DANCNFSM6AAAAAAWUH3LIA . You are receiving this because you were mentioned.Message ID: @.***>

idies / pyJHTDB

Cutout storage ordering? #43