Make allocator more flexible with domain sizes

semi-h commented 7 months ago

The problem is described in #24. I wanted to have a draft PR here so that we can plan ahead how to make this transition in both backends.

In the CUDA backend we always type check and assing a pointer to the %data_d component of field_t. Fortran allows a bounds remapping with pointer assignments and we can use this to have a 3D pointer pointing a 1D data_d array. My plan is to move all the type selection bits into a subroutine in CUDA backend like @pbartholomew08 suggested in a previous PR, and sort the bounds remapping there.

With the OpenMP backend the current way allocator works is very handy, but obviously we need to support different nx, ny, and nz and currently it is not possible. I think currently the %data component in field_t is passed directly into the kernels and it doesn't allow fixing the dimensions of data array when switching form x, y, and z orientations. My suggestion is again using bounds remapping and pointing to the 1D data array with the right dimensions based on the x, y, z orientations we're at. What do you think, @Nanoseb?

Alternative to this bounds remaping strategy is using assumed shape arrays in kernels but I think it might be more complicated especially with different velocity grid dimensions and pressure grid dimensions. I think padding is a good strategy to deal with this and assumed shape arrays would't mix well with with padding I believe.

Nanoseb commented 7 months ago

Yes, I think that can be neat. How I see it implemented is to have all the bound remapping handled by the allocator%get_block function. When ever we request an array we call it with an argument like "XYZ", "YZX" or "ZXY" for example and it will return a 3D array (pointing to the 1D one) that has the right dimensions.

We should be able to call u => allocator%get_block(XYZ) in the highlevel code. I will have to look into the details, but this may imply that field_t will have a 1D array as well as a 3D pointer to it.

Other idea would be to have a setter for the working dimension like call allocator%set_dimensions(XYZ). And all the subsequent get_block() calls will be with that dimension.

pbartholomew08 commented 6 months ago

Another option, which could have some safety benefits for other areas of code would be to encode this in types, something like

type data_block
  real(dp), dimension(:), allocatable :: data
end type

type extends(data_block) :: x_block
end type

The user code would then look like

type(x_block) :: u_x
type(y_block) :: u_y
! ...

! Allocator allocates appropriate sizes based on type
call allocator%get_block(u_x)
call allocator%get_block(u_y)

This would be a wide-ranging change, and require something like get_data(block, ptr) to access the underlying array with the desired shapes. But it would also allow us to do things like reject invalid rotations at compile time

call rotate_xy(u_x, u_z) ! Can't compile due to types

The specific rotation could even be determined by the type if we wanted

call rotate(u_x, u_y) ! Must be an X->Y rotation

semi-h commented 6 months ago

We extend the base allocator in CUDA backend so that we can have a device array member. I don't think we can do extends you recommend and have x directional cuda field and base field under the same class. And then it will be hard to use allocator in solver class.

Maybe we can have a brand new type that has base field as a member, then extend this so that there are one for each direction

type :: dir_field_t
  class(field_t) :: f
end type

type extends(dir_field_t) :: xdir_field_t
end type

But then we'll have a select type for figuring out the direction, then another for resolving down to exact field type and this will make things complicated I think.

JamieJQuinn commented 6 months ago

@pbartholomew08 We can get best of both worlds by storing the orientation (XYZ, YZX, ZXY) in the field, so that could be checked at runtime in e.g. a debug block. No catch by the compiler but seems less complex than playing with the type system.

pbartholomew08 commented 6 months ago

Yes, much less intrusive, I prefer that.

semi-h commented 6 months ago

I think I figured out a nice way to fix this problem. A summary of changes:

We used to have a data and data_d component in field_t directly defined as an allocatable array. Now this is only a pointer, points to a private p_data member, which is a pointer itself.

I would make p_data an allocatable array, but to point and allocatable array it has to be a target, and a member in a derived type can't be a target, so its a pointer instead.

Allocated memory is addessed by this private data member, and then to play with different shapes we use data pointer. This makes it easy to bring this fix into the codebase, because we already used to refer the field arrays with the data member in field_t. Also added a private attribute to the p_data, so that it can't be accessed outside the class. We should always indicate direction when requesting a block (get_block), and then allocator/field jointly responsible from passing us back a shape we want.

We can discuss this in our meeting today, so please have a look if you have time.

The only remaning part is actually setting the padding in the allocator_init, and then allowing users to pass an optional orientation argument to the get_block function in allocator. Bounds remapping will be carried out based on the requested orientation, and the rest of the codebase should work fine with this.

slaizet commented 6 months ago

Discussion to have during our next meeting: we can decide or not to constrain the mesh resolution. Basically to have nx,ny and nz as multiple of SZ. It will definitely not work if SZ is typically 256/512 but could work if SZ is 8/16/32/64/128. Note that we have similar (however more flexible) constraints for the FFTs (nx, ny and nz need to be a combination of power of 2 & prime numbers).

semi-h commented 6 months ago

I think we covered most of the technical parts in the meeting, if there is any point not clear I'll be happy to explain.

I just removed the padding in the z-direction as the current pencil group ordering strategy doesn't require any padding in z.

And it looks like there are a lot of changes but most of it is just get_block passing a direction preference.

A review will be helpful if you have time, please let me know if you want to review but can't do today so we can wait, but otherwise I think its in good shape and can be merged at the end of the day.

xcompact3d / x3d2

Make allocator more flexible with domain sizes #34