Closed pbartholomew08 closed 2 months ago
I generally like the idea of keeping all the field-relevant info inside field, absolutely makes sense. I think the only criticism I have is the number of field types. Just to clarify, the type of a field (e.g. VERT, CELL, etc) is intended to:
Right?
Do we necessarily need X_EDGE, Z_FACE, etc, for these purposes? I'm wary of over-complicating this. Maybe it's necessary for the allocator to allocate the correct size of underlying array, but if we're talking about allocating slightly different sized arrays, should we be thinking about not using the pool allocator?
I do think NULL is useful for situations where you just need an array and you don't care about its grid representation.
On an implementation level, I'm wary of tying the tdsops class to the allocator, especially in circumstances where the desired array won't interact with the tridiagonal system. But maybe we just have two different allocate calls, one for a "tridiagonal array" and another for just plain ol data.
Generally in favour though, we can always start with this, see how it works and amend as needed.
Two minor points before the main issue regarding n
.
%data
variable. For example n_groups
can be obtained by a getter function where it will returnsize(field%data, dim=3)
, and this should have no consequences at all because n_groups
is always same regardless of where the data lives (vertices, cell centre etc). Similarly, n_padded
can also be obtained with size(field%data, dim=2)
.dirps%n
always stores the number of vertices in a direction, and never the number of cells, and used based on this fact in the codebase. (So the name is actually misleading, and also dirps
is high level enough to have explicitly named variables for both, such as n_vertex
and n_cell
.)My concerns about n
in the field is based on its use in a very specific location in the codebase (with #85) in copy_into_buffers
in the OpenMP backend. This is one of the very few locations in the codebase where we need to know about the actual number of data points in a field.
https://github.com/xcompact3d/x3d2/pull/85/files#r1559233850
If we want to use field%n
here then its very critical to have the correct value, and from the proposal it looks like this requires quite a bit of work, and added complexity. (with the suggested types I want to say it is possible, but also want to look into some of the high level solver subroutines to make sure)
My understanding is that there are very few locations in the codebase where we need to obtain the correct number of meaningful data points along a dir
from a field we happen to have. Because there are mainly two things we do with a field, we either call a tds_solve
on it or reorder
it. Currently, tds_solve
solves this problem by using the operator tdsops_t
, and in the reorder
it is safe to loop over the padded dimensions instead of actual dimensions.
Just so that we have a better picture, and justify the added complexity, would you be able to point out a few places where field%n
will be useful in the codebase please? Because in #85 it is used only in two locations, one is copy_into_buffers
and the other is the reorder
(OpenMP currently but probably will be extended to CUDA too), and in both cases the current implementation is quite robust and doesn't require a field%n
.
And if we realise that the use cases of field%n
are not in such critical locations, the alternative will be allowing a detailed description in some fields (VERT/CELL/X_FACE ...), and also having a NULL
state where n
could be undefined (such as with intermediate fields when we're not certain about where data lives). Also, maybe having a dims(3)
in a field where we store the actual number of data points can be better idea (only when the type is not NULL
). This will give us access to actual grid size only when a detailed description is provided in the field
(one of VERT/CELL etc.).
1. Give the allocator enough information to allocate the right size of array, and 2. Provide some way to error check that the field passed to a subroutine is in the format the subroutine expects
Right?
Yes, it stores the type so it can be checked and also sets the different sizes so that they align with the data we are storing in this array. There is no allocation
in a fortran sense here though as we are always allocating arrays of the same (padded) size.
Do we necessarily need X_EDGE, Z_FACE, etc, for these purposes? Yes, we need those for cases where the boundary conditions aren't the same in x/y/z directions, the fields will have different sizes depending on those.
I'm wary of over-complicating this. Maybe it's necessary for the allocator to allocate the correct size of underlying array, but if we're talking about allocating slightly different sized arrays, should we be thinking about not using the pool allocator?
The arrays are always allocated to the same size, only their dimensions (n
, n_groups
etc.) changes. So it does make sense to still keep the pool allocator.
I do think NULL is useful for situations where you just need an array and you don't care about its grid representation.
On an implementation level, I'm wary of tying the tdsops class to the allocator, especially in circumstances where the desired array won't interact with the tridiagonal system. But maybe we just have two different allocate calls, one for a "tridiagonal array" and another for just plain ol data.
I understand that, I think it will only be the initialisation that will get some information from tdsops. But we can have ways to 'bypass it' and explicitely put other values if needed
@semi-h in the iterative solver we need to setup vectors for the solver (at the moment) which requires knowing the (local) size of the fields, currently this information seems to be divided between dirps_t
(n), common.f90
(SZ) and allocator_t
(padded dimensions) and field_t
(ngroups/size(data, 3)
). It is unclear (to me) where to get the data from*. And then when solving, if we pass in an RHS and solution field only then we need to get the dimensions of the local sizes to copy in/out of the PETSc vectors.
If we don't want to add this information to the fields, can I suggest a mesh
(for want of a better name) object that gathers this all in one place? It could be queried for a given grid location to get the relevant sizes.
I think a mesh
object makes sense and is the best solution in your example. FFT based Poisson solver also requires this and I can see its use in many other places. With the Poisson solver we always know that the solver works on the pressure grid, so passing this info explicitly when instantiating the relevant classes is completely fine in my opinion.
OK - @Nanoseb do you think that will also address the issues you have? I see this working like
xcompact3d:
mesh = initialise_mesh(nx, ny, nz, ... ) ! Get the local decomposition/sizes/etc
allocator = initialise_allocator(mesh, other args)
xdirps = initialise_dirps(mesh, other args)
poisson = initialise_poisson(mesh, other args)
What do others think?
The mesh
or grid
should be fairly lightweight (few integers really + methods to query sizes) so I think there's no harm in copying it as needed.
I like the idea of having a mesh object, it is definitely an improvement compared to the current situation. But it does not address everything. Mainly, you will need to know if you are working on a x,y or z field and what kind of field when needing n
or n_groups
.
What about we keep the possibility of storing the type
and direction
of each field, but we don't store the sizes or anything related to that there, instead we can just query the mesh object for those:
n = mesh%get_n(direction, type)
and we would call it as mesh%get_n(DIR_X, VERT)
or mesh%get_n(u%dir, u%type)
or mesh%get_n(u)
.
This implies that we still record the type when requesting get_block
.
If we are concerned by the added overhead on the code to record the type
for every field. We can also not record it but have it as an argument when querying only:
n = mesh%get_n(u, VERT)
or n = mesh%get_n(u%dir, VERT)
.
I think it is a very good idea to have the proposed type
s in the codebase such that the mesh
object can understand them and return the correct sizes for us, let it be get_n(dir, type)
in your example or a more generic get_sizes(type)
like Paul needs. Its definitely a lacking feature in the codebase so it'll be great to have it. One other place where I anticipate its use is in the IO, where we would require an input from the user in terms of where the data lives so that only the correct bits of the field is written in the disk.
I also think it is a good idea to be able to store the type
in a field when we want to, especially if its optional. What do you think about making this optional when calling a get_block
, and setting type
member variable to NULL
when it is not provided? If this would work where you want to use the mesh%get_n
function then I think its the best solution. This will allow us to not specify where data lives and have a bit of freedom when we're working with temporary fields in a high level operation, and allow us to figure out the actual size of a field if the type
is not NULL
, also will allow us to do some sanity checks at least at high level functions.
One suggestion for the terminology: instead of calling where the data livestype
how about calling it dloc
/ d_loc
/ dataloc
/ data_loc
/ dpos
or similar?
Great, it is nice to have a consensus.
For the naming convention, I think I prefer dataloc
as it is the most explicit and understandable one to me.
What
Currently the code stores information about the grid dimensions in various locations, e.g. SZ in
common.f90
, local length of something is stored inxdirps%n
, number of blocksxdirps%nblocks
. Padded versions of these are stored inallocator%xdims_padded(3)
. A field stores its direction/orientation infield_t%dir
but no size information. Accessing this all together at the point of use is difficult and requires bringing multiple objects together.Proposed solution
Fields should store their local and padded size, and their grid type (vertex, cell, face, edge centred) which ensures that the sizes are correct for their use.
The
get_block
subroutine will also take the field "type" in addition to the direction. This will allow it to set the local and padded dimensions according to the field's grid type and orientation. Suggested types are:VERT
(velocity grid)CELL
(pressure grid)X_EDGE
x-oriented edge (interpolation / derivative to/from velocity/pressure along x)Z_FACE
z-normal oriented face (interpolation / derivative to/from velocity/pressure along y)Y_EDGE
,Z_EDGE
- reserved but unusedX_FACE
,Y_FACE
- reserved but unusedNULL
- no specified grid location - is this necessary???A field's size will depend on where in the tridiagonal system its block is, boundary blocks will have different sizes from internal blocks, however after creation the field doesn't need to know its location, only its size. The allocator will take
tdsops_t
(X/Y/Z) as arguments on initialisation, this will allow it to determine where in the grid it is, and use this information at field creation time.Summary
private
, preventing direct access to them and addressing concerns about them becoming invalid. In theNULL
field case only the setter will allow changing these, otherwise it is an invalid call and causes anerror stop
.