stfc / RegentParticleDSL

A particle-method DSL based on Regent programming language
MIT License

IO modules #26

Open LonelyCat124 opened 4 years ago

LonelyCat124 commented 4 years ago

Create IO modules as required for various IO systems used for particle methods.

LonelyCat124 commented 4 years ago

Some thoughts on more general purpose IO systems (first for HDF5).

Given a simple (i.e. non-nested) field_space, we can quite easily retrieve the types and names of its fields:

fspace part{
  mass: double,
  rho: double,
  cutoff: float,
  an_int_64: int64,
  an_int_32: int32,
}

function get_elements( field_space )
  -- Loop over the entries of the field space, printing each field's
  -- name and its cached C type string.
  for key, index in pairs(field_space.fields) do
      local name = index.field.symbol_name
      local types = index.type.cachedcstring
      print(name, types)
  end
end

get_elements(part)

This outputs:

mass    double
rho     double
cutoff  float
an_int_64       int64_t
an_int_32       int32_t

Once we add a nested field space, we get:

mass    double
rho     double
cutoff  float
an_int_64       int64_t
an_int_32       int32_t
nested_space    false

One easy check (for nested spaces) is that index.type.cachedcstring for the nested_space has Lua type boolean, while for plain fields it has type string.
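That check can be sketched as follows. This is an illustrative Python model of the Lua introspection result above, not Regent code: the dict mirrors the printed output, with flat fields mapping to a C type string and the nested field space to the boolean False.

```python
# Model of the introspection output above: flat fields carry a C type
# string, while a nested field space yields the boolean False.
fields = {
    "mass": "double",
    "rho": "double",
    "cutoff": "float",
    "an_int_64": "int64_t",
    "an_int_32": "int32_t",
    "nested_space": False,
}

def split_fields(fields):
    """Separate flat fields (type string) from nested field spaces (boolean)."""
    flat = {name: t for name, t in fields.items() if isinstance(t, str)}
    nested = [name for name, t in fields.items() if isinstance(t, bool)]
    return flat, nested

flat, nested = split_fields(fields)
```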

I've had a few ideas from here, and unsurprisingly user simplicity and programming simplicity are inversely proportional :). I'm separating those into the next comments.

LonelyCat124 commented 4 years ago

Idea 1: This is the best from the user point of view, but probably the least realistic. The idea here would be to read the particle structure defined by the user (ignoring the core_part_space and neighbour_part_space), and then use metaprogramming to read the fields from the file (with the Positions/Velocities etc. automatically being pulled from the file into the core_part_space).

There are so many things that make this difficult that I think it's realistically impractical; plus it specifically defines reserved names in file formats for the DSL, which, since we want it to be easily accessible, I'd rather not do.

Idea 2: As well as specifying a part field space, the user is also required to specify an io field space (if using an IO module), which cannot contain nested field spaces, e.g.:

fspace io{
  Position_x : double,
  Position_y : double, 
  Position_z : double,
  ....
}

On top of this, the user needs to specify some mapping between the io fspace and the part fspace. I'm not totally sure what would be best here, but one possible solution would be just a Lua table (we could also use a Terra list):

io_mapper = {}
--The mapper maps between the io field space to the particle field space
io_mapper["Position_x"] = "core_part_space.pos_x"
io_mapper["Position_y"] = "core_part_space.pos_y"
io_mapper["Position_z"] = "core_part_space.pos_z"

These would then be given to RegentParticleDSL which would then read in (or write out) the files accordingly (using the type definitions defined in the io field space and the mapping between the io field space and the main particle type).

This is not too much more effort for the user (though the mapper is a little annoying), but gives much more freedom. It still doesn't allow access to more complex features of HDF5 (groups, attributes, etc.) right now, but for simple files it is nice.

One issue right now is that nothing is read into the config structure if values are required from attributes etc. in the HDF5 file, so I'd need to work out a good way to add that, along with working out the particle array sizes.
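The core of idea 2 is resolving a mapper entry like "core_part_space.pos_x" against the particle structure. A minimal Python sketch of that dotted-path resolution (the Particle/CorePart classes are hypothetical stand-ins for the Regent field spaces; the mapper entries follow the table above):

```python
# Hypothetical stand-ins for the particle field spaces.
class CorePart:
    def __init__(self):
        self.pos_x = 0.0
        self.pos_y = 0.0
        self.pos_z = 0.0

class Particle:
    def __init__(self):
        self.core_part_space = CorePart()

def set_by_path(particle, dotted_path, value):
    """Walk a dotted path like 'core_part_space.pos_x' and assign the value."""
    *heads, leaf = dotted_path.split(".")
    target = particle
    for name in heads:
        target = getattr(target, name)
    setattr(target, leaf, value)

io_mapper = {
    "Position_x": "core_part_space.pos_x",
    "Position_y": "core_part_space.pos_y",
    "Position_z": "core_part_space.pos_z",
}

# Reading: copy a file column's value into the mapped particle field.
p = Particle()
set_by_path(p, io_mapper["Position_x"], 1.5)
```

In the DSL itself this lookup would happen at compile time via metaprogramming rather than at runtime, but the mapping logic is the same.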

Idea 3: The final idea is much more complicated for the user, but could allow much more in-depth access to HDF5's features.

This would require the user to create a variety of Terra lists detailing how their HDF5 file is structured, including things like groups, attributes, etc. At a guess this would look something like:

io_read_attr = terralib.newlist({
  {name="ParticleCount", group="Header/", type=int64},
  {name="Dimensionality", group=nil, type=int32},
...
})

io_read_arrays = terralib.newlist({
  {name="Position_x", group="Particles/", type=double, part_structure_path="core_part_space.pos_x"},
  {name="Position_y", group="Particles/", type=double, part_structure_path="core_part_space.pos_y"},
  {name="Position_z", group="Particles/", type=double, part_structure_path="core_part_space.pos_z"},
...
})

These would be fed into the IO module, and allow much more complex IO, but at the cost of user-complexity during setup.
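One thing the IO module would have to do with these descriptor lists is join the optional group prefix and the dataset/attribute name into a full HDF5 path before any file access. A small Python sketch of that normalisation (entry names follow the lists above; everything else is illustrative):

```python
# Descriptor entries mirroring the Terra lists above; a group of None
# means the dataset or attribute lives in the root group.
io_read_arrays = [
    {"name": "Position_x", "group": "Particles/", "path": "core_part_space.pos_x"},
    {"name": "Position_y", "group": "Particles/", "path": "core_part_space.pos_y"},
]
io_read_attrs = [
    {"name": "ParticleCount", "group": "Header/"},
    {"name": "Dimensionality", "group": None},
]

def full_path(entry):
    """Join the (optional) group prefix and the dataset/attribute name."""
    group = entry["group"] or ""
    return group + entry["name"]

array_paths = [full_path(e) for e in io_read_arrays]
```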

I think realistically the second idea is preferable (while being realistic) for average users, while the third might also be necessary for expert users who have specific file formats they need to adhere to for 3rd party tools.

Of course there are almost certainly many features of HDF5 still not covered by these options that I'll discover when I give it to users. @rupertford do these options seem reasonable from a user-complexity point of view? Or should the "automagical" method to work out the HDF5 file structure from the particle (and config) definitions be the goal, even if it makes certain requirements of the input HDF5.

rupertford commented 4 years ago

I presume that most users will have input files from an existing application. If there are not many codes (and therefore formats) out there then would it be possible to support well-used formats via pre-cooked libraries that mean that the user does not need to worry (much) about the above options? Would this allow idea 1?

Where the above is not possible you obviously need to provide a way for the user to specify the mapping. Perhaps this could be done outside of Regent in some simple metadata format and the appropriate code could be generated by e.g. templating? Would this potentially allow idea 1?

If neither of the above are possible then I agree that option 1 sounds too difficult (as there is no metadata describing the mapping), so hand written mapping code (idea 2) or capturing the file structure explicitly (idea 3) would need to be implemented. I would go for idea 2 to start with and see how far you get.

LonelyCat124 commented 4 years ago

I think off the top of my head I know of at least 4 different file formats in use (HDF5, bi4, ndx and others, plaintext) across 4 different particle codes (SWIFT, DualSPHysics, GROMACS, DL_POLY). I think using each file format's appropriate libraries would be nice, but there are some issues with both licensing/requirements and perhaps interoperability (if the library uses MPI or anything similar it could be problematic).

My plan would be to have modules for "common" IO formats (made as/when requested for now), written in C++/C/Terra/Regent/(Python? Not yet though) as appropriate. Some of these might be straightforward, e.g. for MD you might always only have position and element, so no extra machinery would be needed. However for SPH (and maybe other methods) the variables present in particle types are dependent on the exact variant of SPH you're solving (e.g. you may have entropy terms, viscosity terms, etc.), so I was mostly thinking about flexibility for users with extra requirements. I think this is more of a concern for binary file formats than for plaintext (which I believe many/most MD formats are).

For now I'll implement Xiaohu's ISPH file IO format (which I believe is plaintext) and an implementation for output of idea 2 above (as I think Xiaohu mentioned wanting HDF5 output at some point).

LonelyCat124 commented 4 years ago

The ISPH_module branch has some code to start reading in those files (and writing out C-formatted versions for now). In principle I could use Fortran code to do the output, but I'm not super keen on doing that right now (as it would then require compilation, Fortran libraries, etc.). I tested the ability to read the inputs he sent and that was fine.

I'm waiting on Xiaohu to explain what the values actually are per-particle so I can make sure they are placed correctly.

I'll look at HDF5 when I'm next not working on bug fixes (probably tomorrow but we'll see)

LonelyCat124 commented 4 years ago

I had a try at generating the field space from the mapper automagically, but I'm having some difficulties with that, so will just go back to user-defined field space for now until I've worked that out.

LonelyCat124 commented 4 years ago

Ok, so I tried using the inbuilt HDF5 functionality in Regent, and it's gonna be pretty messy, but I think I could get it to work with some effort. I'm going to write a hardcoded version for some minimal case to work out what I need, then try to metaprogram it properly.

LonelyCat124 commented 4 years ago

Ok, I tested with a simple mini-code and metaprogramming, and the theory is there, but I'm not quite at the point where it will work yet.

Currently I'm testing it with the ISPH module (which provides its own field space for hdf5_io_space and its own mapper definition), and hitting a barrier when using the elements from core_part_space, where the compiler claims the fields don't exist. I can probably work around this in an ugly way for now, but I'll probably make an issue upstream as well.

LonelyCat124 commented 4 years ago

I'm guessing the issue with subfields has the same cause as the one for privileges, covered by https://github.com/StanfordLegion/legion/issues/932.

For now there is a workaround in place which works for things in core_part_space, but if one constructed many sub field spaces these would not currently work correctly with the HDF5 IO module.

LonelyCat124 commented 4 years ago

Ok, so the good thing now is this is much easier for the user for simple HDF5 layouts!

The user defines a mapping between file and the part field space, e.g. for Xiaohu's ISPH this could be:

isph_module.hdf5_mapper = {}
isph_module.hdf5_mapper["Position_x"] = "core_part_space.pos_x"
isph_module.hdf5_mapper["Position_y"] = "core_part_space.pos_y"
isph_module.hdf5_mapper["Velocity_x"] = "core_part_space.vel_x"
isph_module.hdf5_mapper["Velocity_y"] = "core_part_space.vel_y"

To write the hdf5 file, this is then as simple as:

[hdf5_module.write_output("/home/aidan/isph_read/test.hdf5", isph_module.hdf5_mapper, variables.particle_array)];

For other IO modules, I'll probably create mappers that let you automagically switch between file formats (so the user doesn't have to do anything) as appropriate, so in principle RegentParticleDSL can be a simple way to convert between file formats.
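The format-switching idea can be sketched in Python: each format supplies a mapper from its file-column names to particle-field paths, and composing one mapper with the inverse of another yields a direct column-to-column translation. The format names and columns here are hypothetical, not part of any existing module.

```python
# Two hypothetical per-format mappers, each keyed by that format's
# column names and pointing at shared particle-field paths.
isph_mapper = {"Position_x": "core_part_space.pos_x",
               "Position_y": "core_part_space.pos_y"}
other_mapper = {"PosX": "core_part_space.pos_x",
                "PosY": "core_part_space.pos_y"}

def translation(src_mapper, dst_mapper):
    """Map source column names to destination column names by matching
    the particle-field paths both mappers point at."""
    inverse_dst = {path: col for col, path in dst_mapper.items()}
    return {col: inverse_dst[path] for col, path in src_mapper.items()
            if path in inverse_dst}

t = translation(isph_mapper, other_mapper)
```

Only the fields both formats describe survive the composition, which is exactly the behaviour you'd want when converting between formats with different feature sets.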

I now need to read in an HDF5 file, which I've not quite worked out how I want to do yet, as I need to retrieve the number of particles somehow.