Open certik opened 6 months ago
It would be a nice feature, especially to really make use of UDDTIO and point to various serializers. C# and VB have built-in reflection, so this comes naturally there. While in these languages you can also use reflection to call methods in compiled libraries that you don't own, serialization/deserialization is definitely the most useful application. For Fortran, I gave it some thought recently, i.e. how to mimic reflection (or at least introspection). I investigated different solutions involving a lot of c_loc, c_f_pointer, storage_size and transfer. I was even ready to somehow compile the ASR generated by LFortran into derived types and dynamically create dictionaries with component names as keys and pointers to components as values. But in the end I faced the problem that components of derived types can be reordered in memory by the compiler. As such, the approach would be restricted to derived types declared with the sequence attribute. Preprocessing the derived types to generate this dictionary would also be possible:
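! pseudocode: dict stands for a hypothetical name -> pointer dictionary type
type(dict) :: mytype_dict
type mytype
    integer :: A, B, C
end type
...
subroutine generate_dict(this)
    class(mytype), target :: this
    ! each entry associates a component name with a pointer to that component
    call mytype_dict%set('A', this%A)
    call mytype_dict%set('B', this%B)
    call mytype_dict%set('C', this%C)
end subroutine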
So if there is a proposal to do it intrinsically, I would support it.
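The dictionary idea above can be tried out by hand in current Fortran. The following is only a minimal sketch under stated assumptions: `int_dict`, `entry_t`, `set`, `get` and `max_entries` are hypothetical names invented for this illustration, the dictionary handles integer components only, and its capacity is fixed. It does not depend on the memory layout of `mytype`, because every entry is registered explicitly.

```fortran
module component_dict_m
    implicit none

    integer, parameter :: max_entries = 16   ! hypothetical fixed capacity

    ! The derived type from the post above.
    type :: mytype
        integer :: A = 0, B = 0, C = 0
    end type

    ! One entry: a component name and a pointer to that component.
    type :: entry_t
        character(len=32) :: key = ''
        integer, pointer :: val => null()
    end type

    ! Hypothetical minimal dictionary, restricted to integer components.
    type :: int_dict
        integer :: n = 0
        type(entry_t) :: entries(max_entries)
    contains
        procedure :: set
        procedure :: get
    end type

contains

    ! Register a component under the given name (no duplicate check).
    subroutine set(this, key, val)
        class(int_dict), intent(inout) :: this
        character(len=*), intent(in) :: key
        integer, intent(inout), target :: val
        if (this%n >= max_entries) error stop 'int_dict is full'
        this%n = this%n + 1
        this%entries(this%n)%key = key
        this%entries(this%n)%val => val
    end subroutine

    ! Return a pointer to the named component, or null() if unknown.
    function get(this, key) result(ptr)
        class(int_dict), intent(in) :: this
        character(len=*), intent(in) :: key
        integer, pointer :: ptr
        integer :: i
        ptr => null()
        do i = 1, this%n
            if (this%entries(i)%key == key) then
                ptr => this%entries(i)%val
                return
            end if
        end do
    end function

    ! What a preprocessor would have to generate for each derived type.
    subroutine generate_dict(this, dict)
        type(mytype), intent(inout), target :: this
        type(int_dict), intent(out) :: dict
        call dict%set('A', this%A)
        call dict%set('B', this%B)
        call dict%set('C', this%C)
    end subroutine

end module
```

The object passed to `generate_dict` has to be declared with the `target` attribute in the caller, otherwise the stored pointers are not guaranteed to stay associated after the call returns. Writing through the pointer returned by `get('B')` then changes component `B` of that object.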
@davidpfister if the compiler is free to reorder the members of a derived type in memory, then this has to be only allowed for the restricted subset with an attribute, as you said. Thanks for playing with ASR and LFortran. I think reflection can be done in Fortran cleanly, all at compile time (so no runtime overhead). I think it could be very powerful and useful, if we can design it well.
I must admit that I had never used namelist before yesterday. I played around and it seems that a lot can already be done with the current capabilities of the language. Here is what I came up with, using a bit of preprocessing to mimic generics:
module point_m
    enum, bind(C)
        enumerator :: RED
        enumerator :: BLUE
        enumerator :: GREEN
    end enum
    type, abstract :: object
    end type
    type :: coord_t
        real :: x = 0.0
        real :: y = 0.0
    end type
    type, extends(object) :: point_t
        type(coord_t) :: coord
        integer :: color = RED
    contains
        procedure, pass(this), public :: serialize => serialize_t
        procedure, nopass, public :: deserialize => deserialize_t
    end type
contains
#define T point_t
#include <serializable.txt>
#undef T
end module
In the include file you get
subroutine serialize_t(this, str)
    class(T), intent(in), target :: this
    character(:), allocatable, intent(out) :: str
    !private
    type(T), pointer :: obj => null()
    namelist / ser / obj
    ! fixed-size buffer: it must be long enough to hold the whole namelist record
    allocate(character(100) :: str)
    obj => this
    write(str, nml=ser)
    str = trim(str)
    nullify(obj)
end subroutine
subroutine deserialize_t(that, str)
    type(T), allocatable, intent(out) :: that
    character(*), intent(in) :: str
    !private
    type(T) :: obj
    namelist / ser / obj
    ! read the components back by name, then copy the result into the output object
    read(str, nml=ser)
    allocate(that, source=obj)
end subroutine
and the main program ends up being
program main
    use point_m
    type(point_t), allocatable :: point
    character(:), allocatable :: stream_data
    allocate(point)
    point%coord%x = 1.0d0
    point%coord%y = 2.0d0
    point%color = 1
    call point%serialize(stream_data)
    write(*,*) stream_data
    point%coord%x = 0.0d0
    point%coord%y = 0.0d0
    point%color = 0
    call point%deserialize(point, stream_data)
    write(*,*) point
end program
output
&SER OBJ%COORD%X= 1.000000 ,OBJ%COORD%Y= 2.000000 ,OBJ%COLOR=
1/
1.000000 2.000000 1
So formatting to various output formats would mean parsing the namelist stream_data (splitting on ',' and '%') and adding <>, {} or whatever format-specific characters are needed. One can easily add a procedure argument to the serialize/deserialize routines that would transform/back-transform the string content.
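To make the "split on ',' and '%'" step concrete, here is a rough sketch that turns a namelist record like the one printed above into a flat JSON-like string. The name `nml_to_json` is hypothetical, and the routine rests on strong assumptions: the record is a single well-formed string, every value is a plain scalar number, and no value contains ',', '=' or '/'. Arrays, strings, logicals and repeat counts are not handled, which is exactly where the real work would start.

```fortran
module nml_flatten_m
    implicit none
contains

    ! Convert '&GROUP A%B= 1 ,A%C= 2/' into '{"A.B": 1, "A.C": 2}'.
    function nml_to_json(nml) result(json)
        character(len=*), intent(in) :: nml
        character(len=:), allocatable :: json
        character(len=:), allocatable :: body, item, key, val
        integer :: first, last, pos, eq, i

        ! Keep only the text between the group name and the closing '/'.
        first = index(nml, '&')
        first = first + scan(nml(first:), ' ')
        last = index(nml, '/', back=.true.) - 1
        body = nml(first:last)

        json = '{'
        do while (len(body) > 0)
            ! Split off the next 'name=value' item at the next comma.
            pos = index(body, ',')
            if (pos == 0) then
                item = body
                body = ''
            else
                item = body(:pos-1)
                body = body(pos+1:)
            end if

            eq = index(item, '=')
            if (eq == 0) cycle
            key = trim(adjustl(item(:eq-1)))
            val = trim(adjustl(item(eq+1:)))

            ! Component separators '%' become '.' in the flattened key.
            do i = 1, len(key)
                if (key(i:i) == '%') key(i:i) = '.'
            end do

            if (len(json) > 1) json = json // ', '
            json = json // '"' // key // '": ' // val
        end do
        json = json // '}'
    end function

end module
```

For the record `&SER OBJ%COORD%X= 1.000000 ,OBJ%COORD%Y= 2.000000 ,OBJ%COLOR= 1/` this yields `{"OBJ.COORD.X": 1.000000, "OBJ.COORD.Y": 2.000000, "OBJ.COLOR": 1}`. Keeping the keys flat avoids building a tree; a nested output would need a second pass that groups keys by their prefix.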
From what I see, something pretty neat could be obtained with generics by simply extending the derived type from a generic serializable_t that would contain the serialize/deserialize functions rather than using preprocessing.
That just gave me some cool ideas for a side project 😄
@davidpfister this seems to implement custom serialization for any user type, but the format on disk is a namelist format. That's one part of the problem. The other part is to have custom binary formats on disk as well.
"so formatting to various output format would mean parsing the namelist stream_data"
You have no idea what a can of worms it is to do that! :)
Actually I implemented something similar (to some extent) in C# not so long ago to flatten dictionaries and output them to various backends (json, xml, sqlite). The big difference is that the .NET environment comes with a huge toolbox for creating tokenizers and lexers. But I agree, doing it in Fortran for the namelist format is certainly a hell of a job. I did not even start adding pointers, allocatables and complex inheritance. On top of this, depending on the desired backend, some characters need to be escaped (', &, >, < in xml for instance). So, if the namelist format parser would be a can of worms, what should we say about the 'text formatter' for the different backends? 😄 And this is a lot easier since @jacobwilliams you did it already (at least for a subset), right? But if I were to start a project on this topic I would certainly create a parser for the namelist (something similar to f90nml in python) and then output the dictionary to different formats. Looks like fun!
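As an illustration of the escaping step mentioned for the xml backend, a small helper could look like the sketch below. The name `xml_escape` is made up for this example; it replaces the five characters that have predefined entities in XML and leaves everything else untouched, so encoding issues and control characters are out of scope.

```fortran
module xml_escape_m
    implicit none
contains

    ! Replace &, <, >, ' and " by their predefined XML entities.
    pure function xml_escape(text) result(escaped)
        character(len=*), intent(in) :: text
        character(len=:), allocatable :: escaped
        integer :: i

        escaped = ''
        do i = 1, len(text)
            select case (text(i:i))
            case ('&')
                escaped = escaped // '&amp;'
            case ('<')
                escaped = escaped // '&lt;'
            case ('>')
                escaped = escaped // '&gt;'
            case ("'")
                escaped = escaped // '&apos;'
            case ('"')
                escaped = escaped // '&quot;'
            case default
                escaped = escaped // text(i:i)
            end select
        end do
    end function

end module
```

Each backend would need its own variant of such a routine (JSON escapes a different set of characters), which is one reason a per-backend procedure argument to serialize is attractive.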
Well, I got my answer. My approach would not work as soon as you have allocatable or pointer components. @certik, I am afraid that without support for allocatables and pointers in namelist (or in any kind of read/write, since the same limitation applies to unformatted I/O), that functionality would have a very limited scope. But this is probably material for another proposal.
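For what it is worth, the language already offers one escape hatch for this limitation, at the cost of writing a procedure by hand for every type: defined (user-defined derived-type) I/O. The sketch below is an assumption-laden illustration, not a proposal; `vec_t` and `write_vec` are hypothetical names, only formatted output is covered, and the on-disk layout (element count followed by the values) is an arbitrary choice made for this example.

```fortran
module vec_io_m
    implicit none

    ! A type with an allocatable component; without defined I/O it
    ! cannot appear as a whole in an output list.
    type :: vec_t
        real, allocatable :: values(:)
    contains
        procedure :: write_vec
        generic :: write(formatted) => write_vec
    end type

contains

    ! Defined output procedure; the argument list is fixed by the standard.
    subroutine write_vec(dtv, unit, iotype, v_list, iostat, iomsg)
        class(vec_t), intent(in) :: dtv
        integer, intent(in) :: unit
        character(len=*), intent(in) :: iotype     ! unused in this sketch
        integer, intent(in) :: v_list(:)           ! unused in this sketch
        integer, intent(out) :: iostat
        character(len=*), intent(inout) :: iomsg
        integer :: n

        n = 0
        if (allocated(dtv%values)) n = size(dtv%values)

        ! Element count first, then the values on the same record.
        write(unit, '(i0)', iostat=iostat, iomsg=iomsg) n
        if (iostat /= 0 .or. n == 0) return
        write(unit, '(*(1x,f0.3))', iostat=iostat, iomsg=iomsg) dtv%values
    end subroutine

end module
```

With this in place, an object of type `vec_t` can be written with the `dt` edit descriptor, for example `write(unit, '(dt)') v`. A matching `read(formatted)` procedure would be needed for deserialization, and none of this removes the need to write such code per type, which is the gap a compiler-level feature would close.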
Originally discussed at
The idea is to use the simple and compiler-enforced syntax of namelist (or equivalent), but the compiler would call a user library that implements other formats, such as TOML, JSON, or custom binary array formats (say npy/npz, GGUF, safetensors, etc.).
The exact design of how this would work is still to be figured out.
As an example of how Rust approaches this problem: the toml library there allows you to just create a struct and decorate it:
Then call it like this:
and it will just work.
A similar feature in Fortran might look like:
Or using Fortran's namelist-like syntax:
And this allows you to implement a user derived type toml_file that implements all the necessary capability to read a custom format, and then the line read(u, t) makes the compiler call your function/type-bound procedures to actually handle the read.