DavidSagan opened this issue 5 years ago
Hi David,
in general, openPMD is totally open to storing more information in the form of attributes or even other groups in data files. You can always add program-specific attributes or even non-openPMD data (#115) if you consider them unsuited for standardization, applicable only to a specific application (and documented therein), etc.
I am not entirely sure if such information should just be part of the specific application's documentation. For example, in PIConGPU we add additional information to specific openPMD files that are usable as "checkpoints", which we document in our main docs. In our case, such additional information includes, for example, the current state of a random number generator in a distributed simulation or the current state of the "next unique particle IDs" for particles newly created by ionization or QED physics (also distributed).
I am also totally open to documenting this in the openPMD standard; in that case one would write an extension such as EXT:App-PIConGPU or EXT:App-Bmad.
I tend to think one is potentially more flexible keeping such an extension in the application's docs (and just "reserving" the naming here). The simple reason is that application-specific data might change more often than the openPMD standard, and it might be non-trivial to define a general enough "kernel" or common structure for such highly application-specific info.
I also suggest storing them directly in your openPMD series (#115), since the notion of actual "files" is not needed for openPMD and we are actively working towards streaming workflows with openPMD series for Exascale. Storing it with the openPMD series translates such additional info very naturally, without extra "file" handling (although the series itself might still be in files, which is an implementation detail).
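As a rough illustration of keeping program-specific data next to the standard attributes, the sketch below models a series' root attributes as a plain Python dict (no I/O library involved) and separates the names that openPMD 1.x defines at the root level from everything else. The `bmad_` prefix and the attribute names `bmad_latticeName` and `bmad_latticeFile` are hypothetical, chosen only for illustration; a real writer would go through an I/O layer such as openPMD-api.

```python
# Minimal sketch: partition root-level attributes into openPMD-defined
# and program-specific ones. The dict stands in for a file or stream;
# no real I/O library is used here.

# A subset of root-level attribute names defined by the openPMD 1.x standard.
OPENPMD_ROOT_ATTRIBUTES = {
    "openPMD", "openPMDextension", "basePath", "meshesPath",
    "particlesPath", "iterationEncoding", "iterationFormat",
    "author", "software", "softwareVersion", "date", "comment",
}


def split_attributes(attrs):
    """Return (standard, extra); 'extra' holds attributes openPMD does not define."""
    standard = {k: v for k, v in attrs.items() if k in OPENPMD_ROOT_ATTRIBUTES}
    extra = {k: v for k, v in attrs.items() if k not in OPENPMD_ROOT_ATTRIBUTES}
    return standard, extra


# Hypothetical example: a writer adds Bmad-specific lattice information
# under an assumed "bmad_" prefix next to the standard attributes.
root_attrs = {
    "openPMD": "1.1.0",
    "openPMDextension": 0,
    "basePath": "/data/%T/",
    "iterationEncoding": "fileBased",
    "iterationFormat": "data%T.h5",
    "software": "Bmad",
    "bmad_latticeName": "ring.bmad",          # hypothetical attribute
    "bmad_latticeFile": "lattice/ring.bmad",  # hypothetical attribute
}

standard, extra = split_attributes(root_attrs)
print(sorted(extra))  # ['bmad_latticeFile', 'bmad_latticeName']
```

A naming prefix like this is only a convention; the point is that extra attributes travel inside the series itself, so they survive whether the series ends up in files or in a stream.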
@ax3l
> in general, openPMD is totally open to storing more information in the form of attributes or even other groups in data files. You can always add program-specific attributes or even non-openPMD data (#115) if you consider them unsuited for standardization, applicable only to a specific application (and documented therein), etc.
Yes, understood.
> I am not entirely sure if such information should just be part of the specific application's documentation. For example, in PIConGPU we add additional information to specific openPMD files that are usable as "checkpoints", which we document in our main docs. In our case, such additional information includes, for example, the current state of a random number generator in a distributed simulation or the current state of the "next unique particle IDs" for particles newly created by ionization or QED physics (also distributed).
This information definitely should be part of the specific program's documentation. The proposal here is to also collect the information in the openPMD Git repository. This is advantageous for a number of reasons. Putting all this information in one place makes it simpler to find. Also, for example, I did not know that PIConGPU was using openPMD, so if I were searching for what additional information other programs store, I would not have known to look at the PIConGPU documentation.
> I am also totally open to documenting this in the openPMD standard; in that case one would write an extension such as EXT:App-PIConGPU or EXT:App-Bmad.
Since the extra information a program puts into a data file will be specific to that program, my druthers would be to clearly mark this material as not part of the openPMD standard. But as long as the situation is clear, I could go either way on this.
> I tend to think one is potentially more flexible keeping such an extension in the application's docs (and just "reserving" the naming here). The simple reason is that application-specific data might change more often than the openPMD standard, and it might be non-trivial to define a general enough "kernel" or common structure for such highly application-specific info.
I'm not sure what you mean by "application docs" here. Is there currently an applications docs area? Certainly we do not want to have to change the revision number for the standard if a program's maintainer changes what extra information the program is storing.
> I also suggest storing them directly in your openPMD series (#115), since the notion of actual "files" is not needed for openPMD and we are actively working towards streaming workflows with openPMD series for Exascale. Storing it with the openPMD series translates such additional info very naturally, without extra "file" handling (although the series itself might still be in files, which is an implementation detail).
The proposal here is to gather in one place the information about what programs add to data files beyond the standard, so that anyone interested in what is being added in general can easily access it. Having to look in the data files themselves for this information seems worse to me than digging up the documentation of every program that uses openPMD.
Certainly, for someone interested only in the extra information added by one particular program, this proposal will not be of much use. It is aimed at the cases where one wants to know, in general, what extra information programs are adding.
@ax3l Consider these use cases:
A) When a person who maintains program X interfaces reader code to read in openPMD files and sees that program Y stores some extra information that could be used in program X, he/she can extend the reader interface to read in that information if it is available.
B) When further developing the openPMD standard, knowing what extra information different programs are putting in data files can help guide the development.
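Use case A can be sketched as a reader that treats another program's extra attributes as optional: it uses them when present and falls back otherwise, so files from writers that never add them still load. The `bmad_latticeName` key and the `read_lattice_name` helper are hypothetical, and attributes are modeled as a plain dict rather than read through a real I/O library.

```python
from typing import Mapping, Optional


def read_lattice_name(attrs: Mapping[str, object]) -> Optional[str]:
    """Return the (hypothetical) Bmad lattice name if the writer stored one.

    Files written by programs that do not add this attribute yield None
    instead of raising an error.
    """
    value = attrs.get("bmad_latticeName")  # hypothetical attribute name
    return value if isinstance(value, str) else None


# A file written by Bmad (hypothetical attribute present) ...
from_bmad = {"openPMD": "1.1.0", "software": "Bmad", "bmad_latticeName": "ring.bmad"}
# ... and a file from a program that adds no such attribute.
from_other = {"openPMD": "1.1.0", "software": "OtherCode"}

for attrs in (from_bmad, from_other):
    name = read_lattice_name(attrs)
    if name is not None:
        print(f"{attrs['software']}: lattice {name}")
    else:
        print(f"{attrs['software']}: no lattice information, using defaults")
```

Knowing from a shared index which attributes program Y writes is what lets the maintainer of program X write a lookup like this in the first place.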
@ax3l
> I am not entirely sure if such information should just be part of the specific application's documentation.
Another comment on this: my plan is to put the information in the Bmad manual and then, if approved, simply have the Bmad information file stored in the openPMD repository point to where the information is located in the Bmad manual. I definitely do not want to maintain the same documentation in two places.
I am now in the process of interfacing Bmad with code to read/write openPMD files, and I am aware that I may be putting "extra" information that is not part of the openPMD standard into the files that are created. For example, information about the lattice.
Since this information may be useful to other program maintainers, I propose letting program maintainers submit information about what nonstandard information is stored in the files their program creates. This information would be stored in separate files in the openPMD Git repository, making it easy to access. Things need to be arranged so that it is clear that this information is not part of the standard. For example, a "non-standard" directory could be created, and a "Bmad.info" file in that directory could describe the non-standard information that Bmad includes in data files. As an added benefit, such files could help with future development of the openPMD standard, since they will show what individual programs need in their data files.
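The proposed layout could be consumed with a few lines of code. The sketch below assumes a `non-standard/` directory of `<Program>.info` files, where each file's first non-empty line points to the program's own documentation (in the spirit of having the Bmad file point into the Bmad manual). That file format, the `index_nonstandard_info` helper, and the `ProgramY.info` entry are all hypothetical.

```python
from pathlib import Path
import tempfile


def index_nonstandard_info(directory):
    """Map program name -> first non-empty line of each <Program>.info file."""
    index = {}
    for path in sorted(Path(directory).glob("*.info")):
        pointer = ""
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():
                pointer = line.strip()
                break
        index[path.stem] = pointer  # "Bmad.info" -> "Bmad"
    return index


# Demo with a throwaway directory standing in for the repository's
# hypothetical "non-standard" directory.
with tempfile.TemporaryDirectory() as tmp:
    nonstd = Path(tmp) / "non-standard"
    nonstd.mkdir()
    (nonstd / "Bmad.info").write_text(
        "\nSee the Bmad manual, openPMD section (hypothetical pointer)\n",
        encoding="utf-8",
    )
    (nonstd / "ProgramY.info").write_text(
        "See the ProgramY docs (hypothetical pointer)\n",
        encoding="utf-8",
    )
    index = index_nonstandard_info(nonstd)

for program, pointer in index.items():
    print(f"{program}: {pointer}")
```

Keeping each file down to a pointer avoids maintaining the same documentation in two places, while the directory listing alone already answers "which programs add non-standard data?".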