Open DavidSagan opened 7 years ago
This seems reasonable to me; I am not sure anymore why we decided to have these paths customizable... @ax3l: Any idea?
@DavidSagan Note that, for this discussion and similar ones on the openPMD standard, it is probably more appropriate to use the Issues of the openPMD-standard repository. Would that be fine with you?
@RemiLehe: I'm a little confused here. Are you proposing moving all of the issues in this repository over to openPMD? The original idea was to simplify life so that people who were only interested in developing the version 2 standard here would not have to wade through the other issues in the openPMD issues forum and vice versa.
Yes, thanks for reminding me of this. I now remember that we agreed to open issues on the present repository.
However, it seems that there are two "version 2" drafts of the standard being developed: one on the wiki of the present repository, and one on the official openPMD repository: https://github.com/openPMD/openPMD-standard/milestone/4. Thus it would probably be good to centralize this at some point.
That being said, since it is explicitly said in the wiki that it is a first pass as a basis for discussions, I guess that the aim of the v2 in the wiki is to establish a "wishlist" for openPMD in order for it to be used for particle accelerator codes (is that a correct statement?) In that case, I agree you should wait for this first pass to be complete, before merging/reconciling this with the v2 of the official openPMD repository.
@RemiLehe: In fact I did not realize that there was a V2 being developed at the openPMD repository until you just mentioned it.
The aim of the h5particle wiki here is that I thought it would facilitate the discussion if I presented a concrete draft. I'm hoping to have something within a week. Nothing is written in stone. Merging/reconciling can actually begin now if you want. Just be aware that at this point I am in the process of making substantial changes so what you see could look a bit jumbled.
Please be aware that for the purposes of editing, I am keeping everything in one document. My thought was that at the end of finalizing the V2 standard, we can split off anything into an extension as needed.
Another thing is timing. I noticed that there were open issues on the openPMD site that were over 2 years old. I am hoping that the process here can be completed on a time scale of months...
In fact I did not realize that there was a V2 being developed at the openPMD repository until you just mentioned it.
Indeed there is, as well as a strategy to update existing files, bindings and tooling. We are far from inactive! Recently we focused only on projects and tooling and had no strong need to advance more general aspects of the standard.
wiki is to establish a "wishlist" for openPMD in order for it to be used for particle accelerator codes (is that a correct statement?)
That sounds like a good plan to me. It would be great if the needed features could then be opened in the mainline repo. If you want to move fast, domain-specific and break things that is totally fine as well, but please avoid naming and versioning it the same as the mainline project as long as they are diverged :)
Another thing is timing. I noticed that there were open issues on the openPMD site that were over 2 years old. I am hoping that the process here can be completed on a time scale of months...
This seems to be a misunderstanding. "Issues" in the openPMD standard are not "defects" that nobody cares about but ideas on where to generalize and move next. The direction is need and community driven: we always were and are well aware that we cannot cover all generalized descriptions of particle and mesh formats at once, and decided to progress step by step. If you see open issues or missing issues in the standard, just jump into the discussion and tell us it is relevant to you (and maybe even how you would address it). I must admit I only learned of your repo & plans these days, since I saw no issue or report from you on the mainline :)
@ax3l: I don't want to fork the openPMD specification. Ideally my thought was that if we all work together we could come up with an openPMD Version 2 standard that meets the needs of accelerator physicists and that would be official. So please take a look at the draft for the V2 standard on the marcguetg/h5particle wiki and tell me what you think.
The wiki work looks great! 👍
Can we try together to summarize the changes a bit?
A first view looks to me like we could do the following:
let's have the domain-specific naming in the Accelerator openPMD extension
support for lattice coordinates: one could try to map this into a geometry instead of only defining it into an extension, but inside the extension it is fine as well with the additional attributes you added!
[...]/[...]: look good so far! we also have some more in the pipe and we would need to understand your concept of time/snapshots
unitSymbol/unitsOfMeasurement: we could add labels for how to label things, but this can also go in an extension for now; the idea so far was to be symbol agnostic and just parse unitDimension - a domain-specific viz can then just map such (and if necessary also take into account the name of a record) to a symbol with ease
Can you please elaborate a bit on the concept behind the particle map? :)
@ax3l
let's have the domain-specific naming
I'm not sure what you mean by this. Could you suggest an appropriate name instead of Accelerator?
support for lattice coordinates: one could try to map this into a geometry instead of only defining it into an extension
Yes this was my thought but other people thought otherwise so I put the lattice specific stuff in the Extension. At this point this is not an issue for me so I will agree either way.
we also have some more in the pipe and we would need to understand your concept of time/snapshots
I'm not sure what needs to be clarified here. Can you give me specifics as to what needs to be clarified?
unitSymbol/unitsOfMeasurement
The idea here was that this does not affect any calculations or conversions. It is simply a way to 1) make things clearer to a human viewing the file and 2) provide a string that a program displaying numbers from the file can also use in the display, again as an aid for human understanding. The thing is that if the data is originally measured in, say, "miles/hour" there is no way for a program to reconstruct that.
let's have the domain-specific naming I'm not sure what you mean by this. Could you suggest an appropriate name instead of Accelerator?
Your current wiki draft proposes to define names such as pion, antimuon, etc. in the base standard. This is not necessary for most applications, we can put it in the Accelerator extension.
Yes this was my thought but other people thought otherwise so I put the lattice specific stuff in the Extension. At this point this is not an issue for me so I will agree either way.
That's good, we can also put it in the extension first (2.0) and later generalize it into the base standard.
I'm not sure what needs to be clarified here. Can you give me specifics as to what needs to be clarified?
How do you use snapshots? Is this a checkpoint? E.g. what is the difference between a snapshot and an openPMD iteration?
unitSymbol/unitsOfMeasurement
Yes, I understand and like that. Would a general optional "label", as we already do in a few places, do? E.g. a label for the symbol of each record and a label for the unit of each record component?
As a note, we need to be aware this also introduces a potential to write something incoherent there... (e.g. mismatch between unitDimension and latex of the units of measurement)
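The coherence concern above can be sketched in a few lines. This is only an illustration: it assumes unitDimension holds the powers of the seven SI base quantities (length, mass, time, current, temperature, amount of substance, luminous intensity), and the `KNOWN_LABELS` table and both function names are hypothetical, not part of any openPMD tooling.

```python
# Symbols for the seven SI base quantities, in the order assumed for unitDimension:
# length, mass, time, electric current, temperature, amount of substance, luminous intensity.
SI_BASE = ("m", "kg", "s", "A", "K", "mol", "cd")


def symbol_from_dimension(unit_dimension):
    """Build a plain SI symbol string from the seven powers, ignoring any human label."""
    parts = []
    for base, power in zip(SI_BASE, unit_dimension):
        if power == 0:
            continue
        parts.append(base if power == 1 else f"{base}^{power:g}")
    return " ".join(parts) if parts else "1"


# Hypothetical table of labels this particular reader understands.
KNOWN_LABELS = {
    "m/s": (1, 0, -1, 0, 0, 0, 0),
    "miles/hour": (1, 0, -1, 0, 0, 0, 0),
    "V/m": (1, 1, -3, -1, 0, 0, 0),
}


def label_is_coherent(label, unit_dimension):
    """Return True or False for known labels, None when the label cannot be checked."""
    expected = KNOWN_LABELS.get(label)
    if expected is None:
        return None
    return tuple(expected) == tuple(unit_dimension)


velocity = (1, 0, -1, 0, 0, 0, 0)
print(symbol_from_dimension(velocity))
print(label_is_coherent("miles/hour", velocity))
print(label_is_coherent("V/m", velocity))
```

A free-form label such as "miles/hour" carries information the dimension alone cannot reproduce, but a reader can only flag a mismatch for labels it already knows, which is why the check returns None otherwise.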
@ax3l
Your current wiki draft proposes to define names such as pion, antimuon, etc. in the base standard. This is not necessary for most applications, we can put it in the Accelerator extension.
The species names should be put in the base since the base dictates the use of species names and so, like everything else should be standardized. Just like you standardize cartesian coordinate names to be "(x, y, z)". And many accelerator physics applications also need to know the particle species. One thing that could be done if you don't want to have the names spelled out in the base is to put the naming convention in a separate file and then a reference to the file can be put in the base standard.
How do you use snapshots? Is this a checkpoint? E.g. what is the difference between a snapshot and an openPMD iteration?
I don't know what a checkpoint is but a snapshot is essentially the same as an iteration. I am proposing the name change since instead of a series of time steps one could store a series of bunches in a beam so the word "iteration" is misleading.
And can you explain what checkpointing is here and how it interacts with the standard? -- Thanks
Would a general optional "label" as we do it on a few places already do?
"label" would work but "unitsOfMeasurement" I thought was a better descriptor. If you think this is too confusing with "unitDimension" how about something like "unitsDescriptorString"?
@ax3l
Can you please elaborate a bit on the concept behind the particle map? :)
Sorry I missed this question first time around. I'm confused by what you mean by "particle map". The word "map" is not used in the draft I have been working on.
particle map
Ah ok, I was just reading the initial thought section and saw it there.
The species names should be put in the base since the base dictates the use of species names and so, like everything else should be standardized.
I think there is a conceptual misunderstanding about what base and extension in openPMD are. Just because something is in an extension does not mean it is not standardized or not very important in a domain.
The openPMD base standard is (scientific) domain agnostic. It describes data and gives enough information to exchange and (dumbly) visualize data without domain knowledge. Extensions add domain-specific needs, such as namings of records for the domain scientist, methods used in a simulation, interpretations of values in a certain domain's scope, additional attributes, required records, etc.
Extensions can in principle also be combined, e.g. one could do a plasma extension and a hydro and a PIC extension and use Plasma + ED-PIC or Plasma + Hydro together with the base standard.
In addition, neither the base standard nor the extensions disallow adding further records, folders, attributes, etc. if e.g. a specific application needs them but that aspect is not worth standardizing. Examples are a GPU's random-number matrix generator state or a unique particle-id generator state, which are needed for restarts of a simulation but are not relevant for data exchange or data processing.
I don't know what a checkpoint is but a snapshot is essentially the same as an iteration. I am proposing the name change since instead of a series of time steps one could store a series of bunches in a beam so the word "iteration" is misleading.
I see your point and initially we intentionally went for "iteration" to avoid confusion with "time steps" which are not necessarily lab time. I migrated this proposal to https://github.com/openPMD/openPMD-standard/issues/148 for discussion.
And can you explain what checkpointing is here and how it interacts with the standard? -- Thanks
Checkpointing (of a simulation) would be dumping and restoring its full internal state. In detail this is a very domain- and even application-specific task, and the openPMD standard does not define how it needs to be done. For particle-in-cell codes, we are thinking about unifying it to some extent in future versions of the ED-PIC extension. This is relevant for tightly coupled simulations.
@ax3l
The openPMD base standard is (scientific) domain agnostic. It describes data and gives enough information to exchange and (dumbly) visualize data
Exactly! That is why, for example, the standard mandates that the names "x", "y", and "z" be used for Cartesian field components. Since the species name is part of the data, it needs to be standardized too. For example, if someone creates a data file using "e+" for the name of electrons, this will not be portable. If the species name were not mandated in the base standard I would agree with you, but it is mandated in the base. And ensuring portability is much more important than being agnostic.
If the species name were not mandated in the base standard I would agree with you, but it is mandated in the base.
I am a bit confused and can not fully follow why a file that has "base + Accelerator" openPMD markup would not be fully self-describing & portable. A data reader would check that the extension "Accelerator" is set as a requirement and with that "electrons" as a naming are suddenly known.
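The reader-side check described here might look roughly like the following. It is a sketch under stated assumptions: the file's required extensions are modelled as a plain list of names (however the file actually encodes them), and the `EXTENSION_SPECIES` table with its species names is a hypothetical placeholder, not an agreed naming convention.

```python
# Hypothetical species tables contributed by each extension; the names are
# placeholders for illustration only.
EXTENSION_SPECIES = {
    "Accelerator": {"electron", "positron", "pion", "antimuon"},
}


def known_species(file_extensions, reader_extensions):
    """Return the species names a reader may rely on for this file.

    Raises ValueError when the file requires an extension the reader
    does not implement, since its records could then be misinterpreted.
    """
    missing = set(file_extensions) - set(reader_extensions)
    if missing:
        raise ValueError(f"unsupported extensions: {sorted(missing)}")
    names = set()
    for ext in file_extensions:
        names |= EXTENSION_SPECIES.get(ext, set())
    return names


# A file marked "base + Accelerator", read by a tool that implements two extensions.
print(sorted(known_species(["Accelerator"], ["Accelerator", "ED-PIC"])))
# A base-only file: no standardized species names are available.
print(known_species([], ["Accelerator", "ED-PIC"]))
```

The second call illustrates the point of contention in the thread: with no extension declared, the reader has no naming table to fall back on.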
@ax3l
Yes the "Accelerator" extension would be portable but the base standard alone is not portable without a species naming convention. I suspect that the only reason why you have not run into trouble before now is that specifying the mass and charge has been good enough. But this is not good enough for a wide range of problems.
My belief is that putting in a species naming convention makes this standard more useful and there is really no drawback in putting it in. Why sacrifice usefulness over ideology?
Since for me personally this will not be an issue (I will always be using the "Accelerator" extension), and you really want agnosticism over portability, I will not argue the point further.
@ax3l @DavidSagan How about then having a "Physics" extension, or something generic like that, which would contain the definitions for particles and other data that will be useful across many domains?
@ax3l @jlvay
How about then having a "Physics" extension, or something generic like that, which would contain the definitions for particles and other data that will be useful across many domains?
Rereading my draft V2 standard + Extension I realize that nowhere is there a slot to specify the species! This will have to be fixed. The question is this: If the base standard has a species attribute (or mandates that the species is encoded in the directory path like it is in the present version 1), but does not specify what the species names should be, then files created that only use the base standard, but not any extensions, are not portable. And for me portability is of paramount importance.
portability is of paramount importance.
For us as well. I think there is a misunderstanding about the modularity concept in openPMD base standard + extensions that we might be able to better clarify in a VC. Both the base standard and the extensions are part of the openPMD standard, so all aspects are standardized if one writes a new extension for e.g. elementary particle names. A specific app or domain can of course enforce the amount of "features" (e.g. base + two extensions) that a data file must support/provide/implement.
I would suggest we hold this specific sub-question, which is also already way out of scope of the original thread question, and discuss it in a follow-up / VC. Personally, I would have time for a VC e.g. the week after next, since we have an important lab evaluation next week (today was a German holiday for me and on Friday you probably all have Thanksgiving).
@ax3l @DavidSagan I agree that a VC will be an efficient way to clarify things.
@ax3l @jlvay Sounds good to me. I will be available the week after next. -- Cheers, David
That's fantastic, shall we doodle a time that fits all, probably around 6pm Berlin / 5pm London / 9am San Francisco?
I just saw:
I suspect that the only reason why you have not run into trouble before now is that specifying the mass and charge has been good enough.
mass and charge are both only defined in the ED-PIC extension and unknown in the base standard.
@ax3l
That's fantastic, shall we doodle a time that fits all, probably around 6pm Berlin / 5pm London / 9am San Francisco?
Good for me. I've never set up a doodle before so do you want to do it?
mass and charge are both only defined in the ED-PIC extension and unknown in the base standard.
Oops! My mistake. Thanks for the correction.
@ax3l @jlvay @DavidSagan See your email; I suggested a date.
Right now meshesPath and particlesPath are parameters that are to be set to the relative paths to the meshes and particle data respectively. For example: meshesPath = fields/
For simplicity's sake, I propose eliminating these two parameters and simply standardizing the relative path to the meshes as fields/ and the relative path to the particles as particles/
Comments?
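To make the proposal concrete, here is a minimal sketch comparing the two lookup schemes. Plain dicts stand in for the root attributes of a file, no openPMD library is used, the attribute names follow the openPMD 1.x spelling (`meshesPath`, `particlesPath`), and the iteration group `/data/100/` and both function names are hypothetical.

```python
def resolve_paths_configurable(root_attrs, iteration_group="/data/100/"):
    """Current scheme: the sub-paths are read from attributes stored in the file."""
    meshes = iteration_group + root_attrs["meshesPath"]
    particles = iteration_group + root_attrs["particlesPath"]
    return meshes, particles


def resolve_paths_fixed(iteration_group="/data/100/"):
    """Proposed scheme: the standard fixes the sub-paths, so no attributes are needed."""
    return iteration_group + "fields/", iteration_group + "particles/"


# Stand-in for the root attributes of a file written under the current scheme.
attrs = {"meshesPath": "fields/", "particlesPath": "particles/"}

print(resolve_paths_configurable(attrs))
print(resolve_paths_fixed())
```

Under the fixed scheme a reader needs one lookup fewer, at the cost of no longer being able to read files whose writers chose different directory names.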