openPMD / openPMD-standard

:notebook: Open Standard for Particle-Mesh Data
http://www.openPMD.org
Creative Commons Attribution 4.0 International

Species Type: Character Set after "other:" #258

Open s9105947 opened 2 years ago

s9105947 commented 2 years ago

To specify custom species, the SpeciesType extension allows: "user are free to append a free text after a colon". Is any text really allowed? (What even is text?)

IMO "free text" should be severely restricted, at least to "printable characters without semicolon (;)". Notable cases:

My gut instinct is to allow only "universally" safe and unambiguous characters after "other:":

ax3l commented 2 years ago

To specify custom species, the SpeciesType extension allows: "user are free to append a free text after a colon". Is any text really allowed? (What even is text?) IMO "free text" should be severely restricted, at least to "printable characters without semicolon (;)".

Correct. Following the base standard, we always mean pure ASCII when we write "text". Yes, we generally leave other: unspecified for cases that we have not yet thought about and that get standardized in later versions. A semicolon is also fine in that case; maybe someone thinks of another compound or list and wants to use the same convention. When it gets standardized later on, we only drop the other: prefix, which makes file conversions easy. Thus, ; is fine here.

I would not pro-actively allow <a-type-that-we-already-define>;other:<someStuff> yet, since I think we have no use case for this yet and it just complicates the syntax & conventions. (Unless you have a concrete case you need to achieve right now, of course.)

s9105947 commented 2 years ago

Thank you for the clarification on ASCII, I overlooked that ._. I like the concept of reserving other: for entirely custom types and forbidding it in lists.

So if I understand you correctly, a speciesType is:

  1. one of the pre-defined species (fundamental particle, atom, maybe ion/molecule)
  2. a list of several entries that each conform to 1., separated by a single ";"
    • empty lists are forbidden
    • empty list items (containing ;;) are forbidden
    • trailing semicolons must be ignored
  3. a string beginning with other:, followed by free text (see below)

Correct, so far?
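
A minimal sketch of the three rules above, as I read them, in Python. The function name `parse_species_type` and the set `PREDEFINED` are hypothetical and only for illustration; `PREDEFINED` is a tiny placeholder, not the actual list of names from the extension. The text after other: is returned unchecked here, since its allowed characters are still an open question.

```python
# Illustrative placeholder only: NOT the real list of pre-defined species
# names from the SpeciesType extension.
PREDEFINED = {"electron", "positron", "proton"}
OTHER_PREFIX = "other:"


def parse_species_type(value, predefined=PREDEFINED):
    """Classify a speciesType string.

    Returns ("other", text) for rule 3, or ("list", [names]) for rules 1
    and 2 (a single species is treated as a one-item list).
    Raises ValueError if the string violates the rules.
    """
    if value.startswith(OTHER_PREFIX):
        # Rule 3: entirely custom type; the payload is not validated here.
        return ("other", value[len(OTHER_PREFIX):])

    # Rule 2: a trailing semicolon is ignored.
    if value.endswith(";"):
        value = value[:-1]

    items = value.split(";")
    for item in items:
        if item == "":
            # Covers the empty list and empty items such as "a;;b".
            raise ValueError("empty list or empty list item")
        if item not in predefined:
            # Also rejects "other:..." inside a list.
            raise ValueError(f"not a pre-defined species: {item!r}")
    return ("list", items)
```

With this sketch, "electron;proton;" parses to a two-item list, "other:foo;bar" is kept as one custom type with the payload "foo;bar", and "electron;;proton" as well as "electron;other:x" are rejected.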

This would leave these questions:

  1. should empty strings after other: be permitted?
  2. can a newline ever follow after other:?
  3. which other characters are allowed?

I'd suggest: yes; no; and the class "print" of the POSIX locale (IEEE 1003.1-2008, s. 7.3.1, l. 4187):

123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_abcdefghijklmnopqrstuvwxyz{|}~ (+backtick ` +space +horizontal tab)
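
A rough sketch of how that suggestion could be checked in Python, assuming "print" in the POSIX locale means the ASCII range 0x20 (space) to 0x7E (~). The function name `is_valid_other_payload` and the `allow_tab` switch are hypothetical. Horizontal tab is not part of the POSIX "print" class, so it is handled as a separate, optional extra.

```python
def is_valid_other_payload(text, allow_tab=True):
    """Check the text that follows "other:" against the suggested rules.

    - the empty string is accepted
    - newlines (and all other control characters) are rejected
    - characters 0x20..0x7E are accepted (POSIX-locale "print" class)
    - horizontal tab is accepted only if allow_tab is True (assumption:
      tab is an extra on top of "print", not part of it)
    """
    for ch in text:
        code = ord(ch)
        if 0x20 <= code <= 0x7E:
            continue
        if allow_tab and ch == "\t":
            continue
        return False
    return True
```

Under these assumptions, "my species; v2" and the empty string pass, while any string containing a newline or a non-ASCII character fails.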