libAtoms / extxyz

Extended XYZ specification and parsers
MIT License
12 stars 4 forks source link

Ambiguity in the grammar between quoted strings and backward compatible arrays #4

Closed Luthaf closed 2 years ago

Luthaf commented 2 years ago

For example how should software interpret the value in key="a b c"? It could be either a string containing multiple spaces a b c or a backward compatible array of three strings (["a", "b", "c"]).

I think the easiest resolution of this ambiguity is to limit quoted arrays to only integer/real/boolean values, the same way '{' arrays are limited to string values. All the extxyz files I've seen in the wild use arrays in this way, so backward compatibility should be preserved. Do you agree @jameskermode?

noambernstein commented 2 years ago

I believe a string is the intended behavior. There are no old style arrays of strings, I think. Is this actually contradicted by some documentation or by the reference implementation?

Luthaf commented 2 years ago

The 1D-array section (https://github.com/libAtoms/extxyz#one-dimensional-array-vector) of the readme says

backward compatible: opens with " or {, one or more of the same primitive types (strings only in {}) separated by whitespace, ends with matching " or }. For backward compatibility, a single element backward compatible array is interpreted as a scalar of the same type.

And "primitive types" allow strings, so that's how I understood this section. The pyleri grammar does not allow string https://github.com/libAtoms/extxyz/blob/7ad3758400b7476ea7510d41333faa11ec0f0fc7/grammar/extxyz_kv_grammar.py#L60 so I guess the main thing to update is the documentation!

Luthaf commented 2 years ago

Although, looking at the code the next line in pyleri grammar says

https://github.com/libAtoms/extxyz/blob/7ad3758400b7476ea7510d41333faa11ec0f0fc7/grammar/extxyz_kv_grammar.py#L61

while the doc says only string values are allowed inside { arrays, so there is some inconsistency here.

bernstei commented 2 years ago

No, the intention the opposite of what I think you're saying - string arrays are only allowed in {} (not "only string values are allowed in {}"), i.e. string arrays are not allowed in "". I agree that this could be more clear, so I'll try to improve the wording.

bernstei commented 2 years ago

Please take a look at the revised wording, tell me if it's more clear, and if you agree that it correctly communicates what the pyleri grammar actually implements.

Luthaf commented 2 years ago

The revised wording is much clearer, thanks!