topdownproteomics / ProteoformNomenclatureStandard

ProForma, a Proteoform Notation Standard
https://topdownproteomics.github.io/ProteoformNomenclatureStandard/

info key: do not allow square brackets #20

Closed sgibb closed 7 years ago

sgibb commented 7 years ago

Currently the standard for the info key doesn't allow pipe characters:

Descriptors may not contain the pipe character.

I would suggest forbidding square brackets as well, e.g.:

Descriptors may not contain the pipe character nor square brackets.

Otherwise, parsing the tags would be more difficult. An alternative would be to surround the info description with quotation marks, e.g. SEQ[info: "Some comment about this amino acid[X]"]UENCE

stefanks commented 7 years ago

It's an unnecessary restriction, since a parser only has to require that the first opening bracket be closed; any inner brackets would then necessarily have to be closed as well. We need to allow those, because Unimod has cases with brackets.

The only corner case is an odd description in which a ] character occurs without a preceding [, but this seems improbable enough not to merit a separate rule.
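The bracket-matching approach described above can be sketched in Python as follows. This is only an illustration, not code from the standard or from any ProForma parser: the function name split_tags and the ("seq" / "tag") labels are made up for this example. It tracks nesting depth so that a tag ends at the bracket that closes its first opening bracket.

```python
def split_tags(proforma: str) -> list:
    """Split a string into ("seq", text) and ("tag", text) pieces.

    Hypothetical helper: a tag starts at a top-level "[" and ends at the
    "]" that brings the nesting depth back to zero.
    """
    pieces = []
    depth = 0   # current bracket nesting depth
    start = 0   # start index of the piece being collected
    for i, ch in enumerate(proforma):
        if ch == "[":
            if depth == 0:
                # A top-level "[" ends the preceding sequence piece.
                if i > start:
                    pieces.append(("seq", proforma[start:i]))
                start = i + 1
            depth += 1
        elif ch == "]":
            if depth == 0:
                raise ValueError(f"unmatched ']' at position {i}")
            depth -= 1
            if depth == 0:
                # The first opening bracket is now closed: the tag ends here.
                pieces.append(("tag", proforma[start:i]))
                start = i + 1
    if depth != 0:
        raise ValueError("unclosed '['")
    if start < len(proforma):
        pieces.append(("seq", proforma[start:]))
    return pieces


print(split_tags("SEQ[info: Some comment about this amino acid[X]]UENCE"))
```

With balanced inner brackets the whole description is kept as one tag. In the corner case, e.g. "SEQ[info: stray ] here]UENCE", the stray ] closes the tag early and this sketch raises a ValueError at the later, now unmatched, ].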

What does everyone think?

hollenstein commented 7 years ago

You could allow square brackets only if they occur as a pair in the proper order, "[" followed by "]". Although an unpaired bracket is a corner case, it would break any parser that relies on a proper hierarchy of opening and closing square brackets.
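A rule like this is cheap to check. The sketch below is an assumption about how a validator might look, not wording or code from the standard; the names brackets_balanced and is_valid_descriptor are hypothetical. It combines the existing pipe restriction with the paired-brackets rule.

```python
def brackets_balanced(descriptor: str) -> bool:
    """Return True if every "]" has an earlier "[" and every "[" is closed."""
    depth = 0
    for ch in descriptor:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth < 0:
                # A "]" appeared before its "[": wrong order.
                return False
    return depth == 0


def is_valid_descriptor(descriptor: str) -> bool:
    """Hypothetical check: no pipe character, and brackets only in proper pairs."""
    return "|" not in descriptor and brackets_balanced(descriptor)


for text in ["Some comment about this amino acid[X]", "stray ] here", "][", "a|b"]:
    print(repr(text), is_valid_descriptor(text))
```

Only the first of the four sample descriptors passes; the others fail because of a stray ], brackets in the wrong order, and a pipe character, respectively.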

stefanks commented 7 years ago

Yes, I believe this is reasonable.

sgibb commented 7 years ago

Ok, I am happy with this solution. You could close this issue if you like.