iho-ohi / S-101-Test-Datasets

A repository of S-101 test datasets which make available for development phases and they will be migrated to the Registry later.
22 stars 6 forks source link

Wrong ISO 8211 DDR format controls for concatenated+repeated fields #78

Closed himikof closed 7 months ago

himikof commented 7 months ago

Problem description

Concatenated fields that have a repeated table at the end are described with wrong format controls in most test dataset files. For example, C3IL field in the "2.1.1 Power Up" 10100AA_X0000.000 has the following definition in the DDR:

3100;&   3-D Integer Coordinate List▲VCID\\*YCOO!XCOO!ZCOO▲(b11,3b24)▼

It does not mark the 3b24 format control as a repeated group, meaning that according to ISO 8211 the data in this field should be decoded as a sequence of b11, b24, b24, b24, b11, b24, b24, b24, ... instead of the intended b11, b24, b24, b24, b24, ... The S-100 standard version 5.1.0 instead has this description:

3100;&□□□3-D□Integer□Coordinate□List▲VCID\\*YCOO!XCOO!ZCOO▲(b11,{3b24})▼

I do not know if the use of {} in the standard text is meant to be taken literally, or is just a stylistic choice to mark the indefinitely-repeated group. I'm pretty sure that only () parentheses are allowed in the ISO 8211 format control strings. There is also no need to mark the indefinitely-repeated group in a special way, since there always have been a clear ISO 8211 rule that the innermost right (starting from the last () pair of parentheses is the indefinitely-repeated group. So I would expect the exact proper format control string in this case to be (b11,(3b24)).

Problematic S-100 fields

All possible instances can be found by searching for the \\* literal text in the S-100 standard text or the data files themselves.

Impact

A generic ISO 8211 reader would fail to parse (or produce garbage from) the field data of the affected fields if the last part is repeated more than once, as it would have to assume the wrong format control group to repeat, therefore assigning incorrect data types for the following data subfields.

TomRichardson6 commented 7 months ago

@plebihan29n Pol do you have any comments on this observation?

I would be interested to understand if this issue also affects the cells generated in other tools like CARIS.

plebihan29n commented 7 months ago

I agree with Himikof's remarks about the definition of fields. Current definitions of some fields in Test DataSet Iso8211 Headers are not logical. Thanks Himikof for your detailed and clear description. But these current iso8211 field definitions are still consistent with S-100_5.1.0_Final_Clean.pdf October 2023. Himikof , could be help me to find the version of S100 documents that you refer ? The fact to replace '{' by '(' seems also to me more consistent with iso8211 uses but perhaps we have to wait that the advice of S100 WG ? In any cases, I will correct iso8211 header of test datasets in consequence.

Tom, the problem we have met with cells generated is not a problem of iso8211 field definitions. This subject is that the field "ITCS" is correctly announced in the header , and correctly encoded in the data in a point of iso8211 view but its content was empty. These kind of fields (empty fields) were rejected by some tools and accepted by others.

himikof commented 7 months ago

But these current iso8211 field definitions are still consistent with S-100_5.1.0_Final_Clean.pdf October 2023. Himikof , could be help me to find the version of S100 documents that you refer?

Do you mean that the definitions in the test dataset are consistent with S-100? I used https://iho.int/uploads/user/pubs/standards/s-100/S-100_5.1.0_Final_Clean.pdf for the reference. For example, in the section 10a-5.2.2 "Information Association field structure" it has:

Data Descriptive Field
3600;&%/GInformation□Association▲RRNM!RRID!NIAC!NARC!IUIN\\*NATC!ATIX!PAIX
!ATIN!ATVL▲(b11,b14,2b12,b11,{3b12,b11,A})▼

So the repeating subgroup is marked there.

plebihan29n commented 7 months ago

Sorry you are right I didn't look at the good place => It is clear now on my side. => I will (re)generate the datasets in consequence

plebihan29n commented 7 months ago

I have take the opportunity of issue 74 (Surrounding Depth On Dangers That Overlap InterTidal Area) to update DataSetDescriptive Record of PowerUp datasets. If possible Could you please check and confirm it is OK ? (and if OK , I will correct all test datasets) Thank you in advance. Pol

himikof commented 7 months ago

@plebihan29n Firstly, there is a certain typo in the C3IL field definition (missing the last closing parenthesis):

3100;&   3-D Integer Coordinate List▲VCID\*YCOO!XCOO!ZCOO▲(b11,{3b24}▼

Secondly, I see that you have used literal {} characters everywhere, instead of (). I still do not think that it is valid to use them according to ISO/IEC 8211:1994. I have never seen any reference for that other than the DDF examples in the S-100 document, and, unfortunately, it does not offer any explanation of the format controls notation/semantics, simply referring to the ISO 8211 standard for that part. Given that, I assume that it is an unfortunately undocumented stylistic/rendering choice in S-100, not a literal requirement. I also have never seen a ISO 8211 parser capable of accepting braces in the format controls.

The ISO 8211 standard text has the following paragraph:

The order of fields and their types specified with format controls shall correspond to the data field when the format controls shall be traversed from left to right expanding the nested terms from the left. If the data field is not exhausted, the format shall repeat from the left-hand parenthesis corresponding to the next to the last right-hand parenthesis not including those parentheses used to delimit field width and using the associated repetition factor if any. If there is no such right-hand parenthesis, format control shall revert to the first left parenthesis of the format specification.

It unambiguously referrers to parenthesis only when determining which part of the format controls repeats indefinitely.

DavidGrant-NIWC commented 7 months ago

For this repo, the question is whether the datasets should conform to the current S-101/S-100 specs, or if the data should be "corrected".

In my opinion, test datasets should reflect the spec as written so that problems such as this one can be discovered. The underlying issue should be addressed at the S-100 level (thankfully, S-101 doesn't duplicate the explicit formatting information for the data descriptive fields).

Recommend this issue is marked invalid and closed.

himikof commented 7 months ago

To be completely clear, I do not advocate for changing the test datasets to be not in line with S-101/S-100 specs. I wanted to ensure that the existing inconsistencies with specs are resolved, and, additionally, to clarify the question of their compliance to the underlying format standard (as I assumed {} part was not literal in S-100 spec). If the second part needs to be resolved as a part of S-100 process, that is completely fine, and should not be part of this issue. But fully changing the test dataset to match the existing S-100 DDF definitions should be worthwhile regardless, as there is a present mismatch.

himikof commented 7 months ago

@plebihan29n Thank you for updating the PowerUp dataset! I see no issues with format controls there now, but there is a new problem with backslash usage in the vector labels:

RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\*DSTC

should have two literal backslashes (according to S-100 and ISO 8211)

RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\\*DSTC

This new problem affects DSID, INAS, C3IL, FASC in the same way.

plebihan29n commented 7 months ago

@himikof Sorry again, hope this is good now. If yes I will regenerate DDR on all S-164 datasets

himikof commented 7 months ago

@plebihan29n I've checked the DDRs for the updated Power Up dataset, and I see no problems there. Thank you for tackling this issue!

TomRichardson6 commented 7 months ago

Closed but passed to S-100WG chair by S-101PT chair so that this is considered at the S-100 level.