Closed himikof closed 7 months ago
@plebihan29n Pol do you have any comments on this observation?
I would be interested to understand if this issue also affects the cells generated in other tools like CARIS.
I agree with Himikof's remarks about the definition of fields. Current definitions of some fields in Test DataSet Iso8211 Headers are not logical. Thanks Himikof for your detailed and clear description. But these current iso8211 field definitions are still consistent with S-100_5.1.0_Final_Clean.pdf October 2023. Himikof , could be help me to find the version of S100 documents that you refer ? The fact to replace '{' by '(' seems also to me more consistent with iso8211 uses but perhaps we have to wait that the advice of S100 WG ? In any cases, I will correct iso8211 header of test datasets in consequence.
Tom, the problem we have met with cells generated is not a problem of iso8211 field definitions. This subject is that the field "ITCS" is correctly announced in the header , and correctly encoded in the data in a point of iso8211 view but its content was empty. These kind of fields (empty fields) were rejected by some tools and accepted by others.
But these current iso8211 field definitions are still consistent with S-100_5.1.0_Final_Clean.pdf October 2023. Himikof , could be help me to find the version of S100 documents that you refer?
Do you mean that the definitions in the test dataset are consistent with S-100? I used https://iho.int/uploads/user/pubs/standards/s-100/S-100_5.1.0_Final_Clean.pdf for the reference. For example, in the section 10a-5.2.2 "Information Association field structure" it has:
Data Descriptive Field
3600;&%/GInformation□Association▲RRNM!RRID!NIAC!NARC!IUIN\\*NATC!ATIX!PAIX
!ATIN!ATVL▲(b11,b14,2b12,b11,{3b12,b11,A})▼
So the repeating subgroup is marked there.
Sorry you are right I didn't look at the good place => It is clear now on my side. => I will (re)generate the datasets in consequence
I have take the opportunity of issue 74 (Surrounding Depth On Dangers That Overlap InterTidal Area) to update DataSetDescriptive Record of PowerUp datasets. If possible Could you please check and confirm it is OK ? (and if OK , I will correct all test datasets) Thank you in advance. Pol
@plebihan29n Firstly, there is a certain typo in the C3IL field definition (missing the last closing parenthesis):
3100;& 3-D Integer Coordinate List▲VCID\*YCOO!XCOO!ZCOO▲(b11,{3b24}▼
Secondly, I see that you have used literal {}
characters everywhere, instead of ()
. I still do not think that it is valid to use them according to ISO/IEC 8211:1994. I have never seen any reference for that other than the DDF examples in the S-100 document, and, unfortunately, it does not offer any explanation of the format controls notation/semantics, simply referring to the ISO 8211 standard for that part. Given that, I assume that it is an unfortunately undocumented stylistic/rendering choice in S-100, not a literal requirement. I also have never seen a ISO 8211 parser capable of accepting braces in the format controls.
The ISO 8211 standard text has the following paragraph:
The order of fields and their types specified with format controls shall correspond to the data field when the format controls shall be traversed from left to right expanding the nested terms from the left. If the data field is not exhausted, the format shall repeat from the left-hand parenthesis corresponding to the next to the last right-hand parenthesis not including those parentheses used to delimit field width and using the associated repetition factor if any. If there is no such right-hand parenthesis, format control shall revert to the first left parenthesis of the format specification.
It unambiguously referrers to parenthesis only when determining which part of the format controls repeats indefinitely.
For this repo, the question is whether the datasets should conform to the current S-101/S-100 specs, or if the data should be "corrected".
In my opinion, test datasets should reflect the spec as written so that problems such as this one can be discovered. The underlying issue should be addressed at the S-100 level (thankfully, S-101 doesn't duplicate the explicit formatting information for the data descriptive fields).
Recommend this issue is marked invalid
and closed.
To be completely clear, I do not advocate for changing the test datasets to be not in line with S-101/S-100 specs. I wanted to ensure that the existing inconsistencies with specs are resolved, and, additionally, to clarify the question of their compliance to the underlying format standard (as I assumed {}
part was not literal in S-100 spec). If the second part needs to be resolved as a part of S-100 process, that is completely fine, and should not be part of this issue. But fully changing the test dataset to match the existing S-100 DDF definitions should be worthwhile regardless, as there is a present mismatch.
@plebihan29n Thank you for updating the PowerUp dataset! I see no issues with format controls there now, but there is a new problem with backslash usage in the vector labels:
RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\*DSTC
should have two literal backslashes (according to S-100 and ISO 8211)
RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\\*DSTC
This new problem affects DSID, INAS, C3IL, FASC in the same way.
@himikof Sorry again, hope this is good now. If yes I will regenerate DDR on all S-164 datasets
@plebihan29n I've checked the DDRs for the updated Power Up dataset, and I see no problems there. Thank you for tackling this issue!
Closed but passed to S-100WG chair by S-101PT chair so that this is considered at the S-100 level.
Problem description
Concatenated fields that have a repeated table at the end are described with wrong format controls in most test dataset files. For example, C3IL field in the "2.1.1 Power Up" 10100AA_X0000.000 has the following definition in the DDR:
It does not mark the 3b24 format control as a repeated group, meaning that according to ISO 8211 the data in this field should be decoded as a sequence of b11, b24, b24, b24, b11, b24, b24, b24, ... instead of the intended b11, b24, b24, b24, b24, ... The S-100 standard version 5.1.0 instead has this description:
I do not know if the use of
{}
in the standard text is meant to be taken literally, or is just a stylistic choice to mark the indefinitely-repeated group. I'm pretty sure that only()
parentheses are allowed in the ISO 8211 format control strings. There is also no need to mark the indefinitely-repeated group in a special way, since there always have been a clear ISO 8211 rule that the innermost right (starting from the last(
) pair of parentheses is the indefinitely-repeated group. So I would expect the exact proper format control string in this case to be(b11,(3b24))
.Problematic S-100 fields
All possible instances can be found by searching for the
\\*
literal text in the S-100 standard text or the data files themselves.RRNM!RRID!NIAC!NARC!IUIN\\*NATC!ATIX!PAIX!ATIN!ATVL▲(b11,b14,2b12,b11,3b12,b11,A)
(invalid) Expected:RRNM!RRID!NIAC!NARC!IUIN\\*NATC!ATIX!PAIX!ATIN!ATVL▲(b11,b14,2b12,b11,(3b12,b11,A))
RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\\*DSTC▲(b11,b14,7A,A(8),3A,b11)
(invalid) Expected:RCNM!RCID!ENSP!ENED!PRSP!PRED!PROF!DSNM!DSTL!DSRD!DSLG!DSAB!DSED\\*DSTC▲(b11,b14,7A,A(8),3A,(b11))
VCID\\*YCOO!XCOO!ZCOO▲(b11,3b24)
(invalid) Expected:VCID\\*YCOO!XCOO!ZCOO▲(b11,(3b24))
VCID\\*YCOO!XCOO!ZCOO▲(b11,{3b24})
should beVCID\\*YCOO!XCOO!ZCOO▲(b11,{3b48})
instead.YCOO!XCOO!DRVO!\\*YDRV!XDRV▲(2b48,b11,2b48)
should beYCOO!XCOO!DRVO!\\*YDRV!XDRV▲(2b48,b11,{2b48})
instead.3600;&□□□2-D□Derivative□List□Integer▲YCOO!XCOO!DRVO!\\*YDRV!XDRV▲(2b24,b11,2b24)▼
should be instead3100;&□□□2-D□Derivative□List□Integer▲YCOO!XCOO!DRVO!\\*YDRV!XDRV▲(2b24,b11,{2b24})▼
RRNM!RRID!NFAC!NARC!FAUI\\*NATC!ATIX!PAIX!ATIN!ATVL▲(b11,b14,2b12,b11,3b12,b11,A)
(invalid) Expected:RRNM!RRID!NFAC!NARC!FAUI\\*NATC!ATIX!PAIX!ATIN!ATVL▲(b11,b14,2b12,b11,(3b12,b11,A))
Impact
A generic ISO 8211 reader would fail to parse (or produce garbage from) the field data of the affected fields if the last part is repeated more than once, as it would have to assume the wrong format control group to repeat, therefore assigning incorrect data types for the following data subfields.