iho-ohi / S-57-to-S-101-conversion-sub-WG

Discussion of INFORM transformations #4

Open kusala9 opened 3 years ago

kusala9 commented 3 years ago

The issue of converting INFORM is a large one and probably warrants some broad discussion. Things to look at:

INFORM could potentially be used to populate/define new features or attribution but may need stripping out of S-57 cells to avoid cluttering the display.

This requires DCEG and UOC expert guidance and input, hence this Issue being raised.

Christian-Shom commented 3 years ago

Many instances of INFORM (and NINFOM) (those less than 300 characters) could have been transferred into the text sub-attribute of the complex attribute information, but very few objects allow it, so Information types will have to be created. Was this done on purpose? We'll have to create Information types for a few words only. In some cases, INFORM will trigger the creation of a specific S-101 object (Discoloured Water). We need to test and test again...

TDYCARHugh commented 3 years ago

This is an interesting point. When I originally proposed shared attributes (information types), they were meant to be a place to hold information that is common to or shared among a set of features or objects, like chart notes. Somewhere along the line it seems that the S-101 DCEG/FC was rationalized to put all information into information types instead of using local attributes. I think that information that is specific to a feature should be carried as a local attribute, avoiding the complexity of using Information types for that case.

Christian-Shom commented 3 years ago

Fully agree with Hugh !

kusala9 commented 3 years ago

But right now we have nowhere to put such information. If the current regime continues (which seems likely), most INFORM values will be converted to NauticalInformation, but a subset should trigger creation of new attributes/features, e.g. the one mentioned above. Fibre Optic cables is another, and units of measure is one (for CURVEL, I think). We need to list these out from the current UOC and use them as a starting point for the guidance. Spelling/formatting may be an issue (e.g. the US uses "Fiber Optic", which means they will need to parse it slightly differently to make the mapping work).
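
As a rough illustration of the spelling issue, here is a minimal Python sketch of keyword matching on INFORM text. The trigger table, the variant list, and the target labels are all hypothetical placeholders for whatever the UOC review produces; they are not confirmed S-101 mappings.

```python
import re

# Hypothetical trigger table: normalised keyword -> illustrative target label.
# The labels are placeholders, not confirmed S-101 mappings.
TRIGGERS = {
    "fibre optic": "cable attribute (placeholder)",
    "discoloured water": "Discoloured Water feature (placeholder)",
}

# Assumed list of spelling variants folded to one form before matching.
VARIANTS = {"fiber": "fibre", "discolored": "discoloured"}


def normalise(inform: str) -> str:
    """Lower-case, replace punctuation with spaces, fold spelling variants."""
    text = re.sub(r"[^a-z0-9]+", " ", inform.lower()).strip()
    return " ".join(VARIANTS.get(word, word) for word in text.split())


def classify(inform: str) -> str:
    """Return the trigger target, or fall back to NauticalInformation."""
    text = normalise(inform)
    for keyword, target in TRIGGERS.items():
        if keyword in text:
            return target
    return "NauticalInformation"


print(classify("Fiber Optic cable"))
print(classify("Buoy removed in winter"))
```

Anything that does not match a trigger falls through to NauticalInformation, which mirrors the default described above.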

Annette-BSH commented 3 years ago

Results_INFORM_DE.pdf Here are the results of a quick analysis of our data concerning INFORM. There are some issues that need to be addressed before migrating to S-101 (e.g. encoding seasonal buoyage). Please refer to the attached PDF for more details.

Annette-BSH commented 3 years ago

Purpose of separate Information Types

There are some aspects of using separate information types over local attributes that still concern me. As far as I understand, the reason for this approach is a more efficient way of storing data and an easier way of updating data.

In my opinion this only works if the information type contains data that is shared by all geo features linked to it equally (meaning: it is not shared because it is some standard data but instead it is shared because of a specific reason). It is difficult to describe what I mean, so let me give you an example:

There are three lock gates (GATCON, CATGAT=4) at three different locations (far apart from each other) that operate on the exact same schedule at the time of data migration. Would they all be linked to the same instance of the 'service hours' information type? If so, that would cause severe problems when one of these lock gates changes its schedule. In this case it would not work to simply change the data in the linked information type, as this would also change the data for the two other lock gates.
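
The lock gate scenario can be sketched in a few lines of Python. The class names `ServiceHours` and `LockGate` are hypothetical stand-ins and not taken from the S-101 feature catalogue; the point is only that the three gates hold a reference to one instance rather than a copy.

```python
from dataclasses import dataclass


@dataclass
class ServiceHours:      # hypothetical stand-in for an information type
    schedule: str


@dataclass
class LockGate:          # hypothetical stand-in for a converted GATCON
    name: str
    hours: ServiceHours  # a reference to the shared instance, not a copy


shared = ServiceHours("Mon-Fri 06:00-18:00")
gates = [LockGate(name, shared) for name in ("A", "B", "C")]

# Gate A changes its schedule. Editing the linked instance in place
# also changes the schedule seen by gates B and C.
gates[0].hours.schedule = "Mon-Sun 00:00-24:00"

print([gate.hours.schedule for gate in gates])
```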

There are, however, cases where the centralised update of data within a separate information type will help prevent errors that could occur by omitting geo features when applying changes to local attributes. The text description files (TXTDSC) come to mind here. But they will not use a separate information type.

I do understand the idea of breaking complex attributes out of the feature classes to keep them slim. But there are other complex parts like topmarks (TOPMAR) that will be incorporated in several other feature classes although they (in many cases) could be shared by several geo features (even across several feature classes).

From my perspective, additional information types will only be useful when populated with standard or default data that is valid for geo features based on a common property (e.g. location). As soon as there are individual reasons for assigning standard or default data or the information type is populated with individual data values the whole concept of separate information types does not offer an advantage over local attributes. On the contrary, it could cause severe problems when updating the data without taking into account that the update may not apply to all linked geo features.

Maybe I am missing something and everything will work just fine. What is your opinion on this?

LizHahessy commented 3 years ago

Results INFORM_DK.pdf

I have attached the initial results of the DK investigation into INFORM. We are only just looking at how to resolve these items but it is clear we have some existing encoding issues that need to be addressed. Initial review suggests that many of these INFORM captures are duplication and legacy attribution from paper chart digitisation into ENC.

Christian-Shom commented 3 years ago

Results_INFORM_FR.pdf Attached are the results for French ENCs. The analysis is still to be completed.

kusala9 commented 3 years ago

In advance of the next conversion meeting I'm trying to summarise the discussion. It strikes me there are two issues.

  1. Whether a complex attribute to hold INFORM attribution should be included in every geographic feature (specifically for those INFORM values which are explicitly NOT shared), in addition to the NauticalInformation features which are shared within the dataset. The encoder can then decide whether or not to use it.
  2. If (1) is implemented by the ENC product spec, how do we flag some INFORM values as shared and others as not shared? The schedule example is a good one: some features could carry identical values that are not required to be shared.

I think this could be achieved by individual converter technologies and I don't believe there's a "best practice" for this. Basically a converter may offer the options to:

  1. Convert all identical INFORM values to a single NauticalInformation feature
  2. Convert all identical INFORM values to individual (but identical) NauticalInformation features
  3. A combination of (1) and (2) depending on the source feature and its attribution.
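
The three options above can be sketched as one function with a pluggable policy. This is only an illustration: `convert_inform`, the `share` callback, and the sample feature IDs are hypothetical and do not describe any existing converter.

```python
from collections import defaultdict


def convert_inform(features, share):
    """Group INFORM values into NauticalInformation instances.

    features: list of (feature_id, inform_text) pairs.
    share:    callable(feature_id, inform_text) -> bool, the producer's policy.
    Returns a list of (inform_text, [feature_ids]), one entry per instance.
    """
    shared = defaultdict(list)
    result = []
    for feature_id, text in features:
        if share(feature_id, text):
            shared[text].append(feature_id)      # one instance per distinct text
        else:
            result.append((text, [feature_id]))  # one instance per feature
    result.extend(shared.items())
    return result


features = [
    ("GATE-1", "Opens 06:00-18:00"),
    ("GATE-2", "Opens 06:00-18:00"),
    ("BUOY-1", "Seasonal"),
    ("BUOY-2", "Seasonal"),
]

option_1 = convert_inform(features, lambda fid, text: True)
option_2 = convert_inform(features, lambda fid, text: False)
# Option 3: an example policy that shares only between buoys.
option_3 = convert_inform(features, lambda fid, text: fid.startswith("BUOY"))

print(len(option_1), len(option_2), len(option_3))
```

Keeping the policy as a callback means a data producer could swap in its own rule without changing the grouping logic.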

The other thing to note is that individual data producers may wish to make policy in this area and then implement that policy either via automated means or via post-processing of the converted data.

I suggest we discuss it along these lines and come up with wording in the conversion document which reflects this approach. If we wish to go down the route of inline INFORM-type attributes on all features then a (separate) proposal to S-100PT would need to be made.

JeffWootton commented 3 years ago

The DCEG Sub-Group, in developing Edition 1.0.0 of the S-101 DCEG, determined that providing a single method of encoding additional information relevant to feature(s) would result in more consistent encoding, less risk of error and a more consistent outcome for the end user. There is also the issue of avoiding "the same thing being encoded in more than one way", which has been a cornerstone philosophy in the consideration of decisions in data modelling for both S-57 and S-101.

However, based on the very good discussion points made above and the fact that some testing and analysis of converted data is now starting to be done, I suggest the following:

TDYCARHugh commented 3 years ago

By design, Information types are intended for shared or possibly shared content. One of the drivers for them was to reduce the need for support files, or for multiple references to the same support files, such as for regulations or contact details. In any case they were meant to reduce duplicate content and related maintenance issues.

Re two different ways of doing the same thing: I don't see Information structures directly on Features and Information structures in information types as two ways of doing the same thing. If the information is specific to the feature it should be encoded on the feature; if the information is shared or common then it should be encoded using the information type to avoid redundant encoding and maintenance of that information. By only allowing information as information types I would argue that we are providing only one way to encode two different things.

Re allowing duplicate Information types: I don't see the point of having or allowing duplicate information types. As soon as the information is no longer common, a new information type instance should be created to carry the variant information that applies to a different set of features. If there are cases where information is specific to a feature then it should be possible to encode it as attributes on the feature.
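
A minimal Python sketch of that "new instance when the information diverges" rule follows. `InfoStore` and its methods are hypothetical names used only for illustration; real production systems will manage these links differently.

```python
class InfoStore:
    """Hypothetical store: features point at information instances by id."""

    def __init__(self):
        self.instances = {}  # instance id -> text
        self.links = {}      # feature id -> instance id
        self._next_id = 1

    def link(self, feature_ids, text):
        """Create one instance and point all given features at it."""
        instance_id = self._next_id
        self._next_id += 1
        self.instances[instance_id] = text
        for feature_id in feature_ids:
            self.links[feature_id] = instance_id
        return instance_id

    def update(self, feature_ids, new_text):
        """Apply new_text to feature_ids only.

        The features move to a new instance; other features keep the old
        one, and instances left without any links are dropped.
        """
        old_ids = {self.links[feature_id] for feature_id in feature_ids}
        new_id = self.link(feature_ids, new_text)
        still_used = set(self.links.values())
        for instance_id in old_ids:
            if instance_id not in still_used:
                del self.instances[instance_id]
        return new_id


store = InfoStore()
store.link(["A", "B", "C"], "Mon-Fri 06:00-18:00")
store.update(["A"], "Mon-Sun 00:00-24:00")

print(store.instances[store.links["A"]])
print(store.instances[store.links["B"]])
```

Under this rule no two instances exist for the same set of features, and an edit never reaches a feature it was not meant for.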