pds-data-dictionaries / PDS4-LDD-Issue-Repo

Issue repository for tracking all PDS4 Discipline Dictionary-related issues, new feature requests, and releases.
Apache License 2.0

[ldd-img] sample_bit_mask is not correctly defined (and needs a rule) #158

Open thareUSGS opened 3 years ago

thareUSGS commented 3 years ago

Issue Type invalid usage.

Describe the issue identified (if applicable) sample_bit_mask is currently being used in floating-point images, which is nonsensical. We need some way to check for that (schema or software, not sure yet). The definition also needs to be updated to specify "unsigned" integer.

https://github.com/pds-data-dictionaries/ldd-img/blob/70756f04d6d3a6095d77dca04f5af720d200cfcc/src/PDS4_IMG_IngestLDD.xml#L3128

Describe the solution you'd like The definition updated and, hopefully, a schema rule that disallows sample_bit_mask for signed or floating-point types.

PDS4 IM Version 1.H.0.0

Need-by Date August 2021

Additional context correct usage: https://hirise-pds.lpl.arizona.edu/PDS/EDR/PSP/ORB_001300_001399/PSP_001330_1395/ incorrect:

thareUSGS commented 3 years ago

Updated the definition in 1.8.5.0, but a rule still needs to be added (still not sure how yet).

acraugh commented 3 years ago

@thareUSGS, the Schematron rule to check the data type of a column with a sample bit mask is similar to rules I've written for the Spectral Dictionary to do contextual sanity-checking. If you need a hand with this, let me know.
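As a rough illustration of the kind of type check discussed here, the following Python sketch flags a label that contains a sample_bit_mask alongside a signed or floating-point data_type. It is not the Schematron rule itself. The function names are hypothetical, and the assumption that disallowed types can be recognized by the name prefixes `IEEE754` and `Signed` is ours. The check is deliberately label-wide: it does not try to associate the mask with a specific data object.

```python
import xml.etree.ElementTree as ET

# Assumption: floating-point and signed data types are recognizable
# by these name prefixes (e.g. "IEEE754LSBSingle", "SignedMSB2").
DISALLOWED_PREFIXES = ("IEEE754", "Signed")


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def bit_mask_type_conflicts(label_xml):
    """Return data_type values that conflict with a sample_bit_mask.

    Simplification: the check is label-wide. It does not follow any
    reference from the imaging metadata to a particular data object.
    """
    root = ET.fromstring(label_xml)
    has_mask = any(local_name(el.tag) == "sample_bit_mask" for el in root.iter())
    if not has_mask:
        return []
    return [
        el.text.strip()
        for el in root.iter()
        if local_name(el.tag) == "data_type"
        and el.text
        and el.text.strip().startswith(DISALLOWED_PREFIXES)
    ]
```

An empty list means no conflict was found under these assumptions. A label with several data objects of mixed types would need the mask tied to one object before the result could be trusted.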

acraugh commented 3 years ago

@thareUSGS, I need some clarification. The definition of "sample_bit_mask" is:

The sample_bit_mask attribute Specifies the active bits in a sample. Any bit mask is valid in an non-raw product. Any 8-bit product, whether a scaled raw product or other, will have the value `2#11111111` and be stored in one byte. Any 12-bit product, whether an unscaled raw product, or an ILUT partially-processed product (see companding_method), will have the value `2#0000111111111111` and be stored in two bytes. A 15-bit product (e.g. Radiometrically-corrected Calibrated product type) will have the value `2#0111111111111111` and be stored in two bytes. Any 32-bit integer product (e.g. Histogram Raw product) will have the value `2#11111111111111111111111111111111` and be stored in four bytes. For floating-point data, sample_bit_mask is not valid and may be absent. If present, it should be ignored. NOTE: In the PDS, the domain of sample_bit_mask is dependent upon the currently-described value in the sample_bits attribute and only applies to integer values.

Is it desirable to enforce these exact constraints? I ask because it looks like there are no constraints defined at all on the content of sample_bit_mask in the Imaging dictionary, and the definition above suggests that it really ought to have a Permissible Values list. Schema validation on the bit mask values is highly desirable and will make for a much more efficient Schematron check on type correspondence.
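To make the idea of constraining the mask content concrete, here is a minimal Python sketch. It assumes masks are always written as `2#` followed by binary digits, as in the examples in the quoted definition. The helper names and the comparison against sample_bits are illustrative assumptions, not anything defined by the dictionary.

```python
import re

# Assumption: a mask is "2#" followed by one or more binary digits.
MASK_PATTERN = re.compile(r"2#([01]+)")


def parse_bit_mask(mask):
    """Return (value, digit_count) for a mask like '2#0000111111111111'."""
    match = MASK_PATTERN.fullmatch(mask)
    if match is None:
        raise ValueError(f"not a base-2 mask: {mask!r}")
    digits = match.group(1)
    return int(digits, 2), len(digits)


def mask_fits_sample_bits(mask, sample_bits):
    """True if the highest active bit lies within sample_bits bits."""
    value, _ = parse_bit_mask(mask)
    return value.bit_length() <= sample_bits
```

For example, `parse_bit_mask("2#0000111111111111")` yields the value 4095 with 16 digits, and that mask fits within a sample_bits of 12. A format check of this kind is much looser than a fixed list of permitted values, so it would also accept masks beyond the examples in the definition.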

acraugh commented 3 years ago

@thareUSGS, I've hit a wall. The Imaging dictionary is not sufficiently constrained logically to enable any sort of serious error checking. The fundamental constraint is missing - there is no requirement to use the <Imaging> class, which is the only class that even allows identification of a relevant data object, let alone requiring it.

The problem is not soluble, and this dictionary has substantially bigger problems.

thareUSGS commented 3 years ago

@acraugh Like geometry, this dictionary has grown overwhelmingly large and complicated. I have only been working in it for about a year and have also recognized that it could use some serious cleaning and updates. I mean, there are still no docs, hardly any rules, no tests...! I am happy to hear a way forward, perhaps a tiger team, but it obviously won't be a simple update. Since everyone who worked on it originally and in recent years is no longer working for us (or in the PDS), bringing someone onboard to help is a priority for us, but the learning curve will be steep. Simply beginning to collect targeted issues here would therefore be a good place to start.

Specifically for sample_bit_mask, yes there should be a finite set to test against. For the definition do you have any suggestions to help clarify it?

rgdeen commented 3 years ago

Those are meant to be examples, not prescriptive. Any binary number should be allowed technically, although I’m not entirely sure what we’d make of embedded zeros. The definition could and perhaps should be adjusted to make clear those are examples. Trent?

-Bob
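One way to make "embedded zeros" concrete is to ask whether a mask has a 0 bit between two 1 bits. The short Python sketch below does that. The function name is hypothetical, and it assumes the mask is already known to be a well-formed `2#...` string.

```python
def has_embedded_zeros(mask):
    """True if a '2#...' mask has a 0 bit between two 1 bits.

    Assumption: mask is well formed, i.e. '2#' followed by binary digits.
    """
    digits = mask.split("#", 1)[1]
    # Leading and trailing zeros are not "embedded", so strip them first.
    return "0" in digits.strip("0")
```

Under this reading, `2#0000111111111111` has no embedded zeros, while a mask such as `2#1011` does. Whether such masks should be rejected or merely flagged is the open question raised above.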


rgdeen commented 3 years ago

Anne, what does that even mean? (“I’ve hit a wall” comment).

The imaging dictionary provides - as all discipline dictionaries should - a grab-bag of things that can be used to describe items from its discipline (images, in this case). Virtually nothing is required, because we have no idea what the capabilities of any given imaging instrument are. That is simply good DD design. It is up to the missions to define what elements should be required for their own mission - that is not something the discipline dictionary should do.

I fail to see what the problem is with the img DD as a whole. It’s broken down into a bunch of classes that describe aspects of imaging instruments. It’s large, but so what? Imagers are a diverse lot. You use the portions you need and ignore the rest. Same with geom… it’s big, but again, so what? Use the parts you need, even if it’s just one class.

Menus at restaurants are large. You pick the 2 or 3 items you want and ignore the rest. Same with discipline DD’s… it’s a menu of items from which to choose. No one instrument should ever use it all! That is how it should be.

-Bob


acraugh commented 3 years ago

"I've hit a wall" means I cannot continue to move forward on the path I was planning to take because there is an obstruction that makes the route impassable, which is that the dictionary is not sufficiently constrained to make the desired check logically feasible. The restaurant metaphor is not particularly apt, because the selections in a restaurant are made by a consumer who is invited to exercise his preferences and is served by a kitchen designed to be able to respond to ad hoc customization requests not anticipated in advance. A namespace is more akin to an assembly line product, where there are variations possible on the output product in terms of options and capabilities, but there are basic requirements that all modifications have to plug in to the same chassis. The Imaging Dictionary has no requirement for a chassis. Without that basic existential requirement on the constraining context, you can do very little validation, and in particular not the validation required to address the problem here.

Most dictionaries are not "grab-bags". They are highly structured namespaces, as is the core of the Information Model itself - more like Gundams than Legos.

rgdeen commented 3 years ago

It’s a perfectly apt metaphor!! The types of validation you seem to be looking for should be provided at the mission level… not the discipline. It’s for the project to decide which parts of the menu are applicable and define rules (if they so choose) to enforce that. It’s not for the discipline to decide e.g. “all cameras must have an optical filter”… because not all cameras do. But if they DO have an optical filter, here are recommended ways to describe it.

I guess I don’t know what kind of validation you’re trying to accomplish here…

-Bob
