sofwerx / cdb2-concept

CDB modernization

Attribute Default Values #32

Open UnclePoole opened 4 years ago

UnclePoole commented 4 years ago

One ongoing challenge is how to handle missing attribute values that are needed for runtime simulation. I'd like to open this up for discussion to figure out the best approach.

Considerations:

Mandatory attributes must be present in the dataset for the CDB to be valid. What happens if there is no source data for that attribute value? I don't see it as practical to say "Well, you just can't make this CDB then."

Intel users need to know which attributes are unknown/missing (and possibly filled with a default or procedural value) vs. which attributes have explicitly set values that may happen to match the default or procedural value.

The attribute dictionary currently does not specify standard default values to use for missing attributes. GGDM and NAS do not specify them either. OWT has identified this as a necessary capability to support consistent and interoperable procedural generation across different simulations and tool workflows.

Global default values inferred from min/max/average may not make semantic sense as usable defaults. The semantically appropriate default for a given attribute will frequently differ from one feature type to another.

For "realistic" procedural generation, a single default value may not be enough; you may want full random distributions with parameters such as distribution type, mean, standard deviation, etc.
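To make a couple of these considerations concrete, here is a rough Python sketch of what a default description and a provenance flag might look like. None of it comes from the CDB spec, GGDM, or NAS: `DefaultSpec`, `ResolvedValue`, `resolve`, and the `"explicit"` / `"default"` / `"procedural"` labels are all hypothetical names, and only a normal distribution is handled.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DefaultSpec:
    """Hypothetical default description: a fixed fallback plus an optional distribution."""
    value: float
    distribution: Optional[str] = None  # only "normal" is handled in this sketch
    mean: Optional[float] = None
    std_dev: Optional[float] = None


@dataclass(frozen=True)
class ResolvedValue:
    """An attribute value together with where it came from."""
    value: float
    source: str  # "explicit", "default", or "procedural" (hypothetical labels)


def resolve(explicit: Optional[float], spec: DefaultSpec,
            rng: Optional[random.Random] = None) -> ResolvedValue:
    # A value present in the source data is always tagged "explicit",
    # even when it happens to equal the default.
    if explicit is not None:
        return ResolvedValue(explicit, "explicit")
    # With a random generator and a supported distribution, sample a value.
    if rng is not None and spec.distribution == "normal":
        return ResolvedValue(rng.gauss(spec.mean, spec.std_dev), "procedural")
    # Otherwise fall back to the single fixed default.
    return ResolvedValue(spec.value, "default")
```

The point of carrying `source` alongside the value is that an explicit 10.0 and a defaulted 10.0 remain distinguishable, which is what the intel use case above needs. Per-feature-type defaults would just mean keeping a separate `DefaultSpec` for each feature type and attribute pair.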

Thoughts?

ryanfranz commented 4 years ago

A few thoughts:

UnclePoole commented 4 years ago

@ryanfranz Ohh, very interesting. I had missed that Defaults.xml had default vector attribute values. I just saw the initial raster dataset defaults and assumed that was the whole file.

I don't know if it was the design intent, but Defaults.xml frames raster coverage pixels as logical attributes in the same sense as vector attributes, which is very good from a data model standpoint. Raster vs. vector vs. mesh is a geometry representation choice, and it should be orthogonal to entity-attribute semantics.

It looks like Defaults.xml can define global attribute defaults as well as per-dataset defaults, so that gets me a lot of what I wanted. Doing per-entity defaults would be a straightforward extension that could be proposed for CDB 1.3 as a transition path.
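As a sketch of how the lookup could work with that extension, assuming the most specific default wins (per-entity, then per-dataset, then global): the table layout, the `lookup_default` name, and the dataset, entity, and attribute codes below are placeholders I made up for illustration, not anything taken from Defaults.xml.

```python
from typing import Any, Dict, Optional


def lookup_default(attr: str, dataset: str, entity: str,
                   tables: Dict[str, Dict[Any, Any]]) -> Optional[Any]:
    """Return the most specific default available, or None if there is none."""
    # Ordered from most specific to least specific.
    candidates = [
        (tables.get("entity", {}), (entity, attr)),
        (tables.get("dataset", {}), (dataset, attr)),
        (tables.get("global", {}), attr),
    ]
    for table, key in candidates:
        if key in table:
            return table[key]
    return None


# Placeholder data: "ATTR_X", "DatasetA", and "EntityFoo" are hypothetical codes.
tables = {
    "global": {"ATTR_X": 1.0},
    "dataset": {("DatasetA", "ATTR_X"): 2.0},
    "entity": {("EntityFoo", "ATTR_X"): 3.0},
}
```

With these tables, `lookup_default("ATTR_X", "DatasetA", "EntityFoo", tables)` picks the per-entity value, and an unknown entity or dataset falls through to the next level.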

The field isn't quite normalized: it assumes a specific formatting of "Default$[Dataset]$[Attribute]" rather than explicitly stating the attribute code as a separate XML element, even though the dataset is already specified by a different element. Workable, but a tad awkward.
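To illustrate the awkwardness, a reader has to split the name apart to recover the attribute code. A minimal sketch, taking the pattern exactly as quoted above and assuming neither the dataset nor the attribute part contains the separator; `parse_default_name` and `DefaultName` are hypothetical helpers, and the example names are placeholders:

```python
from typing import NamedTuple


class DefaultName(NamedTuple):
    dataset: str
    attribute: str


def parse_default_name(name: str, sep: str = "$") -> DefaultName:
    """Split a 'Default<sep>Dataset<sep>Attribute' style name into its parts."""
    parts = name.split(sep)
    # Expect exactly three non-empty parts, the first being the literal "Default".
    if len(parts) != 3 or parts[0] != "Default" or not parts[1] or not parts[2]:
        raise ValueError(f"unexpected default name format: {name!r}")
    return DefaultName(dataset=parts[1], attribute=parts[2])
```

For example, `parse_default_name("Default$DatasetA$ATTR_X")` yields `DefaultName(dataset="DatasetA", attribute="ATTR_X")`. A separate XML element for the attribute code would make this parsing step unnecessary.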