the-human-colossus-foundation / oca-spec

Overlay Capture Architecture Specification
European Union Public License 1.2

Character encoding overlay ambiguity #18

Open blelump opened 1 year ago

blelump commented 1 year ago

Problem

The spec (https://oca.colossi.network/specification/#character-encoding-overlay) currently gives this example:

{
  "capture_base":"EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
  "type": "spec/overlays/character_encoding/1.0",
  "default_character_encoding":"utf-8",
  "attribute_character_encoding":{
      "photoImage":"base64"
  }
}

The ambiguity lies in the default_character_encoding attribute. Even though a default is defined, one could just as well write:

{
  "capture_base":"EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
  "type": "spec/overlays/character_encoding/1.0",
  "default_character_encoding":"utf-8",
  "attribute_character_encoding":{
      "photoImage":"base64",
      "a": "utf-8",
      "b": "utf-8"
  }
}

Both overlays yield the same encoding for every attribute, but the digest of the overlay itself differs, and therefore so does the digest of the bundle.
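The digest difference can be illustrated with a minimal sketch. It assumes SHA-256 over sorted-key, whitespace-free JSON as a stand-in; the digest algorithm and serialization actually used by OCA may differ, and the helper name `overlay_digest` is hypothetical.

```python
import hashlib
import json


def overlay_digest(overlay: dict) -> str:
    """Hash an overlay. ASSUMPTION: SHA-256 over sorted-key compact JSON,
    used here only as a stand-in for the real OCA digest."""
    canonical = json.dumps(overlay, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# First form: "a" and "b" rely on the default encoding.
implicit = {
    "capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
    "type": "spec/overlays/character_encoding/1.0",
    "default_character_encoding": "utf-8",
    "attribute_character_encoding": {"photoImage": "base64"},
}

# Second form: "a" and "b" repeat the default explicitly.
explicit = {
    "capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
    "type": "spec/overlays/character_encoding/1.0",
    "default_character_encoding": "utf-8",
    "attribute_character_encoding": {
        "photoImage": "base64",
        "a": "utf-8",
        "b": "utf-8",
    },
}

digest_implicit = overlay_digest(implicit)
digest_explicit = overlay_digest(explicit)

# The serialized bytes differ, so the digests differ as well.
print(digest_implicit != digest_explicit)
```

Any digest computed over the serialized overlay behaves this way, since the two forms serialize to different bytes.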

Proposal

default_character_encoding should be removed in favor of an explicit encoding for each attribute.

mitfik commented 1 year ago

The basic idea was that default_character_encoding is always overridden by the per-attribute entry. That is, if an attribute is listed in attribute_character_encoding, its encoding is taken from there; if it is not listed, the default encoding is applied.
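The fallback rule described here could be sketched roughly as follows. The function name `resolve_encoding` is hypothetical and not part of any OCA tooling; the overlay is the example from the spec.

```python
from typing import Optional


def resolve_encoding(overlay: dict, attribute: str) -> Optional[str]:
    """Return the per-attribute encoding if present, else the default.

    Hypothetical helper illustrating the override rule; returns None
    when neither a per-attribute entry nor a default is defined.
    """
    per_attribute = overlay.get("attribute_character_encoding", {})
    if attribute in per_attribute:
        return per_attribute[attribute]
    return overlay.get("default_character_encoding")


overlay = {
    "capture_base": "EVyoqPYxoPiZOneM84MN-7D0oOR03vCr5gg1hf3pxnis",
    "type": "spec/overlays/character_encoding/1.0",
    "default_character_encoding": "utf-8",
    "attribute_character_encoding": {"photoImage": "base64"},
}

print(resolve_encoding(overlay, "photoImage"))  # per-attribute entry wins
print(resolve_encoding(overlay, "a"))           # falls back to the default
```

Under this rule, listing an attribute with the same value as the default changes nothing for a consumer, which is exactly why two different overlays can describe the same encodings.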

This avoids having to specify every attribute (e.g., in a huge schema with 100 attributes, you can save quite some space by just using default_character_encoding).

If we had to specify every single attribute each time, it could take up unnecessary space in use cases where every byte counts. Or maybe we can ignore that and think about an alternative compact representation with a focus on size.

blelump commented 1 year ago

Space doesn't matter here, since the current representation isn't compact to begin with; it is actually the opposite. Having default_character_encoding does not change much as far as compactness goes.

Thus, unambiguity is the primary goal rather than compactness.

carlyh-micb commented 9 months ago

ADC does not use the default character encoding; the character encoding overlay written by our OCA Composer specifies the encoding attribute by attribute. I don't think the default is supported by the Excel Template either. I would support removing default_character_encoding in the interest of unambiguity.

neiljthomson commented 9 months ago

On Binary encoding.

For more details see Mime-Types