pengowen123 / cge

An implementation of the CGE neural network encoding, written in Rust.
Apache License 2.0

Portable CGE Format #6

Closed wbrickner closed 2 years ago

wbrickner commented 2 years ago

I think we should pick a forwards-compatible format that is flexible enough to store any result of an eant2 evolution, is human-readable (to the extent the user's chosen output format allows), looks clean, and is not too verbose.

What do you think of the following (encoded as JSON, for example)?

{
  "version": 1,

  "metadata": {
    "fitness": 0.04521971,
    "description": "Optional description. What the network does, input/output scheme."
  },

  "network": { 
    "activation": "relu",
    "genome": [
      { "kind": "input", "id": 0, "weight": 1.3 },
      { "kind": "neuron", "id": 1, "weight": 0.3, ... },
      ...
    ]
  },
  "auxiliary": [ 0.436, 0.221, 1.829 ]
}

Pseudocode implementation

This can all be compartmentalized in a separate module, leaving the rest of cge mostly unchanged. In pseudocode:

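// Pseudocode only: serde_repr derives work on fieldless enums, and serde's
// `tag` attribute matches variant names as strings, so a real implementation
// would need its own dispatch on the integer "version" field.
#[derive(Serialize_repr, Deserialize_repr)]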
#[serde(tag = "version")]
#[repr(u8)]
enum PortableCGE {
  V1(version_one::VersionOneFormat) = 1,
  V2(version_two::VersionTwoFormat) = 2,
  // ...
}

// version_one::VersionOneFormat
struct VersionOneFormat {
  meta: version_one::Metadata,
  network: Vec<Gene>,
  // ...
}

// version_two::VersionTwoFormat
struct VersionTwoFormat {
  meta: version_two::Metadata,
  network: BTreeMap<UUID, Gene>,
  new_property: String,
  // ...
}

#[derive(Serialize, Deserialize)]
struct CGE {
  meta: Metadata,
  network: Vec<Gene>,
  new_property: Option<String>,
  // ...
}

impl From<VersionOneFormat> for CGE { /* ... */ }
impl From<VersionTwoFormat> for CGE { /* ... */ }

Then use it like this:

#[cfg(feature = "json")]
let loaded: CGE = serde_json::from_slice::<PortableCGE>(&input)?.into();
#[cfg(feature = "msgpack")]
let loaded: CGE = rmp_serde::from_slice::<PortableCGE>(&input)?.into();
// ...

This forces all data to be uniform by the time you're actually interacting with it. I'd like to find a better way to dynamically pick the correct decoder without guarding each one behind a feature flag (perhaps using serde_any).
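
The "upgrade everything to one type" step can be sketched without any serialization library. This is only an illustration under assumptions: the `Gene` fields, the `description` field, and the `u64` key standing in for `UUID` are hypothetical, and collecting the map in key order is an assumed way to recover a genome, not something the proposal specifies.

```rust
use std::collections::BTreeMap;

// Hypothetical gene type; the real cge gene carries more information.
#[derive(Debug, Clone, PartialEq)]
pub struct Gene {
    pub id: usize,
    pub weight: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VersionOneFormat {
    pub description: Option<String>,
    pub network: Vec<Gene>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct VersionTwoFormat {
    pub description: Option<String>,
    // u64 keys stand in for the UUID keys of the pseudocode (assumption).
    pub network: BTreeMap<u64, Gene>,
    pub new_property: String,
}

// One variant per stored format version.
#[derive(Debug, Clone, PartialEq)]
pub enum PortableCge {
    V1(VersionOneFormat),
    V2(VersionTwoFormat),
}

// The single in-memory type that the rest of the crate would see.
#[derive(Debug, Clone, PartialEq)]
pub struct Cge {
    pub description: Option<String>,
    pub network: Vec<Gene>,
    pub new_property: Option<String>,
}

impl From<VersionOneFormat> for Cge {
    fn from(v: VersionOneFormat) -> Self {
        // Version 1 has no `new_property`, so it becomes `None`.
        Cge { description: v.description, network: v.network, new_property: None }
    }
}

impl From<VersionTwoFormat> for Cge {
    fn from(v: VersionTwoFormat) -> Self {
        Cge {
            description: v.description,
            // BTreeMap yields values in ascending key order, so the result is deterministic.
            network: v.network.into_values().collect(),
            new_property: Some(v.new_property),
        }
    }
}

impl From<PortableCge> for Cge {
    fn from(p: PortableCge) -> Self {
        // Every stored version funnels into the same in-memory type.
        match p {
            PortableCge::V1(v) => v.into(),
            PortableCge::V2(v) => v.into(),
        }
    }
}
```

With this shape, adding a version means adding one enum variant and one `From` impl; code that consumes `Cge` does not change.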

Thoughts? Is this design all a big mistake?

pengowen123 commented 2 years ago

I like the idea of a versioned format built on an existing library (most likely serde). I'm not sure about built-in support for metadata that isn't specifically related to or used by cge, though. It could still be useful to store in some cases, so perhaps the metadata can be reduced to just a description, with the auxiliary, fitness, etc. fields replaced by a user-defined set of additional parameters:

{
  "version": 1,

  "metadata": {
    "description": "Optional description. What the network does, input/output scheme."
  },

  "network": { 
    "activation": "relu",
    "genome": [
      { "kind": "input", "id": 0, "weight": 1.3 },
      { "kind": "neuron", "id": 1, "weight": 0.3, ... },
      ...
    ]
  },

  "extra": {
    "fitness": 0.04521971,
    "auxiliary": [ 0.436, 0.221, 1.829 ]
  }
}

The extra field would be passed as-is to the user. This keeps the core format free of unused information while also making it convenient to extend if additional data needs to be stored.
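
One way to read "passed as-is" is a document type that is generic over the extra payload, so the library carries the value without inspecting it. The sketch below is an assumption about how that could look, not the format that was implemented; `Portable`, `into_parts`, and `EvolutionExtra` are hypothetical names, and serialization is left out entirely.

```rust
// Hypothetical gene type; the real cge gene carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Gene {
    pub id: usize,
    pub weight: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Network {
    pub activation: String,
    pub genome: Vec<Gene>,
}

// `E` is whatever the caller stores under "extra"; nothing here looks inside it.
#[derive(Debug, Clone, PartialEq)]
pub struct Portable<E> {
    pub version: u32,
    pub description: Option<String>,
    pub network: Network,
    pub extra: E,
}

impl<E> Portable<E> {
    /// Splits the document into the part the library uses and the
    /// user-defined part, which is handed back untouched.
    pub fn into_parts(self) -> (Network, E) {
        (self.network, self.extra)
    }
}

// Example user-defined payload mirroring the fields in the JSON above.
#[derive(Debug, Clone, PartialEq)]
pub struct EvolutionExtra {
    pub fitness: f64,
    pub auxiliary: Vec<f64>,
}
```

A caller that has nothing extra to store could use `Portable<()>`, which keeps the core format free of unused fields.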

pengowen123 commented 2 years ago

Alright, the new format has been added. It's basically identical to what was proposed. There's a usage example in examples/ and several example files in test_data/. I'll probably add optional persistent state saving/loading before the 0.1 release, but other than that I don't expect any significant changes to it for now.