pengowen123 / cge

An implementation of the CGE neural network encoding, written in Rust.
Apache License 2.0

Redesign `Gene` and `Network` types #4

Closed · pengowen123 closed this 2 years ago

pengowen123 commented 2 years ago

This redesigns the `Gene` and `Network` types to be cleaner and more idiomatic. It also improves the performance of network evaluation by caching subnetwork results.
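
Concretely, the idea behind the caching is roughly the following (a simplified sketch with illustrative names, not the actual evaluation code): each subnetwork's output is stored the first time it is computed during a pass, and later references to the same subnetwork reuse the stored value.

```rust
use std::collections::HashMap;

/// Sketch only: memoize each neuron's output within a single evaluation pass so
/// that later references to the same subnetwork reuse the stored value instead
/// of re-evaluating it.
struct EvalCache {
    values: HashMap<usize, f64>, // neuron ID -> cached output for this pass
}

impl EvalCache {
    fn new() -> Self {
        Self { values: HashMap::new() }
    }

    /// Returns the cached value for `neuron_id`, computing and storing it on first use.
    fn get_or_compute<F: FnOnce() -> f64>(&mut self, neuron_id: usize, compute: F) -> f64 {
        *self.values.entry(neuron_id).or_insert_with(compute)
    }

    /// Cleared between evaluation passes so stale values are never reused.
    fn clear(&mut self) {
        self.values.clear();
    }
}
```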

cc @wbrickner How do these changes look? It will require you to change any parts of your code that handle `Gene`, but the new design should be cleaner overall. Some other things to note:

pengowen123 commented 2 years ago

After this PR is merged, I'll add the mutation operators, as well as more extensive tests that you can duplicate for const_cge.

wbrickner commented 2 years ago

I can provide comments in a little bit if you'd like.

Let me examine the info about input/output dimensionality, etc.; I need to think about this carefully. I think what I'm doing in const_cge is already correct (counting unique IDs of input genes), but it may not be. Also, for the input index, I believe I mimicked the original codebase in using the ID of the input gene.
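
For reference, this is roughly what I mean by counting unique IDs (the gene shape here is illustrative, not the exact const_cge code):

```rust
use std::collections::HashSet;

/// Illustrative gene shape; the real `Gene` type differs.
enum Gene {
    Input { id: usize, weight: f64 },
    Neuron { id: usize, weight: f64, num_inputs: usize },
    // jumper variants omitted
}

/// Input dimensionality = the number of distinct input IDs appearing in the genome.
fn input_count(genome: &[Gene]) -> usize {
    genome
        .iter()
        .filter_map(|gene| match gene {
            Gene::Input { id, .. } => Some(*id),
            _ => None,
        })
        .collect::<HashSet<_>>()
        .len()
}
```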

wbrickner commented 2 years ago

Comments, grouped by topic:

* Correctness
* API
* Performance + Scaling

```rust
/// Represents which transfer function to use for evaluating neural networks.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum Activation {
    /// Maps input to output directly, as if there is no transfer function.
    Linear,
    /// Outputs 1 if input is greater than 0, 0 otherwise.
    Threshold,
    /// Outputs 1 if input is greater than 0, 0 if input is equal to 0, -1 otherwise. Useful
    /// for simple problems and boolean logic, as it only allows three possible output values.
    Sign,
    /// A non-linear function. This function is the most general, so it should be defaulted to.
    Sigmoid,
    /// Hyperbolic tangent; like `Sigmoid`, but outputs in the range (-1, 1).
    Tanh,
    /// A cheaper sigmoid-like function: x / (1 + |x|).
    SoftSign,
    /// A smooth, non-saturating function that grows approximately linearly.
    BentIdentity,
    /// Rectified linear unit: outputs x if x is greater than 0, 0 otherwise.
    Relu,
}

impl Activation {
    /// Apply this activation function
    pub fn apply(&self, x: f64) -> f64 {
        match self {
            Activation::Linear       => Self::linear(x),
            Activation::Threshold    => Self::threshold(x),
            Activation::Sign         => Self::sign(x),
            Activation::Sigmoid      => Self::sigmoid(x),
            Activation::Tanh         => Self::tanh(x),
            Activation::SoftSign     => Self::soft_sign(x),
            Activation::BentIdentity => Self::bent_identity(x),
            Activation::Relu         => Self::relu(x),
        }
    }

    fn linear(x: f64)        -> f64 { x }
    fn threshold(x: f64)     -> f64 { if x > 0.0 { 1.0 } else { 0.0 } }
    fn sign(x: f64)          -> f64 { if x > 0.0 { 1.0 } else if x == 0.0 { 0.0 } else { -1.0 } }
    fn sigmoid(x: f64)       -> f64 { 1.0 / (1.0 + (-x).exp()) }
    fn tanh(x: f64)          -> f64 { x.tanh() }
    fn soft_sign(x: f64)     -> f64 { x / (1.0 + x.abs()) }
    fn bent_identity(x: f64) -> f64 { (((x.powi(2) + 1.0).sqrt() - 1.0) / 2.0) + x }
    fn relu(x: f64)          -> f64 { if x > 0.0 { x } else { 0.0 } }

// ...
```

By the way, is there some issue with representing `Activation` indices as `u8`? I see they're often `i32`.

Great PR, hopefully some of these perspectives are useful! :)

pengowen123 commented 2 years ago

* I would love to see the serialization work pushed onto serde (or something) for better correctness guarantees, using a macro/declarative serialization format (ideally with protocol versioning). Not critical now; it looks to be pretty good, it's just cumbersome and a little error-prone (and brittle) to be doing the string manipulation manually.

This is the current plan. It would be nice to retain something reasonably human-writable, but I think serde should work pretty well here.
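
Roughly what I have in mind, assuming serde with the derive feature (the type and field names here are illustrative, not the final format):

```rust
use serde::{Deserialize, Serialize};

/// Sketch only: derive the (de)serialization code instead of hand-rolling string
/// manipulation, and carry an explicit version field for protocol versioning.
#[derive(Serialize, Deserialize)]
struct SavedNetwork {
    version: u32,
    activation: String,
    genome: Vec<SavedGene>,
}

#[derive(Serialize, Deserialize)]
struct SavedGene {
    kind: String,
    id: usize,
    weight: f64,
}
```

Any human-readable serde backend could then be plugged in without touching these types, which should keep the format reasonably human-writable.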

* is `Network::validate()` used anywhere? It yields `Ok(())` deterministically.

This function was removed and folded into the `rebuild_network_metadata` function in a later commit.

* it'd be really cool if we could apply something along the lines of

  * `Network::save_to_file(&self, path: &str)` => `Network::to_file::<P: Into<Path>>(&self, path: P)`
  * `Network::load_from_file(&self, path: &str)` => `Network::from_file::<P: Into<Path>>(&self, path: P)`

I like this better too, I'll change it when serialization/deserialization is reworked.
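
Something like the sketch below, though `AsRef<Path>` is probably the bound to use rather than `Into<Path>`, since `Path` is unsized and `AsRef<Path>` is what the `std::fs` functions take (the struct and its string field are placeholders, not the real serialization logic):

```rust
use std::fs;
use std::io;
use std::path::Path;

// Placeholder standing in for the real `Network` type.
struct Network {
    text_repr: String,
}

impl Network {
    /// Write the serialized network to a file.
    fn to_file<P: AsRef<Path>>(&self, path: P) -> io::Result<()> {
        fs::write(path, &self.text_repr)
    }

    /// Read and deserialize a network from a file.
    fn from_file<P: AsRef<Path>>(path: P) -> io::Result<Self> {
        let text_repr = fs::read_to_string(path)?;
        Ok(Network { text_repr })
    }
}
```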

* I wonder if we can perform the same tasks as `rebuild_neuron_info` in O(1) each time updates are applied? Is the scope of possible changes to the hashmap knowable a priori? May not be worth it if it's not obvious with your domain knowledge.

The best we can do here is O(n) in the number of neurons in the genome, as the entire `neuron_info` map must be scanned and updated. It would still be much less work than a full rebuild though, so I'll do this where possible with the mutation operators.
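
For the insertion case, the patch-up pass could look something like this (the field names are hypothetical, not the actual `NeuronInfo` definition):

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical shape of the per-neuron metadata.
struct NeuronInfo {
    subgenome_range: Range<usize>,
    depth: usize,
}

/// After inserting `inserted_len` genes at index `insert_at`, every stored range
/// can be patched in one O(n) pass instead of rebuilding the map from scratch.
fn shift_ranges(
    neuron_info: &mut HashMap<usize, NeuronInfo>,
    insert_at: usize,
    inserted_len: usize,
) {
    for info in neuron_info.values_mut() {
        // Starts at or after the insertion point: the start shifts right.
        if info.subgenome_range.start >= insert_at {
            info.subgenome_range.start += inserted_len;
        }
        // Ends after the insertion point: the end shifts right (this also covers
        // ranges containing the insertion point, which simply grow).
        if info.subgenome_range.end > insert_at {
            info.subgenome_range.end += inserted_len;
        }
    }
}
```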

* **Question**: Is the `NeuronInfo` struct applicable to deduplication? Can we rely on these things to build something like topological fingerprints, capturing depth and subgenome range? It's one of my long-term desires, although it is likely over-optimization at this stage.

That's a good idea! The `neuron_info` map does seem to capture all structural information except for connections, so it would be well suited to determining structural similarity in eant2. Checking for structural identity should be easy as well.
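
A first cut at a fingerprint could be as simple as hashing each neuron's (depth, subgenome length) pair (rough idea only; as noted, it ignores connection structure):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Rough idea: fold each neuron's (depth, subgenome length) pair, in genome
/// order, into a single u64 so structurally identical genomes compare equal
/// in O(1) after an O(n) pass.
fn structural_fingerprint(per_neuron: &[(usize, usize)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for &(depth, subgenome_len) in per_neuron {
        (depth, subgenome_len).hash(&mut hasher);
    }
    hasher.finish()
}
```

Equal fingerprints would still need a full comparison to rule out hash collisions.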

* I think the function-pointer approach to activation functions is not the best we can do; it'd be easier to understand and probably more performant on modern CPUs to replace `value = activation.get_func()(sum_inputs)` with `value = activation.apply(sum_inputs)`. I propose a new implementation of `Activation` that is a little shorter and declares everything a bit more statically:
```rust
/// Represents which transfer function to use for evaluating neural networks.
#[derive(Debug, Copy, Clone, Eq, PartialEq)]
pub enum Activation {
    /// Maps input to output directly, as if there is no transfer function.
    Linear,
    /// Outputs 1 if input is greater than 0, 0 otherwise.
    Threshold,
    /// Outputs 1 if input is greater than 0, 0 if input is equal to 0, -1 otherwise. Useful
    /// for simple problems and boolean logic, as it only allows three possible output values.
    Sign,
    /// A non-linear function. This function is the most general, so it should be defaulted to.
    Sigmoid,
    /// Hyperbolic tangent; like `Sigmoid`, but outputs in the range (-1, 1).
    Tanh,
    /// A cheaper sigmoid-like function: x / (1 + |x|).
    SoftSign,
    /// A smooth, non-saturating function that grows approximately linearly.
    BentIdentity,
    /// Rectified linear unit: outputs x if x is greater than 0, 0 otherwise.
    Relu,
}

impl Activation {
    /// Apply this activation function
    pub fn apply(&self, x: f64) -> f64 {
        match self {
            Activation::Linear       => Self::linear(x),
            Activation::Threshold    => Self::threshold(x),
            Activation::Sign         => Self::sign(x),
            Activation::Sigmoid      => Self::sigmoid(x),
            Activation::Tanh         => Self::tanh(x),
            Activation::SoftSign     => Self::soft_sign(x),
            Activation::BentIdentity => Self::bent_identity(x),
            Activation::Relu         => Self::relu(x),
        }
    }

    fn linear(x: f64)        -> f64 { x }
    fn threshold(x: f64)     -> f64 { if x > 0.0 { 1.0 } else { 0.0 } }
    fn sign(x: f64)          -> f64 { if x > 0.0 { 1.0 } else if x == 0.0 { 0.0 } else { -1.0 } }
    fn sigmoid(x: f64)       -> f64 { 1.0 / (1.0 + (-x).exp()) }
    fn tanh(x: f64)          -> f64 { x.tanh() }
    fn soft_sign(x: f64)     -> f64 { x / (1.0 + x.abs()) }
    fn bent_identity(x: f64) -> f64 { (((x.powi(2) + 1.0).sqrt() - 1.0) / 2.0) + x }
    fn relu(x: f64)          -> f64 { if x > 0.0 { x } else { 0.0 } }

// ...
```

Makes sense to me.
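
For reference, a call site would then read like this (the wrapper function is just for illustration, using the `Activation` type proposed above):

```rust
// Statically dispatched match instead of an indirect call through a function pointer.
fn apply_activation(activation: Activation, sum_inputs: f64) -> f64 {
    // previously: activation.get_func()(sum_inputs)
    activation.apply(sum_inputs)
}
```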

By the way, is there some issue with representing `Activation` indices as `u8`? I see they're often `i32`.

`u8` would work as well, but I think the integer conversion will be unnecessary with serde anyway.
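
For example, with serde the variants would simply serialize by name (sketch only, assuming serde with the derive feature):

```rust
use serde::{Deserialize, Serialize};

/// Sketch: variants are (de)serialized by name, so no manual
/// integer <-> Activation conversion is needed at all.
#[derive(Debug, Copy, Clone, Eq, PartialEq, Serialize, Deserialize)]
pub enum Activation {
    Linear,
    Threshold,
    Sign,
    Sigmoid,
    Tanh,
    SoftSign,
    BentIdentity,
    Relu,
}

// If a compact numeric form is ever still needed, `u8` is plenty:
// let byte = Activation::Relu as u8;
```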

Great PR, hopefully some of these perspectives are useful! :)

I appreciate the feedback, thanks a lot! I'll go ahead and merge this then and implement the rest of the changes in future commits.