Open wbrickner opened 2 years ago
Hello!
I'm very excited about using this library; however, the README claims it is in a broken state, waiting for fixes in the CMA-ES repo. I see the CMA-ES repo is more active, with many recent commits.
Is the README still accurate in saying this crate is broken?
Amazing work, thank you so much!
Thanks for your interest in using the library!
Unfortunately, the crate is still broken, although you are right that it is now possible to fix since the cmaes
crate is fixed. I don't feel that the code of this library is suitable for release even with those fixes, however, so there is likely more work to be done before it is ready. I should have some time to look over it soon, but I can't promise any timeframe for its release at the moment.
Ok! I had a look at the codebase and tried porting it to the new API of cmaes
last night, but I didn't have a very good understanding of cmaes
or what the original code intended to do.
Not sure I can get it release-quality all by myself but I'd like to try to push it in that direction. If I fix things up a bit would you accept a PR to this end, assuming the API is well designed and there aren't any crimes being committed in the rewrite?
(Also, of course, crates.io is home to many thousands of proudly hacky crates that aren't exactly formally verified!)
In terms of cmaes changes, the main things would be:
- The optimize_network function should build a CMAESOptions with any relevant settings and run it.
- The cmaes_runs option should be replaced with a restart strategy provided by the cmaes crate, or removed entirely (I don't remember what the original paper suggested here).
- IIRC there were still some bugs with the mutation system that couldn't reasonably be tested/fixed with cmaes not working, but now that that's fixed it can be revisited.
But to answer your question, I'd gladly accept a PR that got everything working! Changes can always be made to improve the API and such later.
Ok, I will abuse this issue as a place to ask you questions in public while repairing, as I am not as intimately familiar with the EANT2 algorithm as you are.
What would you say serves as a reasonable default for the restart strategy? Local
, IPOP
, or BIPOP
?
BIPOP
should perform the best, although if it ends up running too slowly it can be swapped out for Local
(configured with the existing cmaes_runs
option).
The API I've designed allows external separation of the termination conditions of the EANT2 and CMA-ES algorithms.
What I'm wondering is how termination should default for the CMA-ES algorithm. The current CMA-ES API (I think) does not force the caller to declare either a maximum generation count or a function target. I think this means CMA-ES could run forever in the inner loop.
What do you think should be the default behavior, or how is this /supposed/ to work from the EANT2 side?
The returned CMA-ES termination conditions can be ignored here (except for InvalidFunctionValue
as it's more of an error). The max generations option of cmaes
should be exposed in eant2
, although it can be None
by default because CMA-ES will generally terminate anyways. fun_target
should also be exposed as it's a reasonable termination condition for EANT2 as well.
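To make those defaults concrete, here is a hypothetical sketch of the eant2-side termination options being described; the names are illustrative, not an existing API:
pub struct CMAESTermination {
    /// Maximum generations per CMA-ES run; `None` lets CMA-ES terminate on its own criteria.
    pub max_generations: Option<usize>,
    /// Objective-value target; reaching it stops CMA-ES, and it is also a reasonable EANT2
    /// termination condition.
    pub fun_target: Option<f64>,
}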
I have a more basic question.
I am refactoring the optimize_network
function. I need to understand the intended procedure for evaluating the fitness of an individual.
cmaes
provides a &DVector<f64>
and expects f64
fitness back.
I can't find the original internals of cmaes_loop
, so I'm not sure what's supposed to happen here. I think it must involve some kind of complicated traversal of the cge
network genome buffer using the parameter buffer and some interaction with the "reinforcement learning environment". It's certainly something I don't want to rebuild; is that implementation available somewhere still?
Actually, duh, I only need the part where the parameters are integrated with the structure encoded by the cge
buffer.
All loss function stuff is handled by the fitness trait.
Would something like this have the intended effect?
.build(|parameters: &DVector<f64>| {
    // copy the parameters over into the network (not ideal), modifying the individual
    individual
        .network
        .genome
        .iter_mut()
        .zip(parameters.iter())
        .for_each(|(gene, p)| gene.weight = *p);

    individual.fitness = individual.object.get_fitness(&mut individual.network);
    individual.fitness
});
Also, design choice:
Do you think it makes more sense to provide parallelism on the eant2
side or the cmaes
side?
I naively introduced parallelism at the eant2
side, but there already exists parallelism on the cmaes
side, and enabling both may actually hurt. Thoughts?
Edit: to clarify, either many threads on the eant2
side each with their own invocation of optimize_network
(and a single-threaded CMA-ES invocation), or a single-threaded eant2
side which serially examines every individual and puts all the parallelism inside the CMA-ES
invocation, or perhaps do both. Doing both might be faster, but I have no measurements; modern CPUs have neat tricks like hyperthreading / SMT which could allow scheduling "too much work" to be an optimization. Wondering if you have a preference or insight.
cmaes::ObjectiveFunction
should be implemented for Individual
, so you can just call build
with &individual
or &mut individual
(ObjectiveFunction
will have to be implemented for references to Individual
, but you can just forward to the base implementation).
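A minimal sketch of that forwarding, assuming cmaes's ObjectiveFunction is roughly fn evaluate(&mut self, x: &DVector<f64>) -> f64, that Individual already implements it, and that the import paths and Individual's generic parameter look like this (all assumptions, not the crates' confirmed APIs):
use cmaes::{DVector, ObjectiveFunction};
use eant2::{FitnessFunction, Individual};

impl<'a, T: FitnessFunction> ObjectiveFunction for &'a mut Individual<T> {
    fn evaluate(&mut self, x: &DVector<f64>) -> f64 {
        // reborrow and forward to Individual's own implementation
        (**self).evaluate(x)
    }
}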
As for parallelism, it will likely be fastest to implement it at the eant2
level so that the entirety of CMA-ES is run in parallel rather than just the objective function. This would be worth benchmarking later though to see whether additionally enabling it in cmaes
would be worth it.
I will write the trait implementation for eant2::Individual
, as it has no such implementation now. Cleaner solution, good point. Edit: this is not possible. ObjectiveFunction
provides no mechanism for accessing the network genome. Closure it is.
Just to clarify, since it remains unclear to me: you favor an implementation in which many eant2
optimize_network
functions run in parallel?
The implementation is here (it's roughly what your closure does):
https://github.com/pengowen123/eant2/blob/master/src/utils.rs#L49
The new ObjectiveFunction
takes &mut self
as a parameter, so you should be able to set the weights on self.network.genome
instead of cloning as well.
Yes, multiple optimize_network
calls should be run in parallel.
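For reference, a minimal sketch of that in-place version, using the field and trait names from this thread (the exact eant2 types, import paths, and the cmaes trait signature are assumptions):
use cmaes::{DVector, ObjectiveFunction};
use eant2::{FitnessFunction, Individual};

impl<T: FitnessFunction> ObjectiveFunction for Individual<T> {
    fn evaluate(&mut self, parameters: &DVector<f64>) -> f64 {
        // write the candidate weights directly into the genome instead of cloning
        self.network
            .genome
            .iter_mut()
            .zip(parameters.iter())
            .for_each(|(gene, p)| gene.weight = *p);

        // the fitness trait handles the actual loss computation
        self.object.get_fitness(&mut self.network)
    }
}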
After thinking about it a little more, it is likely that higher performance can be unlocked by driving all system parallelism from the eant2
side. The eant2
algorithm's usage of cma-es
is trivially parallel, with zero interaction between the tasks.
This may or may not be true of the cma-es
internals, and I expect they are much more complex. I also favor cma-es
running single-threaded & eant2
bringing all of the parallelism.
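For example, eant2-side parallelism could be as simple as a rayon loop over the population (a sketch; optimize_network's real signature and the surrounding types are assumptions):
use rayon::prelude::*;

// each individual gets its own single-threaded CMA-ES run; the runs never interact
population
    .par_iter_mut()
    .for_each(|individual| optimize_network(individual, &options));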
Ok, almost finished.
I had some questions about gene_deviations
computed as 1 / (1 + age^2)
. These used to be provided to cma-es
, but now there seems to be no way to provide them. Should they no longer be computed in EANT2?
Also, I'm seeing some cool optimization opportunities if a "network fingerprint" can be constructed, like a hash (it could even be variable-length).
The most significant might be in the de-duplication phase, which is O(n^2)
presently. I recall there may be some techniques for a compressed representation of network structure, but what that is I can't say. Any thoughts?
I also have some changes I want to make so that network evaluation doesn't require cloning inside the hot loop, but they require some light changes to your other library cge
/ cge-core
. Specifically I'd like to be able to evaluate an existing network with a slice of candidate parameters without having to copy those params in.
The initial distribution cannot be adjusted in cmaes
anymore, but you can scale the overall search space by the same values instead (see Scale). The returned solution must be scaled inversely to reflect this.
CGE already is a compressed format, but some sort of hash would probably still speed deduplication up. On the topic of optimization though, I would hold off on those kinds of changes for now. I would expect the vast majority of time to be spent in cmaes
and, for real-world use cases, the fitness function, so it would be best to implement some benchmarks to guide optimization after the core functionality is complete.
However, you are welcome to make changes to cge
as well if you find it necessary. And I want to thank you for putting the time into doing this. I really appreciate it!
The returned solution must be scaled inversely to reflect this.
Can you clarify what you mean? Would it not be the case that the returned solution must be scaled /again/, because the return value did not pass through a Scale
?
Also, is this a good default? To scale by a metric that is deterministic in gene age for each iteration? It does not appear obvious that this would work well for many problems, but I don't have expertise here.
You are correct about the scaling, my mistake. The Scale
type provides a method for this.
This behavior is taken from the original EANT2 paper, which stated that they observed the weights of older genes to not change much despite structural changes to the network elsewhere. Scaling the search distribution therefore avoids wasting resources by causing CMA-ES to only search a smaller range in those axes. However, it does have mechanisms to naturally increase the scales of the axes if they turn out to be more useful than assumed.
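A sketch of that scaling, assuming the per-gene ages are tracked on the eant2 side and using Scale as it appears later in this thread:
// 1 / (1 + age^2): older genes get a narrower initial search range along their axes
let gene_deviations: Vec<f64> = gene_ages
    .iter()
    .map(|&age| 1.0 / (1.0 + (age as f64).powi(2)))
    .collect();

// wrap the objective so CMA-ES effectively searches the scaled space; the best point it
// returns must then be mapped back through the inverse scaling
let scaled = Scale::new(&mut individual, gene_deviations);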
So I'm writing the public API for controlling RestartStrategy
, and I'm a little confused.
Why must RestartStrategy know about the dimensionality of the problem / search range? Is there some way we can have this be detected automatically?
I really really like the idea of making machine learning /easy/, and the explosion of required (seemingly opaque) parameters that must be declared up front I think works in the opposite direction. I think almost everything should have a default value that "just works", if that's possible. Is configuring the RestartStrategy
one such case? If so, can you help me see how?
Thank you!
Edit: I've got the public API down to:
let train =
    EANT2::builder()
        .inputs(2)
        .outputs(1)
        .activation(Activation::Threshold)
        .restarter(...)
        .build();

let (network, fitness) = train.run(&Foo);
The last thing to make elegant AFAIK is the restarter.
RestartOptions
is supposed to be a complete interface to cmaes
, so at the minimum it does require the problem dimension. The problem dimension in EANT2's use case is the number of connection weights in the network, which varies per network and cannot be specified up front by a user of eant2
. Therefore, the RestartStrategy
API will have to be duplicated to some extent to avoid specifying the dimension. I would recommend making a new RestartStrategy
enum and exposing only the parameters that are directly relevant to EANT2:
enum CMAESRestartStrategy {
    BIPOP,
    IPOP,
    Local(usize),
}
BIPOP
and IPOP
should probably just use the default settings, so nothing needs to be exposed on the eant2
side. The max_runs
option of Local
might be useful to users, so it can be exposed here.
Ok, so using the CMAESOptions
directly, there is a way to specify the "initial mean" (I assume this is the centroid of the high dimensional gaussian along each axis in parameter space) so that optimization continues on from the previously discovered acceptable point in the parameter space. (This seems like a really important fact about the old implementation!).
Using Restarter
, I don't know how to do that. Here's what I've got:
let best = {
    // TODO: amortize allocation
    let initial_mean: Vec<f64> = individual.network.genome.iter().map(|g| g.weight).collect();
    let initial_step_size = 0.3;

    // TODO: avoid these clones if possible.
    let scaled = Scale::new(individual.clone(), gene_deviations.clone());

    // run the CMA-ES optimization pass
    let restarter =
        Restarter::new(
            RestartOptions::new(
                individual.network.genome.len() + 1, // number of parameters to optimize
                -1.0..=1.0, // ??? what interval is this? what does this even mean? this is 1D, so it's not to do with parameter space.
                options.restart.clone()
            )
            .mode(Mode::Minimize)
            .enable_printing(options.print)
            .fun_target(options.terminate.exploitation.fitness)
            .max_generations_per_run(options.terminate.exploitation.generations)
        )
        .unwrap();

    restarter.run_with_reuse(scaled).best.unwrap()
};
Am I doing this completely wrong? I don't really understand the structure of your cmaes
crate yet.
The Restarter
will randomly generate the initial search point for each run within the search range specified (-1.0..=1.0
in your current code). The same search range is applied for all dimensions because individual axes can be scaled with Scale
, which would effectively scale the initial search range as well. This does seem confusing now that I look at it, so I might have to redesign this API for the next release.
This behavior seems to conflict with EANT2 wanting to keep the same initial search point. I will check the paper as I don't remember all the details here, but perhaps this requirement can be relaxed or the Restarter
bypassed entirely in favor of directly initialized CMA-ES runs. Alternatively, Individual
's implementation of ObjectiveFunction
can just translate its input by the desired initial mean:
fn evaluate(&mut self, x: &DVector<f64>) -> f64 {
    // translate the input so the search is centered on the network's current weights
    let initial_mean: DVector<f64> = self.network.genome.iter().map(|g| g.weight).collect();
    let x = x + initial_mean;
    ...
}
This effectively does what you want as the search space will be translated to be centered on initial_mean
, and so Restarter
will only be adding a random offset to this value. I might consider adding this functionality into cmaes
itself because it seems useful.
I also noticed two small things in the code you posted. First, you can pass &mut individual
to Scale
to avoid cloning it. If it says ObjectiveFunction
isn't implemented, you can just add a simple implementation for &mut Individual
that calls the existing implementation. Second, the number of parameters is being set to genome.len() + 1
. I see this is in the original code too, but I think it might be a bug. The extra parameter doesn't seem to be used anywhere, so it should be safe to just use genome.len()
.
EDIT: The initial step size is calculated based on the size of the search range, so it can't be set manually.
What do you think timeline looks like for investigating / fixing the potential mutation bug + potential cge
evaluation bug?
I can try looking into these matters on my own and submitting another PR, but I don't want to step on your toes and tbh you seem to have a much better understanding of the cge
mechanics.
I'll get the cge
bug fixed today and do the refactoring in a separate PR. Once both of those are done I can look into the mutation issue in eant2
, which I don't expect to take too long to fix. Overall, I would estimate 3-4 days but possibly sooner.
With the new changes in cge
, should eant2
be working? I'm not sure where it stands, very excited to start using it for real once we have all the tooling in place!
eant2
will just need to be updated to use the latest cge
version, and then it should be fully functional. cge
is just about done; I'm just finishing some minor features and then the documentation.
Got bored and was going to try to upgrade eant2
, but I ran into an issue, so heads up you'll need to mend the cge
Gene
API to allow mutation of weights
The only work left to do for cge
is updating the documentation and such, so if you succeed in upgrading eant2
now that would be great!
To preserve genome validity, it is necessary to make obtaining a &mut Gene
impossible, so all the methods are provided by Network
directly. You should be able to use Network::set_weights
for eant2
's use case though.
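A rough sketch of how that could look when copying CMA-ES parameters into the network; the exact Network::set_weights signature and return type are assumptions, so check the cge docs:
// copy the candidate parameters into the network in genome order (sketch)
let weights: Vec<f64> = parameters.iter().copied().collect();
individual
    .network
    .set_weights(&weights)
    .expect("number of weights should match the genome length");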
I keep trying to merge & then am confronted with the fact that I'm not seeing the bigger picture of how these are intended to fit together. Feels like a lot of bookkeeping stuff in eant2
just doesn't need to exist anymore, but I don't have a solid understanding of how it's supposed to look at the end. I think you will be better at this having written cge
!
No worries! I won't be home for the next few days, but I'll be back to release cge
and get started on the eant2
update on Sunday.
@wbrickner Alright, cge
0.1 has been released, so all the blockers for eant2
are fixed. I'll get started on the update now and get the library functioning as soon as possible. As for updating const_cge
to use the latest cge
version, I figured I'd note the major changes for you:
- In cge, the recurrent state is deduplicated by neuron ID. It seems that const_cge currently duplicates any neuron IDs referred to by multiple recurrent jumper genes, so it will have to be changed for the behaviors of the two libraries to match. For example, in the following gene sequence: recurrent(0), recurrent(1), recurrent(1), the recurrent state should only be [NeuronId(0), NeuronId(1)], because the third connection uses a duplicate ID. cge uses a HashSet to ensure uniqueness while pushing to a Vec to maintain order; see update_recurrent_state_ids here (there's a small sketch of the idea after this list).
- The Gene and Network APIs have been redesigned as you've seen. All of the necessary APIs should be public though, so you can probably just copy the evaluate_slice function again.
- Random test networks are generated in test_output/. You can copy the code itself to test const_cge, or you can just save a few of the generated ones and test those.
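A minimal sketch of that deduplication, using plain usize neuron IDs for illustration:
use std::collections::HashSet;

// the HashSet enforces uniqueness while the Vec preserves first-seen order
fn recurrent_state_ids(recurrent_sources: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    for &id in recurrent_sources {
        // `insert` returns false if the ID was already present
        if seen.insert(id) {
            ids.push(id);
        }
    }
    ids
}

// recurrent(0), recurrent(1), recurrent(1) -> state only for neurons 0 and 1
assert_eq!(recurrent_state_ids(&[0, 1, 1]), vec![0, 1]);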
OK, I will try to look through the new execution model & replicate it in const_cge. I want to be able to support all the same serializations & hopefully emit more compact codegen.
By the way: incredible job on the remodeling of cge
, very impressive!
Thanks! If you directly follow the new evaluate_slice
, each gene will be evaluated twice in the worst case. In theory this is unnecessary, but to prevent it you would need to construct a graph for each network. If you want to do this, cge
now provides the parent neuron ID of each gene, which should make the task simpler.
May have identified a bug in your evaluate implementation, not sure.
With such a small network, the codegen is (I think) pretty clearly correct, but its behavior does not match the new CGE library, not even close:
pub fn evaluate(&mut self, inputs: &[f64; 2usize], outputs: &mut [f64; 1usize]) {
    let c0 = 0.3f64 * inputs[1usize];
    let c1 = 0.1f64 * inputs[0usize];
    let c2 = 0.8f64 * self.persistence[0usize];
    let c3 = c2 + c1 + c0;
    let c3 = const_cge::activations::f64::tanh(c3);
    let c4 = c3 * 0.5f64;
    outputs[0usize] = c4;
    self.persistence[0usize] = c3;
}
Is my codegen wrong or is your eval function wrong, or perhaps both? :P
It seems that you are not loading the recurrent state from the file, while the example in cge
does. When I test your code locally, it matches the cge
output when not loading the recurrent state. This behavior can be configured by an option in cge
, so it would probably be useful to replicate it in const_cge
.
hm, I am loading with a second argument WithRecurrentState(false)
Will verify this is actually what's used everywhere
So doing this fails completely to match my codegen:
let (runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
    .expect("Failed to dynamically load CGE file");
but this matches to about 1 part in 10^9:
let (mut runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
    .expect("Failed to dynamically load CGE file");
runtime.clear_state();
Does the WithRecurrentState(false)
not do what I think it does?
That's odd. This code runs for me and prints the same values for both implementations:
use cge::{Activation, Network, WithRecurrentState};

let (mut network, _, _) = Network::<f64>::load_file::<(), _>(
    "test_data/with_extra_data_v1.cge",
    WithRecurrentState(false),
)
.unwrap();
println!("{:?}", network.evaluate(&[1.0, 2.0]).unwrap());

struct Foo {
    persistence: [f64; 1],
}

impl Foo {
    pub fn evaluate(&mut self, inputs: &[f64; 2usize], outputs: &mut [f64; 1usize]) {
        let c0 = 0.3f64 * inputs[1usize];
        let c1 = 0.1f64 * inputs[0usize];
        let c2 = 0.8f64 * self.persistence[0usize];
        let c3 = c2 + c1 + c0;
        let c3 = Activation::tanh(c3);
        let c4 = c3 * 0.5f64;
        outputs[0usize] = c4;
        self.persistence[0usize] = c3;
    }
}

let mut foo = Foo {
    persistence: [0.0],
};
let mut output = [0.0; 1];
foo.evaluate(&[1.0, 2.0], &mut output);
println!("{:?}", output);
Initializing foo
with persistence: [0.6043677771171636]
(from the file) and using WithRecurrentState(true)
also results in equivalent outputs. When running the latest two code blocks you posted with just cge
, do the outputs differ?
I am property testing; perhaps the issues are not apparent from testing only a small number of inputs.
/// Static and dynamic constructions of the network should
/// have identical output for all inputs.
///
/// - Network memory is wiped after every test.
#[test]
fn recurrent_memorywipe_50k_trials() {
    // statically create network
    #[network("./test_inputs/test_network_v1.cge", numeric_type = f64)]
    struct TestNet;

    // dynamically load the exact same network
    let (mut runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
        .expect("Failed to dynamically load CGE file");
    runtime.clear_state();

    // gimme 50,000 `[f64; 2]`; where each f64 falls in [-1, +1]
    proptest!(ProptestConfig::with_cases(50_000), |(input_vector in uniform2(-1.0f64..1.0f64))| {
        let mut net = TestNet::default();
        let mut runtime = runtime.clone();

        let static_outputs = {
            let mut outputs = [0.0; 1];
            net.evaluate(&input_vector, &mut outputs);
            outputs.to_vec()
        };
        let runtime_outputs = runtime.evaluate(&input_vector[..]).unwrap();

        assert_eq!(static_outputs, runtime_outputs);
    });
}
thread 'tests::test_network_v1::recurrent_memorywipe_50k_trials' panicked at 'Test failed: assertion failed: `(left == right)`
left: `[-0.08592725785700363]`,
right: `[-0.08592725785700366]`; minimal failing input: input_vector = [0.19156495759134007, -0.9001986894303342]
I think the order of operations we are performing may differ, so our rounding errors differ; it is a tiny, tiny difference in output.
IDK what's wrong here; I've just given up on making them match bit for bit.
You mentioned earlier that there was infrastructure inside cge
for generating many random networks with valid structure?
I'll take a look at your code and see if I can get them to match at least.
If you clone the repo and run cargo test
, it will generate several random networks in test_output/
. The code itself is here:
https://github.com/pengowen123/cge/blob/08ae750f0d26c7a04bc4585538f9dea7c074749c/src/network/mod.rs#L2173-L2303
Much of it is solely for testing cge
, so it could be adapted for your use case. Also, it references a few functions defined at the top of the tests
module that you'd need to copy as well.
yeah I need them to be valid mutations so all the invariants hold & the CGE doesn't cause a panic
Is there a simple way to modify the network generator code to not perform invalid mutations?
The code I linked won't cause any panics because any errors are discarded. It simply tries random mutations and ignores invalid ones, but usually about half of them are valid so it's not too inefficient. If you want only valid mutations I can put something together.
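If only valid mutations are wanted, the try-and-discard loop could be wrapped up roughly like this (a sketch; the mutation helpers named here are hypothetical, not cge's actual API):
use cge::Network;
use rand::Rng;

// keep sampling random mutations until one applies cleanly; invalid ones are discarded
fn apply_random_valid_mutation(network: &mut Network<f64>, rng: &mut impl Rng) {
    loop {
        let mutation = sample_random_mutation(rng); // hypothetical helper
        if network.apply_mutation(mutation).is_ok() { // hypothetical method
            break;
        }
    }
}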
Pushed newest version to main branch
The code I linked won't cause any panics because any errors are discarded. It simply tries random mutations and ignores invalid ones, but usually about half of them are valid so it's not too inefficient. If you want only valid mutations I can put something together.
Interesting, I tried using the generated networks test_outputs/random_*.cge
and all of them panicked inside a cge
call (when testing from the const_cge
side!)