Open wbrickner opened 2 years ago
Hello!
I'm very excited about using this library; however, the README claims it is in a broken state, waiting for fixes in the CMA-ES repo. I see the CMA-ES repo is more active, with many recent commits.
Is the README still accurate in saying this crate is broken?
Amazing work, thank you so much!
Thanks for your interest in using the library!
Unfortunately, the crate is still broken, although you are right that it is now possible to fix since the cmaes
crate is fixed. I don't feel that the code of this library is suitable for release even with those fixes, however, so there is likely more work to be done before it is ready. I should have some time to look over it soon, but I can't promise any timeframe for its release at the moment.
Ok! I had a look at the codebase and tried porting it to the new API of cmaes
last night, but I didn't have a very good understanding of cmaes
or what the original code intended to do.
Not sure I can get it release-quality all by myself but I'd like to try to push it in that direction. If I fix things up a bit would you accept a PR to this end, assuming the API is well designed and there aren't any crimes being committed in the rewrite?
(Also, of course, crates.io is home to many thousands of proudly hacky crates that aren't exactly formally verified!)
In terms of cmaes changes, the main things would be:
- The optimize_network function should build a CMAESOptions with any relevant settings and run it.
- The cmaes_runs option should be replaced with a restart strategy provided by the cmaes crate, or removed entirely (I don't remember what the original paper suggested here).
- IIRC there were still some bugs with the mutation system that couldn't reasonably be tested/fixed with cmaes not working, but now that that's fixed it can be revisited.
But to answer your question, I'd gladly accept a PR that got everything working! Changes can always be made to improve the API and such later.
Ok, I will abuse this issue as a place to ask you questions in public while repairing, as I am not as intimately familiar with the EANT2 algorithm as you are.
What would you say serves as a reasonable default for the restart strategy? Local
, IPOP
, or BIPOP
?
BIPOP
should perform the best, although if it ends up running too slowly it can be swapped out for Local
(configured with the existing cmaes_runs
option).
The API I've designed allows external separation of the termination conditions of the EANT2 and CMA-ES algorithms.
What I'm wondering is how termination should default for the CMA-ES algorithm. The current CMA-ES API (I think) does not force the caller to declare either a maximum generation count or a function target. I think this means CMA-ES could run forever in the inner loop.
What do you think should be the default behavior, or how is this /supposed/ to work from the EANT2 side?
The returned CMA-ES termination conditions can be ignored here (except for InvalidFunctionValue
as it's more of an error). The max generations option of cmaes
should be exposed in eant2
, although it can be None
by default because CMA-ES will generally terminate anyways. fun_target
should also be exposed as it's a reasonable termination condition for EANT2 as well.
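To make those defaults concrete, here is a hypothetical sketch of the eant2-side termination options being described; the names are illustrative, not an existing API:
pub struct CMAESTermination {
    /// Maximum generations per CMA-ES run; `None` lets CMA-ES terminate on its own criteria.
    pub max_generations: Option<usize>,
    /// Objective-value target; reaching it stops CMA-ES, and it is also a reasonable EANT2
    /// termination condition.
    pub fun_target: Option<f64>,
}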
I have a more basic question.
I am refactoring the optimize_network
function. I need to understand the intended procedure for evaluating the fitness of an individual.
cmaes
provides a &DVector<f64>
and expects f64
fitness back.
I can't find the original internals of cmaes_loop
, so I'm not sure what's supposed to happen here. I think it must involve some kind of complicated traversal of the cge
network genome buffer using the parameter buffer and some interaction with the "reinforcement learning environment". It's certainly something I don't want to rebuild; is that implementation available somewhere still?
Actually, duh, I only need the part where the parameters are integrated with the structure encoded by the cge
buffer.
All loss function stuff is handled by the fitness trait.
Would something like this have the intended effect?
.build(|parameters: &DVector<f64>| {
    // copy the parameters over into the network (not ideal), modifying the individual
    individual
        .network
        .genome
        .iter_mut()
        .zip(parameters.iter())
        .for_each(|(gene, p)| gene.weight = *p);

    individual.fitness = individual.object.get_fitness(&mut individual.network);
    individual.fitness
});
Also, design choice:
Do you think it makes more sense to provide parallelism on the eant2
side or the cmaes
side?
I naively introduced parallelism at the eant2
side, but there already exists parallelism on the cmaes
side, and enabling both may actually hurt. Thoughts?
Edit: to clarify, either many threads on the eant2
side each with their own invocation of optimize_network
(and a single-threaded CMA-ES invocation), or a single-threaded eant2
side which serially examines every individual and puts all the parallelism inside the CMA-ES
invocation, or perhaps do both. Doing both might be faster, but I have no measurements; modern CPUs have neat tricks like hyperthreading / SMT which could allow scheduling "too much work" to be an optimization. Wondering if you have a preference or insight.
cmaes::ObjectiveFunction
should be implemented for Individual
, so you can just call build
with &individual
or &mut individual
(ObjectiveFunction
will have to be implemented for references to Individual
, but you can just forward to the base implementation).
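A minimal sketch of that forwarding, assuming cmaes's ObjectiveFunction is roughly fn evaluate(&mut self, x: &DVector<f64>) -> f64, that Individual already implements it, and that the import paths and Individual's generic parameter look like this (all assumptions, not the crates' confirmed APIs):
use cmaes::{DVector, ObjectiveFunction};
use eant2::{FitnessFunction, Individual};

impl<'a, T: FitnessFunction> ObjectiveFunction for &'a mut Individual<T> {
    fn evaluate(&mut self, x: &DVector<f64>) -> f64 {
        // reborrow and forward to Individual's own implementation
        (**self).evaluate(x)
    }
}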
As for parallelism, it will likely be fastest to implement it at the eant2
level so that the entirety of CMA-ES is run in parallel rather than just the objective function. This would be worth benchmarking later though to see whether additionally enabling it in cmaes
would be worth it.
I will write the trait implementation for eant2::Individual
, as it has no such implementation now. Cleaner solution, good point. Edit: this is not possible. ObjectiveFunction
provides no mechanism for accessing the network genome. Closure it is.
Just to clarify, since it remains unclear to me: you favor an implementation in which many eant2
optimize_network
functions run in parallel?
The implementation is here (it's roughly what your closure does):
https://github.com/pengowen123/eant2/blob/master/src/utils.rs#L49
The new ObjectiveFunction
takes &mut self
as a parameter, so you should be able to set the weights on self.network.genome
instead of cloning as well.
Yes, multiple optimize_network
calls should be run in parallel.
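For reference, a minimal sketch of that in-place version, using the field and trait names from this thread (the exact eant2 types, import paths, and the cmaes trait signature are assumptions):
use cmaes::{DVector, ObjectiveFunction};
use eant2::{FitnessFunction, Individual};

impl<T: FitnessFunction> ObjectiveFunction for Individual<T> {
    fn evaluate(&mut self, parameters: &DVector<f64>) -> f64 {
        // write the candidate weights directly into the genome instead of cloning
        self.network
            .genome
            .iter_mut()
            .zip(parameters.iter())
            .for_each(|(gene, p)| gene.weight = *p);

        // the fitness trait handles the actual loss computation
        self.object.get_fitness(&mut self.network)
    }
}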
After thinking about it a little more, it is likely that higher performance can be unlocked by driving all system parallelism from the eant2
side. The eant2
algorithm's usage of cma-es
is trivially parallel, with zero interaction between the tasks.
This may or may not be true of the cma-es
internals, and I expect they are much more complex. I also favor cma-es
running single-threaded & eant2
bringing all of the parallelism.
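For example, eant2-side parallelism could be as simple as a rayon loop over the population (a sketch; optimize_network's real signature and the surrounding types are assumptions):
use rayon::prelude::*;

// each individual gets its own single-threaded CMA-ES run; the runs never interact
population
    .par_iter_mut()
    .for_each(|individual| optimize_network(individual, &options));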
Ok, almost finished.
I had some questions about gene_deviations
computed as 1 / (1 + age^2)
. These used to be provided to cma-es
, but now there seems to be no way to provide them. Should they no longer be computed in EANT2?
Also, I'm seeing some cool optimization opportunities if a "network fingerprint" can be constructed, like a hash (it could even be variable-length).
The most significant might be in the de-duplication phase, which is O(n^2)
presently. I recall there may be some techniques for a compressed representation of network structure, but what that is I can't say. Any thoughts?
I also have some changes I want to make so that network evaluation doesn't require cloning inside the hot loop, but they require some light changes to your other library cge
/ cge-core
. Specifically I'd like to be able to evaluate an existing network with a slice of candidate parameters without having to copy those params in.
The initial distribution cannot be adjusted in cmaes
anymore, but you can scale the overall search space by the same values instead (see Scale). The returned solution must be scaled inversely to reflect this.
CGE already is a compressed format, but some sort of hash would probably still speed deduplication up. On the topic of optimization though, I would hold off on those kinds of changes for now. I would expect the vast majority of time to be spent in cmaes
and, for real-world use cases, the fitness function, so it would be best to implement some benchmarks to guide optimization after the core functionality is complete.
However, you are welcome to make changes to cge
as well if you find it necessary. And I want to thank you for putting the time into doing this. I really appreciate it!
The returned solution must be scaled inversely to reflect this.
Can you clarify what you mean? Would it not be the case that the returned solution must be scaled /again/, because the return value did not pass through a Scale
?
Also, is this a good default? To scale by a metric that is deterministic in gene age for each iteration? It does not appear obvious that this would work well for many problems, but I don't have expertise here.
You are correct about the scaling, my mistake. The Scale
type provides a method for this.
This behavior is taken from the original EANT2 paper, which stated that they observed the weights of older genes to not change much despite structural changes to the network elsewhere. Scaling the search distribution therefore avoids wasting resources by causing CMA-ES to only search a smaller range in those axes. However, it does have mechanisms to naturally increase the scales of the axes if they turn out to be more useful than assumed.
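A sketch of that scaling, assuming the per-gene ages are tracked on the eant2 side and using Scale as it appears later in this thread:
// 1 / (1 + age^2): older genes get a narrower initial search range along their axes
let gene_deviations: Vec<f64> = gene_ages
    .iter()
    .map(|&age| 1.0 / (1.0 + (age as f64).powi(2)))
    .collect();

// wrap the objective so CMA-ES effectively searches the scaled space; the best point it
// returns must then be mapped back through the inverse scaling
let scaled = Scale::new(&mut individual, gene_deviations);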
So I'm writing the public API for controlling RestartStrategy
, and I'm a little confused.
Why must RestartStrategy know about the dimensionality of the problem / search range? Is there some way we can have this be detected automatically?
I really really like the idea of making machine learning /easy/, and the explosion of required (seemingly opaque) parameters that must be declared up front I think works in the opposite direction. I think almost everything should have a default value that "just works", if that's possible. Is configuring the RestartStrategy
one such case? If so, can you help me see how?
Thank you!
Edit: I've got the public API down to:
let train =
    EANT2::builder()
        .inputs(2)
        .outputs(1)
        .activation(Activation::Threshold)
        .restarter(...)
        .build();

let (network, fitness) = train.run(&Foo);
The last thing to make elegant AFAIK is the restarter.
RestartOptions
is supposed to be a complete interface to cmaes
, so at the minimum it does require the problem dimension. The problem dimension in EANT2's use case is the number of connection weights in the network, which varies per network and cannot be specified up front by a user of eant2
. Therefore, the RestartStrategy
API will have to be duplicated to some extent to avoid specifying the dimension. I would recommend making a new RestartStrategy
enum and exposing only the parameters that are directly relevant to EANT2:
enum CMAESRestartStrategy {
    BIPOP,
    IPOP,
    Local(usize),
}
BIPOP
and IPOP
should probably just use the default settings, so nothing needs to be exposed on the eant2
side. The max_runs
option of Local
might be useful to users, so it can be exposed here.
Ok, so using the CMAESOptions
directly, there is a way to specify the "initial mean" (I assume this is the centroid of the high dimensional gaussian along each axis in parameter space) so that optimization continues on from the previously discovered acceptable point in the parameter space. (This seems like a really important fact about the old implementation!).
Using Restarter
, I don't know how to do that. Here's what I've got:
let best = {
    // TODO: amortize allocation
    let initial_mean: Vec<f64> = individual.network.genome.iter().map(|g| g.weight).collect();
    let initial_step_size = 0.3;

    // TODO: avoid these clones if possible.
    let scaled = Scale::new(individual.clone(), gene_deviations.clone());

    // run the CMA-ES optimization pass
    let restarter =
        Restarter::new(
            RestartOptions::new(
                individual.network.genome.len() + 1, // number of parameters to optimize
                -1.0..=1.0, // ??? what interval is this? what does this even mean? this is 1D, so it's not to do with parameter space.
                options.restart.clone()
            )
            .mode(Mode::Minimize)
            .enable_printing(options.print)
            .fun_target(options.terminate.exploitation.fitness)
            .max_generations_per_run(options.terminate.exploitation.generations)
        )
        .unwrap();

    restarter.run_with_reuse(scaled).best.unwrap()
};
Am I doing this completely wrong? I don't really understand the structure of your cmaes
crate yet.
The Restarter
will randomly generate the initial search point for each run within the search range specified (-1.0..=1.0
in your current code). The same search range is applied for all dimensions because individual axes can be scaled with Scale
, which would effectively scale the initial search range as well. This does seem confusing now that I look at it, so I might have to redesign this API for the next release.
This behavior seems to conflict with EANT2 wanting to keep the same initial search point. I will check the paper as I don't remember all the details here, but perhaps this requirement can be relaxed or the Restarter
bypassed entirely in favor of directly initialized CMA-ES runs. Alternatively, Individual
's implementation of ObjectiveFunction
can just translate its input by the desired initial mean:
fn evaluate(&mut self, x: &DVector<f64>) -> f64 {
    // translate the input so the search is centered on the network's current weights
    let initial_mean: DVector<f64> = self.network.genome.iter().map(|g| g.weight).collect();
    let x = x + initial_mean;
    ...
}
This effectively does what you want as the search space will be translated to be centered on initial_mean
, and so Restarter
will only be adding a random offset to this value. I might consider adding this functionality into cmaes
itself because it seems useful.
I also noticed two small things in the code you posted. First, you can pass &mut individual
to Scale
to avoid cloning it. If it says ObjectiveFunction
isn't implemented, you can just add a simple implementation for &mut Individual
that calls the existing implementation. Second, the number of parameters is being set to genome.len() + 1
. I see this is in the original code too, but I think it might be a bug. The extra parameter doesn't seem to be used anywhere, so it should be safe to just use genome.len()
.
EDIT: The initial step size is calculated based on the size of the search range, so it can't be set manually.
What do you think timeline looks like for investigating / fixing the potential mutation bug + potential cge
evaluation bug?
I can try looking into these matters on my own and submitting another PR, but I don't want to step on your toes and tbh you seem to have a much better understanding of the cge
mechanics.
I'll get the cge
bug fixed today and do the refactoring in a separate PR. Once both of those are done I can look into the mutation issue in eant2
, which I don't expect to take too long to fix. Overall, I would estimate 3-4 days but possibly sooner.
With the new changes in cge
, should eant2
be working? I'm not sure where it stands, very excited to start using it for real once we have all the tooling in place!
eant2
will just need to be updated to use the latest cge
version, and then it should be fully functional. cge
is just about done; I'm just finishing some minor features and then the documentation.
Got bored and was going to try to upgrade eant2
, but I ran into an issue, so heads up you'll need to mend the cge
Gene
API to allow mutation of weights
The only work left to do for cge
is updating the documentation and such, so if you succeed in upgrading eant2
now that would be great!
To preserve genome validity, it is necessary to make obtaining a &mut Gene
impossible, so all the methods are provided by Network
directly. You should be able to use Network::set_weights
for eant2
's use case though.
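A rough sketch of how that could look when copying CMA-ES parameters into the network; the exact Network::set_weights signature and return type are assumptions, so check the cge docs:
// copy the candidate parameters into the network in genome order (sketch)
let weights: Vec<f64> = parameters.iter().copied().collect();
individual
    .network
    .set_weights(&weights)
    .expect("number of weights should match the genome length");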
I keep trying to merge & then am confronted with the fact that I'm not seeing the bigger picture of how these are intended to fit together. Feels like a lot of bookkeeping stuff in eant2
just doesn't need to exist anymore, but I don't have a solid understanding of how it's supposed to look at the end. I think you will be better at this having written cge
!
No worries! I won't be home for the next few days, but I'll be back to release cge
and get started on the eant2
update on Sunday.
@wbrickner Alright, cge
0.1 has been released, so all the blockers for eant2
are fixed. I'll get started on the update now and get the library functioning as soon as possible. As for updating const_cge
to use the latest cge
version, I figured I'd note the major changes for you:
- In cge, the recurrent state is deduplicated by neuron ID. It seems that const_cge currently duplicates any neuron IDs referred to by multiple recurrent jumper genes, so it will have to be changed for the behaviors of the two libraries to match. For example, in the following gene sequence: recurrent(0), recurrent(1), recurrent(1), the recurrent state should only be [NeuronId(0), NeuronId(1)], because the third connection uses a duplicate ID. cge uses a HashSet to ensure uniqueness while pushing to a Vec to maintain order; see update_recurrent_state_ids here (there's a small sketch of the idea after this list).
- The Gene and Network APIs have been redesigned as you've seen. All of the necessary APIs should be public though, so you can probably just copy the evaluate_slice function again.
- Random test networks are generated in test_output/. You can copy the code itself to test const_cge, or you can just save a few of the generated ones and test those.
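A minimal sketch of that deduplication, using plain usize neuron IDs for illustration:
use std::collections::HashSet;

// the HashSet enforces uniqueness while the Vec preserves first-seen order
fn recurrent_state_ids(recurrent_sources: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut ids = Vec::new();
    for &id in recurrent_sources {
        // `insert` returns false if the ID was already present
        if seen.insert(id) {
            ids.push(id);
        }
    }
    ids
}

// recurrent(0), recurrent(1), recurrent(1) -> state only for neurons 0 and 1
assert_eq!(recurrent_state_ids(&[0, 1, 1]), vec![0, 1]);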
OK, I will try to look through the new execution model & replicate it in const_cge. I want to be able to support all the same serializations & hopefully emit more compact codegen.
By the way: incredible job on the remodeling of cge
, very impressive!
Thanks! If you directly follow the new evaluate_slice
, each gene will be evaluated twice in the worst case. In theory this is unnecessary, but to prevent it you would need to construct a graph for each network. If you want to do this, cge
now provides the parent neuron ID of each gene, which should make the task simpler.
May have identified a bug in your evaluate implementation, not sure.
With such a small network, the codegen is (I think) pretty clearly correct, but its behavior does not match the new CGE library, not even close:
pub fn evaluate(&mut self, inputs: &[f64; 2usize], outputs: &mut [f64; 1usize]) {
    let c0 = 0.3f64 * inputs[1usize];
    let c1 = 0.1f64 * inputs[0usize];
    let c2 = 0.8f64 * self.persistence[0usize];
    let c3 = c2 + c1 + c0;
    let c3 = const_cge::activations::f64::tanh(c3);
    let c4 = c3 * 0.5f64;
    outputs[0usize] = c4;
    self.persistence[0usize] = c3;
}
Is my codegen wrong or is your eval function wrong, or perhaps both? :P
It seems that you are not loading the recurrent state from the file, while the example in cge
does. When I test your code locally, it matches the cge
output when not loading the recurrent state. This behavior can be configured by an option in cge
, so it would probably be useful to replicate it in const_cge
.
hm, I am loading with a second argument WithRecurrentState(false)
Will verify this is actually what's used everywhere
So doing this fails completely to match my codegen:
let (runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
    .expect("Failed to dynamically load CGE file");
but this matches to about 1 part in 10^9:
let (mut runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
    .expect("Failed to dynamically load CGE file");
runtime.clear_state();
Does the WithRecurrentState(false)
not do what I think it does?
That's odd. This code runs for me and prints the same values for both implementations:
use cge::{Activation, Network, WithRecurrentState};

let (mut network, _, _) = Network::<f64>::load_file::<(), _>(
    "test_data/with_extra_data_v1.cge",
    WithRecurrentState(false),
)
.unwrap();
println!("{:?}", network.evaluate(&[1.0, 2.0]).unwrap());

struct Foo {
    persistence: [f64; 1],
}

impl Foo {
    pub fn evaluate(&mut self, inputs: &[f64; 2usize], outputs: &mut [f64; 1usize]) {
        let c0 = 0.3f64 * inputs[1usize];
        let c1 = 0.1f64 * inputs[0usize];
        let c2 = 0.8f64 * self.persistence[0usize];
        let c3 = c2 + c1 + c0;
        let c3 = Activation::tanh(c3);
        let c4 = c3 * 0.5f64;
        outputs[0usize] = c4;
        self.persistence[0usize] = c3;
    }
}

let mut foo = Foo {
    persistence: [0.0],
};
let mut output = [0.0; 1];
foo.evaluate(&[1.0, 2.0], &mut output);
println!("{:?}", output);
Initializing foo
with persistence: [0.6043677771171636]
(from the file) and using WithRecurrentState(true)
also results in equivalent outputs. When running the latest two code blocks you posted with just cge
, do the outputs differ?
I am property testing; perhaps the issues are not apparent from testing only a small number of inputs.
/// Static and dynamic constructions of the network should
/// have identical output for all inputs.
///
/// - Network memory is wiped after every test.
#[test]
fn recurrent_memorywipe_50k_trials() {
    // statically create network
    #[network("./test_inputs/test_network_v1.cge", numeric_type = f64)]
    struct TestNet;

    // dynamically load the exact same network
    let (mut runtime, _, _) = Network::load_file::<(), _>("./test_inputs/test_network_v1.cge", WithRecurrentState(false))
        .expect("Failed to dynamically load CGE file");
    runtime.clear_state();

    // gimme 50,000 `[f64; 2]`; where each f64 falls in [-1, +1]
    proptest!(ProptestConfig::with_cases(50_000), |(input_vector in uniform2(-1.0f64..1.0f64))| {
        let mut net = TestNet::default();
        let mut runtime = runtime.clone();

        let static_outputs = {
            let mut outputs = [0.0; 1];
            net.evaluate(&input_vector, &mut outputs);
            outputs.to_vec()
        };
        let runtime_outputs = runtime.evaluate(&input_vector[..]).unwrap();

        assert_eq!(static_outputs, runtime_outputs);
    });
}
thread 'tests::test_network_v1::recurrent_memorywipe_50k_trials' panicked at 'Test failed: assertion failed: `(left == right)`
left: `[-0.08592725785700363]`,
right: `[-0.08592725785700366]`; minimal failing input: input_vector = [0.19156495759134007, -0.9001986894303342]
I think the order of operations we are performing may differ, so our rounding errors differ; it is a tiny, tiny difference in output.
IDK what's wrong here; I've just given up on making them match bit for bit.
You mentioned earlier that there was infrastructure inside cge
for generating many random networks with valid structure?
I'll take a look at your code and see if I can get them to match at least.
If you clone the repo and run cargo test
, it will generate several random networks in test_output/
. The code itself is here:
https://github.com/pengowen123/cge/blob/08ae750f0d26c7a04bc4585538f9dea7c074749c/src/network/mod.rs#L2173-L2303
Much of it is solely for testing cge
, so it could be adapted for your use case. Also, it references a few functions defined at the top of the tests
module that you'd need to copy as well.
yeah I need them to be valid mutations so all the invariants hold & the CGE doesn't cause a panic
Is there a simple way to modify the network generator code to not perform invalid mutations?
The code I linked won't cause any panics because any errors are discarded. It simply tries random mutations and ignores invalid ones, but usually about half of them are valid so it's not too inefficient. If you want only valid mutations I can put something together.
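If only valid mutations are wanted, the try-and-discard loop could be wrapped up roughly like this (a sketch; the mutation helpers named here are hypothetical, not cge's actual API):
use cge::Network;
use rand::Rng;

// keep sampling random mutations until one applies cleanly; invalid ones are discarded
fn apply_random_valid_mutation(network: &mut Network<f64>, rng: &mut impl Rng) {
    loop {
        let mutation = sample_random_mutation(rng); // hypothetical helper
        if network.apply_mutation(mutation).is_ok() { // hypothetical method
            break;
        }
    }
}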
Pushed newest version to main branch
The code I linked won't cause any panics because any errors are discarded. It simply tries random mutations and ignores invalid ones, but usually about half of them are valid so it's not too inefficient. If you want only valid mutations I can put something together.
Interesting, I tried using the generated networks test_outputs/random_*.cge
and all of them panicked inside a cge
call (when testing from the const_cge
side!)