wildart / Evolutionary.jl

Evolutionary & genetic algorithms for Julia

Parallelization #45

Open wildart opened 4 years ago

wildart commented 4 years ago

Consider parallelization of the algorithms in multiple modes:

tpdsantos commented 4 years ago

Despite the major changes I tried to pull in #43 , parallelizing the entire population is not that hard to implement using the DistributedArrays package. I already have a prototype that works well with several processes on the same computer, and I almost have a way to easily incorporate this in a cluster. If I have time, I can add this prototype in another PR.

wildart commented 4 years ago

I do not think that DistributedArrays is the answer. It covers only the multi-core scenario. Even basic Julia parallel computing routines are enough for the scatter-gather computations that evolutionary algorithms require.

My goal is to make some sort of universal parallelization pipeline that could be configured for a specific computational topology and then used to run any evolutionary algorithm. In order to do that, all parallelizable parts of the evolutionary algorithms need to be self-contained, side-effect-free functions, similar to #26 or the GA part of #43.
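For example, a minimal sketch of such a self-contained, side-effect-free evaluation step, using only the stdlib `Distributed` routines (the `evaluate` name is illustrative, not part of the package):

```julia
using Distributed
# addprocs(4)  # optionally attach worker processes first

# Scatter the population to workers and gather the fitness values back.
# pmap runs serially on the master process when no workers are attached,
# so the same function covers both the local and the distributed case.
evaluate(objfunc, population) = pmap(objfunc, population)

population = [rand(5) for _ in 1:20]
fitness = evaluate(x -> sum(abs2, x), population)
```

Because `evaluate` takes everything it needs as arguments and mutates nothing, it could be swapped for a threaded or cluster-aware version without touching the surrounding algorithm.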

The computational pipeline should have a simple interface and comprehensible syntax,

input |> ga(fitness = objFunc, mutation = inversion) |> Distributed(ncores = 10) 

or maybe even some simple DSL,

@local ga(input, mutationRate = 0.2, tolIter = 20) do
    population |> roulette |> inversion |> offspring
end
tpdsantos commented 4 years ago

> I do not think that DistributedArrays is an answer. It covers only multi-core scenario. Even basic julia parallel computing routines are enough for scatter-gather computations that are required for evolutionary algorithms.

I understand what you're saying, but the Distributed package also deals only with multiple cores. You could use something like Base.Threads, but that wouldn't be easy at all, since the ga function would need major changes.

wildart commented 4 years ago

> but that wouldn't be easy at all, since the ga function would need major changes.

Which you are already doing in #43 :wink:

Anyway, I think the right approach would be to start compartmentalizing the code of the evolutionary functions.

wildart commented 4 years ago

#49 should provide an easier way of implementing parallelized versions of the existing algorithms by introducing a series of new states with appropriate parallel update_state! implementations.

jtravs commented 3 years ago

An easy approach to parallelisation which would be useful is to ask the user (caller) to calculate the fitness for many individuals at once. Then the user can simply parallelise that call themselves (using whatever means is appropriate for their machine), and minimal changes are required in this package. What I am suggesting is simply that the user provides a function which must internally iterate over the population. In a simple case I could already use this directly and parallelise through e.g. a call to pmap.
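A sketch of what that caller-side contract could look like (the `my_fitness` name is hypothetical; the package would only need to accept a population-level function and pass the whole population to it):

```julia
using Distributed

# The caller supplies a fitness function that receives the entire
# population and returns a vector of fitness values, parallelising
# internally however suits their machine. Here via pmap, which runs
# serially when no workers are attached.
my_fitness(population) = pmap(ind -> sum(abs2, ind), population)

population = [rand(3) for _ in 1:10]
fvals = my_fitness(population)
```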

wildart commented 3 years ago

> calculate the fitness for many individuals at once.

That might work. Currently, fitness evaluation is done by a value call with the objective and the individual as parameters. If a broadcast (in-place) version of it were introduced to perform bulk evaluation, it could be overloaded for a specific individual type to provide a concurrent broadcast version.
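A hedged sketch of that dispatch-based design, with names mirroring the discussion rather than the package's actual API:

```julia
# Serial default: fill the preallocated fitness vector in place.
value!(fitness, objfun, population::AbstractVector) =
    (map!(objfun, fitness, population); fitness)

# A concurrent override would keep the same signature and simply swap
# the iteration strategy, e.g. for a threaded loop:
function value_threaded!(fitness, objfun, population::AbstractVector)
    Threads.@threads for i in eachindex(population)
        fitness[i] = objfun(population[i])
    end
    return fitness
end

pop = [rand(4) for _ in 1:8]
fit = zeros(8)
value!(fit, x -> sum(abs2, x), pop)
value_threaded!(fit, x -> sum(abs2, x), pop)
```

Because both versions share one signature, the algorithm body stays unchanged and only the bulk-evaluation method is overloaded per individual type.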

gasagna commented 3 years ago

Hi, nice package!

Has there been any update on the parallelisation?

wildart commented 2 years ago

I added a simple override for multi-threaded fitness evaluation: https://wildart.github.io/Evolutionary.jl/dev/tutorial/#Parallelization. See the developer part of the documentation for information on creating additional overrides for parallel fitness evaluation: https://wildart.github.io/Evolutionary.jl/dev/dev/#Parallelization.

nguyentmanh commented 2 years ago

Hi!

I'm trying to implement parallelization following the first link, but when I download the package and check Evolutionary.Options(), parallelization is not listed. Actually, "rng" and "callback" are also missing. Do you know what might be the reason? Thanks!

mfogelson commented 1 year ago

Just want to follow up and say that allowing value functions to use Distributed would be a great addition.

I think the threading helps, but it doesn't allow you to evaluate multiple samples from the population simultaneously across processes.