lspector / Clojush

The Push programming language and the PushGP genetic programming system implemented in Clojure.
http://hampshire.edu/lspector/push.html
Eclipse Public License 1.0

IndexOutOfBoundsException when producing offspring in epsilon-lexicase-selection #270

Closed: transducer closed this issue 5 years ago

transducer commented 5 years ago

I am running a modified UBall5D example on a training set of around 4000 test cases of the form:

[[0.0 58.992325]
 [1.0 60.346346]
 [2.0 71.436636]
 [3.0 80.2523]
 ...
 [4000.0 46.23525]]

with the following argmap:

(defn argmap [training-data] ;; partitioned training data
  {:error-function (fn [individual]
                     (swap! individuals-count inc) ;; needed to manually figure out which generation we are in for sliding window
                     (assoc individual
                            :errors
                            (doall
                             (for [[x y] (training-data (get-generation!))] ;; get index of sliding window
                               (let [result (->> (interpreter/run-push (:program individual)
                                                                       (->> (pushstate/make-push-state)
                                                                            (pushstate/push-item x :input)))
                                                 (pushstate/top-item :float))]
                                 (if (and (number? result)
                                          (some #(= % 'in1) (:program individual)))
                                   (Math/abs (- result y))
                                   1000000.0))))
                            :labels
                            (mapv second (training-data (get-generation!)))))
   :atom-generators (conj '[float_div float_mult float_sub float_add
                            float_sin float_cos
                            float_rot float_swap float_dup float_pop
                            in1]
                          random/lrand)
   :population-size 200
   :error-threshold 1.0
   :epigenetic-markers []
   :parent-selection :epsilon-lexicase
   :problem-specific-report report
   :print-errors false
   :genetic-operator-probabilities {:alternation 0.5
                                    :uniform-mutation 0.5}})

After a few generations (anywhere from the first to around the twentieth), the following stack trace appears:

IndexOutOfBoundsException
    clojure.lang.RT.nthFrom (RT.java:921)
    clojure.lang.RT.nth (RT.java:890)
    clojush.pushgp.selection.epsilon-lexicase/epsilon-lexicase-selection (epsilon_lexicase.clj:42)
    clojush.pushgp.selection.epsilon-lexicase/epsilon-lexicase-selection (epsilon_lexicase.clj:27)
    clojush.pushgp.selection.selection/select (selection.clj:15)
    clojush.pushgp.selection.selection/select (selection.clj:7)
    clojush.pushgp.breed/perform-genetic-operator (breed.clj:137)
    clojush.pushgp.breed/perform-genetic-operator (breed.clj:130)
    clojush.pushgp.breed/breed (breed.clj:166)
    clojush.pushgp.breed/breed (breed.clj:155)
    clojure.core/apply (core.clj:663)
    clojure.core/binding-conveyor-fn/fn--5476 (core.clj:2033)
    clojure.lang.Agent$Action.doRun (Agent.java:114)
    clojure.lang.Agent$Action.run (Agent.java:163)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)
    java.lang.Thread.run (Thread.java:748)

The output of the program before the exception is:

Best program: (float_sub float_cos float_swap float_swap in1 float_dup in1 0.03712591352451444 float_rot float_cos 0.9204838568392405 float_div float_add float_sin float_add float_sub float_sub 0.7072734463375348 float_sin 0.8334911205159632 0.9133344727495266 float_cos float_sin 0.8334911205159632 -0.06357154664358589 float_cos float_sin in1 float_sin float_div in1 float_div float_div float_cos float_dup float_sub float_swap float_sub float_cos float_sub float_div float_add)
Partial simplification: (float_sub float_cos in1 float_dup in1 0.03712591352451444 float_rot float_cos 0.9204838568392405 float_div float_add float_sin float_add float_sub 0.7072734463375348 float_sin 0.8334911205159632 0.9133344727495266 float_cos float_sin 0.8334911205159632 -0.06357154664358589 float_cos float_sin in1 float_div in1 float_div float_div float_cos float_dup float_sub float_swap float_sub float_cos float_sub float_div float_add)
Total: 69494.12281710403
Mean: 17.231373
Genome size: 42
Size: 43
Percent parens: 0.023
--- Population Statistics ---
Average total errors in population: 6.0854603326606825E7
Median total errors in population: 100711.97922800652
Average genome size in population (length): 43.835
Average program size in population (points): 44.835
Average percent parens in population: 0.023
Minimum age in population: 0.0
Maximum age in population: 17.0
Average age in population: 15.725
Median age in population: 17.0
Minimum grain-size in population: 1.0
Maximum grain-size in population: 1.0
Average grain-size in population: 1.0
Median grain-size in population: 1.0
--- Population Diversity Statistics ---
Min copy number of one genome: 1
Median copy number of one genome: 1
Max copy number of one genome: 11
Genome diversity (% unique genomes):     0.555
Min copy number of one Push program: 1
Median copy number of one Push program: 1
Max copy number of one Push program: 11
Syntactic diversity (% unique Push programs):    0.555
Total error diversity:               0.55
Error (vector) diversity:            0.55
--- Run Statistics ---
Number of program evaluations used so far: 3600
Number of point (instruction) evaluations so far: 587208398
--- Timings ---
Current time: 1550078227542 milliseconds
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; -*- End of report for generation 17
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Producing offspring...

I think there's a bug somewhere in epsilon-lexicase-selection.

lspector commented 5 years ago

I can't say for sure based on what I've seen, but I would rather suspect that there are individuals in the population with different numbers of errors, perhaps owing to something in the implementation of (get-generation!). Have you checked for this?
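A minimal sketch of such a check, assuming each individual is a map with an `:errors` sequence as in the argmap above. Both function names are hypothetical helpers for debugging, not part of Clojush:

```clojure
(defn error-count-frequencies
  "Maps each error-vector length to the number of individuals with that length."
  [population]
  (frequencies (map (comp count :errors) population)))

(defn consistent-error-counts?
  "True when every individual in the population has the same number of errors."
  [population]
  (<= (count (error-count-frequencies population)) 1))
```

Calling `(error-count-frequencies population)` on a problematic population would return a map with more than one key, for example one entry for individuals with 4000 errors and another for individuals with 3999.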

transducer commented 5 years ago

Thanks for your reply.

The error also happens without the call to get-generation!, which is implemented as follows:

(def individuals-count (atom 0))
(def population-size 300) ;; roughly
(defn get-generation! [] (int (/ @individuals-count population-size)))

Regarding "I would rather suspect that there are individuals in the population with different numbers of errors":

What is true is that the windows have different sizes. The data is partitioned into fourteen-day windows, but some two-week windows have fewer measurements than others. Could that lead to the error? For example, one generation might use a window with 4000 measurements while the previous one used 3999, so an individual evaluated in the previous generation would have fewer errors than one evaluated in the current generation.
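To illustrate why this would fail, here is a simplified lexicase-style filtering step. It is a hypothetical sketch, not Clojush's epsilon-lexicase code, and it uses a fixed epsilon for simplicity. Each candidate's error on test case `i` is read with `nth`, so any candidate whose error vector is shorter than `i + 1` raises `IndexOutOfBoundsException`, matching the `clojure.lang.RT.nth` frame in the stack trace:

```clojure
(defn filter-on-case
  "Keeps only the candidates whose error on test case i is within epsilon
  of the best error on that case. Hypothetical simplification of one
  epsilon-lexicase step."
  [candidates i epsilon]
  (let [best (apply min (map #(nth (:errors %) i) candidates))]
    (filter #(<= (nth (:errors %) i) (+ best epsilon)) candidates)))

;; An individual with 3 errors and one with only 2: filtering on case index 2
;; calls (nth [1.5 2.5] 2), which throws IndexOutOfBoundsException.
;; (filter-on-case [{:errors [1.0 2.0 3.0]} {:errors [1.5 2.5]}] 2 0.1)
```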

To check, I will truncate all the windows to the size of the shortest one.
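A minimal sketch of that truncation, assuming the training data is a sequence of windows, each a sequence of `[x y]` pairs as shown at the top of the issue. `equalize-windows` is a hypothetical helper name:

```clojure
(defn equalize-windows
  "Truncates every window to the length of the shortest one, so that every
  generation evaluates individuals on the same number of test cases."
  [windows]
  (let [n (apply min (map count windows))]
    (mapv #(vec (take n %)) windows)))
```

This keeps the first `n` measurements of each longer window and drops the rest; resampling or padding would be alternatives if the trailing measurements matter.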

Thanks.

transducer commented 5 years ago

Alright, that helped! So all the windows of test cases should be the same size, which makes sense.