probprog / bopp

BOPP: Bayesian Optimization for Probabilistic Programs

syntax for providing a list of :initial-points for bopp #1

Open BorisVSchmid opened 6 years ago

BorisVSchmid commented 6 years ago

I am trying to figure out how to correctly provide a list of initial points in BOPP. Am I overlooking something simple?

Here are the details: the doopt call, the output of BOPP, and the defopt.

  (def model (atom (lazy-seq (doopt :smc stochSIR [] 1
                                    :speed-option :careful
                                    ;;
                                    :bo-options {:verbose 2
                                                 :num-initial-points 2
                                                 :num-scaling-thetas 1000
                                                 :debug-folder true
                                                 :initial-points [{:intro_day 35, :intro_duration 1, :distancing 0.6322107370456067, :beta_b 0.05537357023136269, :beta_bS 0.19621440146596547, :beta_lice 0.16614719314835572, :beta_p 0.16744317384178953}]
                                                 }))))

From the output I get before it crashes, it looks like BOPP doesn't like the format I use for :initial-points. First, it reformats the map into a collection of [keyword value] pairs, and second, it drops the last keyword-value pair ({:beta_p 0.16744317384178953} in this case).

:intial-points ([:intro_day 35] [:intro_duration 1] [:distancing 0.6322107370456067] [:beta_b 0.05537357023136269] [:beta_bS 0.19621440146596547] [:beta_lice 0.16614719314835572] {:intro_day 32, :intro_duration 1, :distancing 1.024606244767797, :beta_b 0.06818428961187084, :beta_bS 0.1391747830274707, :beta_lice 0.14780650814426455, :beta_p 0.27853257654947244}

If I do not specify any initial points, BOPP runs fine, and feeds the run-epidemic function with the sampled maps of priors that the run-epidemic function expects.

  (anglican.emit/with-primitive-procedures
    [run-epidemic]
    (defopt stochSIR [] [intro_day intro_duration distancing beta_b beta_bS beta_lice beta_p]
      (let [intro_day (sample (uniform-discrete 30 80))
            intro_duration (sample (uniform-discrete 0 5))
            distancing (sample (uniform-continuous 0.5 1.3))
            beta_b (sample (uniform-continuous 0 0.1))
            beta_bS (sample (uniform-continuous 0 0.3))
            beta_lice (sample (uniform-continuous 0 0.3))
            beta_p (sample (uniform-continuous 0 0.3))
            _ (println {:intro_day intro_day :intro_duration intro_duration :distancing distancing :beta_b beta_b :beta_bS beta_bS :beta_lice beta_lice :beta_p beta_p})
            epidemic-score (run-epidemic {:intro_day intro_day :intro_duration intro_duration :distancing distancing :beta_b beta_b :beta_bS beta_bS :beta_lice beta_lice :beta_p beta_p})]
        ;;
        ;; scores that are higher than (log 2) are due to the classifier neural network guessing worse than random. No need to give that bonus-points for guessing worse than random, so the peak score lies at (log 2)
        (observe (normal (log 2) 0.1) (:score epidemic-score)))))

twgr commented 6 years ago

Hey Boris

Could you go into a bit more detail about exactly what behavior you are trying to achieve with the initial points?

At the moment, :initial-points is a parameter of deodorant (rather than of bopp itself), so it takes inputs in the form required by the Bayesian optimization scheme rather than by bopp. Its syntax is (map #(into [] (cons % (f %))) initial-thetas), where each element of initial-thetas is a vector of the values of each input (in the order specified by the defopt input, which is not necessarily the order in which they are sampled) and f is the target function of the optimization.

As you've noted, :initial-points is printed when you set :verbose 2. However, this happens whether or not you provide them as an input, so if you run BOPP without setting :initial-points you'll see them come out in a different format, e.g. :intial-points ([[4.551116063315992 5.5256651622872415] -25.212731299326094 ([])] [[9.477340236977007 9.521154069744426] -49.43158747606592 ([])]), which is the format you need to match if you provide them manually.

Note that, in BOPP, f actually returns a tuple of the partition function estimate and the program outputs, so this syntax might not be exactly what you expect: as you have no outputs, there is a trailing list containing an empty vector. So in the above example, [4.551116063315992 5.5256651622872415] is the input point, -25.212731299326094 is the marginal likelihood estimate, and ([]) is the output of the program (which happens to be empty).
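As a rough illustration of the format described above, the following sketch builds deodorant-style entries from a keyword map like the one in the original post. The names defopt-arg-order, map->theta, toy-f and manual-fits are hypothetical, and toy-f is only a stand-in for the real target function (its score is made up). Nothing here calls BOPP or deodorant; it only shows the shape of the data.

```clojure
;; Order of the optimization variables as declared in the defopt
;; (copied from the stochSIR example earlier in this thread).
(def defopt-arg-order
  [:intro_day :intro_duration :distancing
   :beta_b :beta_bS :beta_lice :beta_p])

(defn map->theta
  "Turn a keyword map into a plain vector ordered like the defopt arguments."
  [m]
  (mapv m defopt-arg-order))

;; Hypothetical stand-in for the target function f. It returns a tuple of
;; [log-marginal-likelihood-estimate program-outputs]; the score is invented.
(defn toy-f [theta]
  [(- (reduce + theta)) '([])])

;; Manual fits expressed as keyword maps (values shortened for readability).
(def manual-fits
  [{:intro_day 35, :intro_duration 1, :distancing 0.63,
    :beta_b 0.055, :beta_bS 0.196, :beta_lice 0.166, :beta_p 0.167}])

(def initial-thetas (mapv map->theta manual-fits))

;; Same expression as quoted above: each entry becomes [theta log-Z outputs].
(def initial-points
  (map #(into [] (cons % (toy-f %))) initial-thetas))
```

Each element of initial-points is then a three-element vector whose first item is the ordered input vector, which is the shape shown in the printed example. Using real scores would mean replacing toy-f with an actual evaluation of the model.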

The intention of :initial-points was to give a way of "restarting" the Bayesian optimization with some previously evaluated points, hence it is necessary to be in the format of deodorant (not BOPP) to avoid re-evaluation of these points.
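A minimal sketch of what such a restart could look like on the data side, assuming the points printed by a previous :verbose 2 run were saved as text. The names printed-points, deodorant-point? and restart-bo-options are hypothetical, the shape check is only a loose sanity test, and whether deodorant accepts the collection exactly as read back here is an assumption that has not been verified against the library.

```clojure
(require '[clojure.edn :as edn])

;; Text copied from a previous verbose run (the two example points quoted above).
(def printed-points
  "([[4.551116063315992 5.5256651622872415] -25.212731299326094 ([])]
    [[9.477340236977007 9.521154069744426] -49.43158747606592 ([])])")

(defn deodorant-point?
  "Loose shape check: [theta-vector numeric-score outputs]."
  [p]
  (and (vector? p)
       (= 3 (count p))
       (vector? (first p))
       (every? number? (first p))
       (number? (second p))))

;; Read the saved text back into Clojure data.
(def previous-points (edn/read-string printed-points))

;; Fail early if any entry does not have the expected three-part shape.
(assert (every? deodorant-point? previous-points))

;; Hypothetical options map that would be passed to doopt as :bo-options
;; on the next run, so the already evaluated points are not re-evaluated.
(def restart-bo-options
  {:verbose 2
   :initial-points previous-points})
```

The check would reject a keyword map such as {:intro_day 35 ...}, which is the kind of input that produced the unexpected output reported at the top of this issue.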

If you instead want a means of forcing the initial-thetas to take on a certain set of values, I could add this quite easily, but this would require a separate input.

BorisVSchmid commented 6 years ago

Hi Tom.

Thanks for the explanation. For some reason I can't get the "restarting" of the initial-points to work. I should try that with a more minimal model.

But you are right, yeah, I am looking for a way to suggest to BOPP to evaluate some initial-thetas defined by me (manual fits) in addition to the num-initial-points it samples itself.