mlenert / BIOS8366_Homework


Homework 1 Submission #1

Closed mlenert closed 5 years ago

mlenert commented 6 years ago

@nstrayer starting thread for comments and grading feedback.

nstrayer commented 6 years ago

Grading

Overall

Very good work. You definitely have a great grasp of what we've covered. My biggest suggestion would be to think a bit more about the presentation of the results. I appreciate that you actively tackled the fact that these algorithms are stochastic by taking multiple samples for each parameter set.

1

1.1

Good stuff here. Well commented and modular functions. Your answer is right, too!

5/5

1.2

You were on the right path. Basically we want to write a function that returns the negative log-likelihood of our observations around the logistic growth curve and let optimize.minimize do its magic.

Here's an example:

import numpy as np
from scipy.optimize import minimize
from scipy.stats.distributions import norm

# days and beetles are the observed time points and beetle counts
data = np.c_[days, beetles]

# logistic growth curve: carrying capacity K, growth rate r, initial size N0
logistic = lambda t, K, r, N0=1.: K*N0 / (N0 + (K-N0) * np.exp(-r*t))

def likelihood(params, data=data):
    # negative log-likelihood: normal errors with sd s around the logistic curve
    K, r, s = params
    return -np.sum([norm.logpdf(y, logistic(t, K, r), s) for t, y in data])

In addition, getting your standard errors and correlations is super easy: you can simply make a bootstrap function that repeats the optimization.

# re-fit the model for a given bootstrap sample s of the data
min_func = lambda s: minimize(likelihood, (1000, 0.1, 1), args=(s,), method='Nelder-Mead')

def bootstrap(data, nsamples, f):
    # resample rows with replacement, then re-run the optimization on each sample
    boot_samples = data[np.random.randint(len(data), size=(nsamples, len(data)))]
    return [f(s)['x'] for s in boot_samples]

estimates = bootstrap(data, 100, min_func)

np.std(estimates, 0)   # bootstrap standard errors for (K, r, s)
# array([ 78.66226319,   0.90165523,  38.1702207 ])

np.corrcoef(np.array(estimates).T[:-1])   # correlation between the K and r estimates
# array([[ 1.        , -0.28651239],
#        [-0.28651239,  1.        ]])

1/5

2

2.1a

The code looks good, but it leans a bit heavily on nested if statements. Particularly in the tau loop in AIC_Annealing and in the compareSchedules function, it becomes hard to follow along because of the nesting and the lack of comments.

I'm not sure that I get much from the plots. Static 3D plots are pretty hard to read, and in this scenario I think multiple lines corresponding to different runs would get the point across rather well.
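For example, here's a minimal sketch of what I mean, assuming a hypothetical run_annealing wrapper around your AIC_Annealing that returns the trace of the best AIC from a single run:

import matplotlib.pyplot as plt

# run_annealing is a hypothetical stand-in for one call to your AIC_Annealing;
# assume it returns the best AIC found at each iteration of that run
for i in range(10):
    trace = run_annealing(tau_start=100)
    plt.plot(trace, color='C0', alpha=0.4)

plt.xlabel('iteration')
plt.ylabel('best AIC so far')
plt.show()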

I do like, however, that you are confronting the fact that the algorithm is stochastic by running multiple iterations, and your results on the surface (judging by the table) fall in line with what we would expect.

4/5

2.1b

I'd like to see plots with this, although the tables do give a decent idea of what's going on. The variances look large enough that it seems hard to make any definitive statements about which schedule is better.
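For instance, even a simple mean ± standard deviation plot across runs would make the overlap visible at a glance. A rough sketch, assuming a hypothetical results_by_schedule dict mapping each cooling schedule to the final AICs from its repeated runs:

import numpy as np
import matplotlib.pyplot as plt

# results_by_schedule is a hypothetical dict: schedule name -> final AICs over repeated runs
names = list(results_by_schedule)
means = [np.mean(results_by_schedule[n]) for n in names]
sds = [np.std(results_by_schedule[n]) for n in names]

plt.errorbar(range(len(names)), means, yerr=sds, fmt='o', capsize=4)
plt.xticks(range(len(names)), names)
plt.ylabel('final AIC (mean ± sd over runs)')
plt.show()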

4/5

2.2a

Good! Nice, concise, and well-commented code here.

I like the plots and that you ran multiple samples. However, it may also be nice to show the individual runs' results, either as a line tracing the best AIC for each run or as scatters of all the population's AICs over the generations. There are some interesting nuances in how these algorithms behave that show up in those plots but not in an averaged plot like yours.
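A rough sketch of the scatter version, assuming you kept a hypothetical pop_aics array with one row per generation and one column per individual:

import numpy as np
import matplotlib.pyplot as plt

# pop_aics is a hypothetical (n_generations, pop_size) array of every individual's AIC
n_generations, pop_size = pop_aics.shape
for g in range(n_generations):
    plt.scatter(np.repeat(g, pop_size), pop_aics[g], s=5, alpha=0.3, color='C0')

plt.plot(pop_aics.min(axis=1), color='C1', label='best AIC per generation')
plt.xlabel('generation')
plt.ylabel('AIC')
plt.legend()
plt.show()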

5/5

2.2b

Good again. Same comments as the last problem about showing the results a bit more explicitly.

5/5

2.2c

Good stuff. Why did you choose 0.02 as your mutation rate? Do you think the results would change at all if you varied that?

5/5

3

Fancy stuff here. Looks like a pretty good solution as well.

5/5

4

Beautiful. I knew with your SQL background this problem would be easy for you.

5/5

Grade:

39/45