jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

Putting Bounds on an Array for MLE #52

Closed jonhelium closed 5 years ago

jonhelium commented 5 years ago

When calculating the MLE, we have to put lower bounds on the parameter values for all 𝛼. I'm a bit confused about how to do this in scipy.optimize.minimize, since we work with 𝛼 as an array taken from the parameter vector theta: how should we go about specifying bounds for elements of an array within the minimize() function? In addition, would we have to define our criterion function with separate inputs 𝜆, 𝛼, 𝛽, and 𝛾 rather than with the parameter vector theta as a single input in order to put bounds on 𝛼 in scipy.optimize.minimize?

jmbejara commented 5 years ago

theta is just a vector, and one segment of that vector represents the alpha parameters. So, to put bounds on alpha, just put bounds on theta. That means you'll put bounds on the whole theta vector. For the parameters that you don't want to bound at all, set their upper and lower bounds to np.inf and -np.inf, respectively.

You should write your criterion function as a function of theta. Then, the first thing you do is break up theta into its respective parts:

lam = theta[0:4]      # the lambda parameters
beta = theta[4:8]     # the beta parameters
alpha = theta[8:13]   # the alpha parameters
gamma = theta[13:19]  # the gamma parameters
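
The bounds argument to minimize is then just a sequence of (lower, upper) pairs, one per element of theta. A rough sketch of how that could look (crit and theta0 are placeholders for your own criterion function and initial guess, and L-BFGS-B is just one of the methods that accepts bounds), assuming only the alpha block is bounded below at zero:

import numpy as np
from scipy.optimize import minimize

# one (lower, upper) pair per element of theta: unbounded for the lambda and beta
# blocks, nonnegative for the alpha block, unbounded for the gamma block
bnds = [(-np.inf, np.inf)] * 8 + [(0, np.inf)] * 5 + [(-np.inf, np.inf)] * 6
result = minimize(crit, theta0, bounds=bnds, method='L-BFGS-B')
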
jonhelium commented 5 years ago

If we define our criterion function as a function of theta, wouldn't there be no need to break down theta into lambda, beta, alpha, and gamma, since we already defined our log-likelihood function as a function of theta and did the same breakdown earlier in the first question?

jmbejara commented 5 years ago

I just find it convenient to break it up. It's hard to remember that, for example, beta[2] = theta[6]. It makes for more readable code.

Also, the criterion function is the (negative) log-likelihood function. So, I suppose this all depends on how you chose to code things up. In my code, the log-likelihood function takes in theta as an input---not the individual parameters. But these details don't matter too much in the end.

jonhelium commented 5 years ago

Sorry to dig further into this issue, but I'm having trouble running scipy.optimize with my criterion function and bounds. I suspect I'm simply writing the bounds wrong. I defined them as:

bnds = ((-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf),
        (-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf),
        (0, np.inf), (0, np.inf), (0, np.inf), (0, np.inf), (0, np.inf),
        (-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf),
        (-np.inf, np.inf), (-np.inf, np.inf), (-np.inf, np.inf))

My log-likelihood function also takes the vector theta as its input, so the criterion function is simply the negative of that function (and also takes theta). Given this criterion function and my bounds, I keep getting "divide by zero encountered in log" warnings and a kernel that never stops running. Is this due to incorrect bound definitions, or could it be something in the definition of my log-likelihood function (which appears to work fine in Q1)?

jmbejara commented 5 years ago

No worries! Did you do the part mentioned in the hint:

It's normal if you get a few warnings (e.g. RuntimeWarning: divide by zero encountered in log). In your code you will occasionally get probabilities at zero or close to zero. This will create values of -inf occasionally. Whenever a np.log(Phi) is equal to -inf, change it to -1e+08. That is, change it to -100000000.
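
A minimal way to do that replacement, assuming Phi is a NumPy array of probabilities, is something like:

import numpy as np

log_Phi = np.log(Phi)                   # produces -inf wherever Phi == 0
log_Phi[np.isneginf(log_Phi)] = -1e+08  # clamp -inf to -100000000, as in the hint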

Also, it could take a while to run, depending on how you coded it up. On my laptop, it takes about 6 minutes to run. I vectorized a lot of my operations, but I could have still optimized it further. If you have written it using lots of for-loops, you can expect the optimization to take significantly longer. That's one of the reasons why I gave you the value of the log-likelihood for theta0. This makes it easier to make sure you're on the right path without having to run the optimization algorithm a whole bunch of times. You can also increase the tolerance on practice runs to make sure it runs faster while you're still debugging.
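
For example (reusing the placeholder crit, theta0, and bnds names from earlier; the tolerance value here is just illustrative):

rough = minimize(crit, theta0, bounds=bnds, tol=1e-2)  # loose tolerance for quick practice runs
final = minimize(crit, theta0, bounds=bnds)            # default tolerance for the real run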

jmbejara commented 5 years ago

If you're getting good results in Q1, then the optimization problem could take more than 30 minutes to run. You might try leaving it running in the background. I use the %%time magic to time how long it takes.
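
That is, with %%time as the first line of the notebook cell (placeholder names again):

%%time
result = minimize(crit, theta0, bounds=bnds)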

jonhelium commented 5 years ago

Ah okay I understand now - I did indeed follow the hint in Q1 and am getting good results. I didn't realize that the actual optimization could take a while - I've been stopping the kernel after it ran for more than 30 seconds, so I'll let it run and see what happens. Thanks!

Also, an additional question on problem Q3: it appears that one of the four alphas (𝛼4 specifically) is missing in table 4 of the paper, giving us only 18 elements within theta_br (I believe theta should have 19 elements). Should we just assume that 𝛼4 is 0 (I tried doing this but ended up with a different negative log-likelihood value than the expected 263.09)?

jmbejara commented 5 years ago

As a note, if you include options={'disp': True} as a keyword argument to minimize, it will print updates on the optimization process to the terminal (not in the notebook). That way, you can make sure it's running.

The table is missing an alpha, but I just took that to mean zero. I could be interpreting it wrong. I'm not totally sure what's up. But, when I run the optimizer, it gives me a zero estimate for that parameter. So, I guess that works.

jmbejara commented 5 years ago

I didn't put any specific bounds on alpha_4. It just came out as zero.

Also, to make sure that your log-likelihood function is right, you can plug the estimates from the table into your log-likelihood function. You should get 263.09.
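
Something like the following, with your own function and variable names (here neg_log_likelihood is the criterion and theta_br holds the table's estimates):

print(neg_log_likelihood(theta_br))  # should print approximately 263.09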

Also, while you're developing, you can set the initial value of the optimizer to the parameter estimates from the table. If you're starting basically right at the solution, the optimizer should finish really fast. Then, you can slowly move that initial estimate further away from the parameters in the table to see if it still works. Then, when you're convinced it's working right, you can start from the initial parameter guesses that I supply in the HW.
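
In code, that warm start is just a different initial guess (placeholder names as before):

warm = minimize(crit, theta_br, bounds=bnds)  # starting at the table's estimates should converge very quickly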

jonhelium commented 5 years ago

Okay everything actually seems to be working great now! I think I switched up the places of two lambdas within the theta vector, which was giving me issues. After fixing that I'm getting 263.09 from theta, so hooray!

jonhelium commented 5 years ago

Sorry, but another minor issue has arisen: for Q4, np.allclose(theta, theta_br, rtol=0.2) gives me False despite theta_br being very close to theta. I suspect this is because the values from table 4 that are in theta_br are rounded to two decimal places (the values in theta would indeed round exactly to these): should we use a slightly higher rtol then?
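
For reference, np.allclose(a, b, rtol=r) checks |a - b| <= atol + r * |b| elementwise (with a tiny default atol), so entries of theta_br that round to values near zero get almost no slack. A toy illustration of what I mean, with made-up numbers:

import numpy as np
a = np.array([0.0035, 0.53])
b = np.array([0.00, 0.53])          # as if rounded to two decimal places
print(np.allclose(a, b, rtol=0.2))  # False: 0.0035 exceeds 1e-08 + 0.2 * 0.0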

jmbejara commented 5 years ago

I'm getting this as my MLE estimate:

array([-0.53105114,  2.26253387,  0.34522307,  0.22698121, -0.48084866,
       -0.03044045,  0.00350979, -0.02145047,  0.86303471,  0.03470013,
        0.15057873,  0.        ,  0.08046184,  0.52817594,  0.75573093,
        0.46457693,  0.59837449,  0.12016968, -0.73548591])

This works with rtol=0.2. I'm not sure what could be going on. If it's close enough, then that's probably fine.

jonhelium commented 5 years ago

Some of my estimates are off by just a little bit (around 0.02 at most) - np.allclose(theta, theta_br, rtol=0.22) does give True though, so is that close enough?

jmbejara commented 5 years ago

Sounds good enough to me!