vasudeva-ram / Julia-SSJ

Training material to help solve Heterogeneous Agent New Keynesian (HANK) models in Julia using the Sequence Space Jacobian method (Auclert et al., 2021).

identical KS solution to paper #7

Closed floswald closed 3 months ago

floswald commented 3 months ago

My aim was to get a method that delivers numerically identical solutions to what is reported in the paper and in their notebooks (by "they" I mean, of course, Auclert et al.). I wanted to reproduce this result from their KS notebook:

[screenshot: steady-state output from the Auclert et al. KS notebook]

This required, as you said, changing the solution approach slightly: no longer searching for an r that clears the asset market, but specifying a set of targets that the SS should achieve (r = 0.01, Y = 1, asset_mkt = 0) while choosing the 3 unknowns beta, Z, K. This leads to a few small modifications in your code. Let us call this the search_beta approach.
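The layout of such a targets-and-unknowns system can be sketched as follows. This is a toy illustration only: the residual functions are stand-ins for the model's actual equilibrium conditions, and a small hand-rolled Newton iteration replaces the nlsolve call so the sketch stays dependency-free.

```julia
using LinearAlgebra

# Toy stand-in for the search_beta layout: choose 3 unknowns (β, Z, K)
# so that 3 targets hold (r = 0.01, Y = 1, asset_mkt = 0). These residuals
# are NOT the model's real equilibrium conditions, just placeholders.
function residual(x)
    β, Z, K = x
    return [1 / β - 1 - 0.01,   # stand-in for the r target
            Z * K^0.3 - 1.0,    # stand-in for the Y target
            K - 3.0]            # stand-in for asset-market clearing
end

# Small Newton solver with a forward-difference Jacobian, standing in
# for NLsolve.nlsolve to keep this example self-contained.
function newton(f, x; tol = 1e-10, maxit = 50)
    h = 1e-7
    for _ in 1:maxit
        F = f(x)
        norm(F) < tol && return x
        J = hcat((begin
            e = zeros(length(x)); e[j] = h
            (f(x .+ e) .- F) ./ h    # j-th Jacobian column
        end for j in eachindex(x))...)
        x = x .- J \ F
    end
    return x
end

β, Z, K = newton(residual, [0.98, 0.9, 2.5])
```

In the real model, evaluating the residual means solving the household problem by EGM and aggregating; the solver structure stays the same.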

Identical solution?

I am using nlsolve to look for beta, Z, K. The solution I obtain is numerically identical to theirs (except for beta, which is slightly different but acceptable); note n_a = 500:

julia> SSJ.main_r(0.01);
[ Info: beta = 0.9894626201799099
[ Info: K = 3.142857142857143
[ Info: Z = 0.8816460975214567
Steady state interest rate: 0.010000000000000002
Steady state wage rate: 0.8900000000000001
Steady state Z: 0.8816460975214567
Steady state β: 0.9894626201799099
Steady state aggregate capital: 3.620693240677122
Steady state aggregate labor: 1.0
Steady state aggregate output: 1.0

Performance is poor

I found performance relative to their code to be really bad. Their steady state takes about 1 second to compute on my computer:

%timeit ss = ks.solve_steady_state(calibration, unknowns_ss, targets_ss, solver='hybr')
1.09 s ± 614 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)

Whereas my implementation of the search_beta approach takes 37 times longer.

@time SSJ.main_r(0.01);
 37.073224 seconds (35.15 M allocations: 57.352 GiB, 10.43% gc time, 0.24% compilation time)

I can reduce that a little by loosening the tolerance on the nlsolve call, but not by anything close to a factor of 37.

Performance fairly poor on previous version

I timed your solution as well:

julia> @time SSJ.main();
 12.727960 seconds (13.57 M allocations: 19.042 GiB, 9.75% gc time)

so, despite only having to solve a single univariate equation, this takes 12 times longer than theirs for the same grid sizes.

performance improvements

I investigated a few things:

  1. Type instabilities: there was one instability, which I removed; no big change, so that's not it.
  2. Allocations: we seem to do way too many. There are 13M allocs in your version (my version roughly triples yours, since there are 3 dimensions to search over...). This concerns EGM, consumptiongrid, etc., each of which allocates fresh matrices on every call. We could clearly do better by storing those matrices on the model object once and for all.
  3. Autodiff: I tried out a few different options for the nonlinear system solver. NonlinearSolve.jl looks promising but really relies on autodiff. So I tried to make the code AD-compatible, but there are so many type annotations tripping up the AD systems that this will never work as-is. This branch has a few attempts at it, with duplicated methods that drop the type annotations so they could be used in an AD call; no success so far, however.
  4. Linear interpolation: we spend almost 100% of the time in linpolate.(policymat[:,i]), in particular in searchsortedlast, the function that finds which bracket of xvec contains x. Given that our xvec is always monotonically increasing, we can do much better here: if the bracket of x[i] is j, the bracket of x[i+1] must be greater than or equal to j.
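Point 4 can be exploited with a monotone-aware bracket search: since both the query points and the grid are sorted, the bracket index only ever moves forward, so one O(n + m) pass replaces a binary search per query. A sketch (the name interp_monotone is illustrative, not the repo's API):

```julia
# Linear interpolation of (xv, yv) at sorted query points xq, advancing
# the bracket index monotonically instead of calling searchsortedlast
# for every query. Extrapolates linearly beyond the end points.
function interp_monotone(xv::AbstractVector, yv::AbstractVector, xq::AbstractVector)
    out = similar(xq, Float64)
    j = 1
    for (i, x) in enumerate(xq)
        # advance the bracket; it never moves backwards for sorted xq
        while j < length(xv) - 1 && xv[j+1] <= x
            j += 1
        end
        t = (x - xv[j]) / (xv[j+1] - xv[j])
        out[i] = (1 - t) * yv[j] + t * yv[j+1]
    end
    return out
end
```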

Let me know if you have some bandwidth to investigate these performance issues, or whether you are actually fine with the package as is, given that you want to use it mainly for teaching purposes.
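For the allocation issue in point 2 above, one standard pattern is to pre-allocate work arrays once and mutate them in place on every EGM call. A minimal sketch (the Workspace struct and egm_step! are hypothetical names, not the repo's actual types):

```julia
# Hypothetical pre-allocated workspace; field names and sizes are illustrative.
struct Workspace
    c::Matrix{Float64}   # consumption policy buffer
    a::Matrix{Float64}   # savings policy buffer
end
Workspace(na::Int, ns::Int) = Workspace(zeros(na, ns), zeros(na, ns))

# Each call overwrites the buffers in place rather than allocating
# fresh matrices, so repeated calls inside a solver loop allocate nothing.
function egm_step!(ws::Workspace, r::Float64)
    @. ws.c = 1.0 + r        # stand-in for the actual EGM update
    @. ws.a = 2.0 * ws.c
    return ws
end
```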

vasudeva-ram commented 3 months ago

First of all, thank you for implementing the search_beta method. Knowing that it produces the same result as Auclert et al. provides a lot of confidence that the nuts and bolts of the method work correctly (i.e., the implementation of the EGM, Young's lottery method, etc.).
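For readers unfamiliar with Young's lottery method: an off-grid savings choice is split between its two bracketing grid points, with weights chosen so that the expected value equals the original choice. A minimal sketch (lottery_weights is an illustrative name, not the repo's API):

```julia
# Young's (2010) lottery: distribute mass at an off-grid point `aprime`
# onto the two neighbouring grid nodes, mean-preservingly.
function lottery_weights(agrid::AbstractVector, aprime::Real)
    j = clamp(searchsortedlast(agrid, aprime), 1, length(agrid) - 1)
    ω = (agrid[j+1] - aprime) / (agrid[j+1] - agrid[j])  # weight on lower node
    return j, ω
end
```

A share ω of the mass goes to agrid[j] and 1 - ω to agrid[j+1]; applying this across the policy function builds the transition matrix of the wealth distribution.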

That said, I'm very surprised at the timing results you've shared. On my machine, the original method (finding r) takes only about 3 seconds (which is still slower than the ShadeEcon implementation, but not terrible). See below:

[screenshot: @time output showing ~3 seconds for the original method]

12 seconds (or 37 seconds for that matter) to find the steady state is unacceptably slow for such a simple model. I have not fetched the changes from the last few days yet - did something perhaps change in the main implementation code?

floswald commented 3 months ago

not sure. I'm going back to the last commit before I touched anything:

* a29b069 - some minor changes (6 weeks ago) <Vasudeva>
* 2370630 - adjusted discretization process and related calls (7 weeks ago) <Vasudeva>
* e9b99a9 - removed production.jl file (3 months ago) <Vasudeva>
* 12af096 - updated Aiyagari model for speed (3 months ago) <Vasudeva>
* 7122f59 - updated aiyagari.jl to introduce nonlinear solver (3 months ago) <Vasudeva>
* aed6c04 - removed old shockdiscretization file (3 months ago) <Vasudeva>
* 771134e - updated helper functions with both tauchen and rouwenhorst discretization (3 months ago) <Vasudeva>
* e5458d6 - Update README.md (7 months ago) <vasudeva-ram>

floswald@PTL11077 ~/g/Julia-SSJ (firm)> git checkout a29b069                                              (base) 
Note: switching to 'a29b069'.

floswald@PTL11077 ~/g/Julia-SSJ ((a29b069b))>   

and then I'm running your code again:

julia> include("Aiyagari.jl")
  Activating project at `~/git/Julia-SSJ`
Precompiling project...
  3 dependencies successfully precompiled in 7 seconds. 290 already precompiled.
main

julia> @time main();
  5.916080 seconds (9.00 M allocations: 9.077 GiB, 7.61% gc time, 21.19% compilation time)

julia> @time main();
  4.649198 seconds (5.74 M allocations: 8.863 GiB, 8.52% gc time)

julia> 

Keep in mind that this code uses n_a = 200 vs. the n_a = 500 in the paper, so the 5 seconds we see here will blow up. The fact that it takes "only" 3 seconds for you means you have a faster computer than mine. Still, given that the grid is only 2.5 times smaller, this is surprisingly slow.

vasudeva-ram commented 3 months ago

Hi @floswald - I looked at the performance issue. Of the 4 possible issues you listed, I think the interpolation package I use is the most likely culprit. It turns out that the Shade-Econ package of Auclert et al. uses its own interpolation scheme that efficiently exploits the monotonicity of the savings policies over the grid. See here.

Re: autodiff, I went to great lengths to annotate the functions because I intended this to be a training tool, and annotations (at least to me) make code much easier to understand. But as you say, this makes it a nightmare for autodiff. We could go in and change the annotations to be more type-flexible (e.g., writing func(x::Vector{TF}) where TF to support the custom number types of ForwardDiff, ReverseDiff, and Zygote).
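To make the tradeoff concrete, here is a small sketch (assuming ForwardDiff.jl is available; the function names are illustrative) of how a concrete annotation blocks AD while a parametric one keeps the documentation value:

```julia
using ForwardDiff

# A concrete annotation rejects ForwardDiff's Dual numbers:
f_strict(x::Vector{Float64}) = sum(abs2, x)

# A parametric annotation still documents intent ("a vector of reals")
# but accepts Dual numbers, so AD works:
f_flexible(x::Vector{TF}) where {TF<:Real} = sum(abs2, x)

g = ForwardDiff.gradient(f_flexible, [1.0, 2.0])  # succeeds: [2.0, 4.0]
# ForwardDiff.gradient(f_strict, [1.0, 2.0])      # would throw a MethodError
```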

In general, however, this raises a question about the purpose of the package. Do we want it to remain a teaching tool? Or do we want it to be a full-fledged Julia implementation of Shade-Econ's Python package? The latter is a much larger project, and since this is my job-market year, I may not have a lot of time to dedicate to it. That said, it would be a cool project. What do you think?

floswald commented 3 months ago

Yes, well, that was my question indeed. I need something with maximum performance for a research project. I'm not sure I agree that there is a strict tradeoff between performance and usability for teaching; if anything, the promise of Julia is the exact opposite. So in an ideal world, one could use it for teaching and for research. Note that this is not a pipe dream; I've actually managed to do this before: https://github.com/floswald/DCEGM.jl.

Regarding your time constraints, I understand of course - more important things to do at the moment. I'm not sure what to suggest; I could develop my own fork and at some point in the future merge it back into what you have built, though by then it may have diverged quite far from your objective. Also, it's very useful to have another human look at code. It mainly depends on you. :-)

vasudeva-ram commented 3 months ago

Hi @floswald - after some consideration, I think it would be good to develop this into a full-fledged Julia implementation of the SSJ methodology. This repo was my first time ever using Julia, so there is much room for improving efficiency. Additionally, I have since implemented much bigger HANK models and think that things like code organization could also be improved. While I do think these changes may reduce intuition and readability, I trust you that this goal is not completely orthogonal to having a learning tool.

My time commitments remain strained until December (at least) due to job market pressures. So please allow for slow responses from me until then.

Apart from that, I look forward to collaborating and building this out fully!

vasudeva-ram commented 3 months ago

Given that branch V3 achieves the goals of this branch, and is much faster, does this branch still need to exist?

floswald commented 3 months ago

You can close this PR, yes. I probably would not delete any branches (at least on the remote), as there is no real cost.