Closed by sbenthall 2 years ago
Another thing this class could do, which would serve a much more general purpose, is help refactor this sort of realistic definition of a population:
https://github.com/econ-ark/DistributionOfWealthMPC/blob/master/Code/SetupParamsCSTW.py
See this related issue. In the current agent population code, I rescaled the parameters with ad hoc code. But this could be done with a more general utility added to HARK.
I will discuss design here.
Current HARK agents are heterogeneous with respect to their states (cash-on-hand `m`, income `p`, assets `a`) but homogeneous with respect to their parameters (ex-ante identical: same CRRA, DiscFac, stock market expectations, etc.).
What we need is an `AgentPopulation` class that allows for heterogeneity of preferences and/or beliefs (as a start; maybe others in the future).
Generically, this `AgentPopulation` takes as inputs which parameters are to be heterogeneous and what the distribution of those parameters is. For example, `CRRA -> [bot, top, n]` results in a uniform distribution of agents with respect to their CRRA preferences. Other distributions could be desirable, as well as different discretizations.
For our purposes, we are thinking of varying `[CRRA, DiscFac, RiskyAvg, RiskyStd]`. `AgentPopulation` should create a grid of agents of size `(CRRA_n, DiscFac_n, RiskyAvg_n, RiskyStd_n)` where `AgentPopulation.__sub_agent__[i,j,k,l] = PortfolioConsumerType(CRRA[i], DiscFac[j], RiskyAvg[k], RiskyStd[l])`. The sub-agent classes in this case are just parameter holders which describe their contained models, and should not carry agents and simulations themselves. `AgentPopulation` should instead hold the agents and the simulation, where `[CRRA, DiscFac, RiskyAvg, RiskyStd]` become states themselves.
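As a sketch of what the sub-agent grid might look like (all names here, including `make_subagent_grid` and the dict standing in for `PortfolioConsumerType`, are illustrative assumptions, not HARK's actual API):

```python
from itertools import product

def make_subagent_grid(param_grids):
    """param_grids maps parameter names to lists of discretized values.
    Returns a dict keyed by index tuples (i, j, k, l), mirroring the
    AgentPopulation.__sub_agent__[i, j, k, l] indexing described above."""
    names = list(param_grids)
    grid = {}
    for idx in product(*(range(len(param_grids[n])) for n in names)):
        # each sub-agent is just a parameter holder; a plain dict
        # stands in for PortfolioConsumerType in this sketch
        grid[idx] = {n: param_grids[n][i] for n, i in zip(names, idx)}
    return grid

grid = make_subagent_grid({
    "CRRA": [2.0, 5.0],
    "DiscFac": [0.96, 0.98],
    "RiskyAvg": [1.08],
    "RiskyStd": [0.2],
})  # 2 * 2 * 1 * 1 = 4 sub-agents
```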
Assuming we have created an `AgentPopulationSolution` object (which I will discuss below), an agent would transition by calling `aNrm = mNrm - solution[t].cFunc(mNrm, CRRA, DiscFac, RiskyAvg, RiskyStd)`. If we are careful about compartmentalizing the parameters that will actually change during simulation (RiskyAvg, RiskyStd), this could even be reduced to `aNrm = mNrm - solution[t].cFunc(mNrm, RiskyAvg, RiskyStd)`. In essence, RiskyAvg and RiskyStd also become state variables that evolve over the simulation.
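A minimal sketch of that transition, assuming a consumption function with the reduced `(mNrm, RiskyAvg, RiskyStd)` signature (the names here are placeholders, not the actual solution interface):

```python
def transition(mNrm, RiskyAvg, RiskyStd, cFunc):
    """One simulation step: consume, then keep the rest as assets."""
    cNrm = cFunc(mNrm, RiskyAvg, RiskyStd)
    aNrm = mNrm - cNrm  # end-of-period normalized assets
    return aNrm

# toy cFunc: consume a fixed share of m regardless of beliefs
aNrm = transition(10.0, 1.08, 0.2, lambda m, avg, std: 0.6 * m)
```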
Now to `AgentPopulationSolution`. In `AgentPopulation` we created a grid of sub-agents. Let's assume that our population is discrete along `[CRRA, DiscFac]` but continuous along `[RiskyAvg, RiskyStd]`. The `AgentPopulationSolution` would traverse the grid and solve every single sub-agent, which gives `(CRRA_n * DiscFac_n * RiskyAvg_n * RiskyStd_n)` different solution objects. `AgentPopulationSolution` now has the task of "stitching" all of these solutions together to make a population solution.
Continuing with the example: for every `[CRRA, DiscFac]`, which are discrete in the population, our solution depends on `[RiskyAvg, RiskyStd]` and `mNrm`. We already created `cFunc(mNrm)`, so now we create an interpolator such that we can have `cFunc(m, avg, std)`. Going back to what I wrote earlier, once this "stitching" is complete, the way to access the solution could be `cNrm = solution[t, CRRA, DiscFac].cFunc(mNrm, RiskyAvg, RiskyStd)`, where `[t, CRRA, DiscFac]` are exogenous or deterministic states, and `[mNrm, RiskyAvg, RiskyStd]` are endogenous, evolving states.
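A toy sketch of the "stitching" step for one discrete `[CRRA, DiscFac]` cell, interpolating across `RiskyStd` nodes only (one continuous dimension, to keep it short; the names and the simple linear blend are assumptions, not HARK's actual interpolators):

```python
def stitch_cfunc(std_grid, cfuncs):
    """std_grid: sorted RiskyStd nodes; cfuncs: per-node consumption
    functions m -> c. Returns cFunc(m, std) interpolating across nodes."""
    def cfunc(m, std):
        # clamp values outside the solved grid
        if std <= std_grid[0]:
            return cfuncs[0](m)
        if std >= std_grid[-1]:
            return cfuncs[-1](m)
        # find the bracketing nodes and blend linearly
        for j in range(1, len(std_grid)):
            if std <= std_grid[j]:
                w = (std - std_grid[j - 1]) / (std_grid[j] - std_grid[j - 1])
                return (1 - w) * cfuncs[j - 1](m) + w * cfuncs[j](m)
    return cfunc

# toy per-node policies: consume a std-dependent fraction of m
cf = stitch_cfunc([0.1, 0.3], [lambda m: 0.5 * m, lambda m: 0.7 * m])
```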
Another important source of ex-ante heterogeneity is income processes for different education classes, which is more of what cstwMPC does.
I think all this is great.
One thing I'll add is that the AgentPopulation should be initialized with configurable Distributions for varying parameters. The current implementation assumes Uniform distributions.
Also, the parameters determining the shape of the distribution (top and bottom for Uniform, mean and std for Normal, etc.) should be separated from the approximation parameter (the `n` for the number of values to discretize the distribution into).
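A sketch of that separation, with the shape parameters held apart from the approximation `n` (the `ParameterDistribution` class and its names are hypothetical, not the HARK API):

```python
from dataclasses import dataclass

@dataclass
class ParameterDistribution:
    dist: str     # e.g. "uniform" or "normal"
    params: dict  # shape parameters only: {"bot": ..., "top": ...} etc.

    def discretize(self, n):
        """Return n node values; only the uniform case is sketched here.
        n is the approximation parameter, kept out of the shape params."""
        if self.dist == "uniform":
            bot, top = self.params["bot"], self.params["top"]
            step = (top - bot) / n
            # midpoints of n equal-width intervals
            return [bot + step * (i + 0.5) for i in range(n)]
        raise NotImplementedError(self.dist)

crra = ParameterDistribution("uniform", {"bot": 2.0, "top": 4.0})
nodes = crra.discretize(4)
```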
So the initial parameterization of the AgentPopulation should be:
It would actually make sense for there to be an AbstractAgentPopulation (or TrueAgentPopulation, or something) that takes only these parameters, which then generates a discretized or approximate AgentPopulation when given:
`n` for all its continuous distributions.

@nicksawhney could get started on the first part of this.
For context, this issue in HARK represents a design ideal that Chris feels strongly about. As long as we are writing new code/designs, it makes sense to model this new design.
@llorracc has some draft work on functionality like this in his "2.0 pre-ALPHA" HARK PR:
Note especially `prmtv_par` and `aprox_lim` as separate namespaces within the parameters of `AgentTypePlus`.
Note the very interesting part of the @llorracc implementation that uses progressively more granular approximations to accelerate discovery of the solution.
I feel strongly that we need to refine our technology for defining models; I think a model is not well defined without some unambiguous specification of what idealized object the approximations are approximating. Seb's idea of taking the number of approximating points as an input seems like a sensible one.
I'd argue, though, for a somewhat more flexible approach than the structure Seb describes. In particular, I think that we should separate the machinery for describing the distribution from the machinery for organizing the information that the machinery needs.
That is, at each point where a distribution needs to be generated, the code's endpoint should be a call to some user-defined function (e.g., `make_parameter_distribution(parameter_name, distribution_description, time_description)`), and the `distribution_description` would contain the info needed to construct the approximation.
Like, `distribution_description` might contain:
The upshot is that the first priority should be to improve and standardize our tools for describing any particular distribution. Only when that is done will we know what inputs we generically need to keep track of for the larger description.
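One possible shape for the user-defined endpoint described above; since the contents of `distribution_description` are left open in the comment, the keys used below (`family`, `args`, `n_approx`) are pure assumptions for illustration:

```python
def make_parameter_distribution(parameter_name, distribution_description,
                                time_description=None):
    """Hypothetical endpoint: build a discretized distribution from a
    description dict. Only a uniform family is sketched here."""
    family = distribution_description["family"]
    args = distribution_description["args"]
    n = distribution_description["n_approx"]
    if family == "uniform":
        bot, top = args["bot"], args["top"]
        width = (top - bot) / n
        # midpoints of n equal-width intervals
        return [bot + width * (i + 0.5) for i in range(n)]
    raise NotImplementedError(f"{parameter_name}: {family}")

nodes = make_parameter_distribution(
    "CRRA", {"family": "uniform", "args": {"bot": 1.0, "top": 3.0}, "n_approx": 2}
)
```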
PS. Another logically prior step is to settle any outstanding questions about how we want to keep track of time/date/epoch/age/subperiod.
Hi @llorracc. I'm not sure I follow what you're saying entirely. What do you mean by `time_description`?
Also, @llorracc I think that because of the timeline for development around SHARKFin, this repository is going to need to err on the side of imperfect but functioning implementations, as opposed to building off of perfect "HARK 2.0" implementations.
I know that for HARK 2.0 you want a lot of generality in problem representation which isn't in the current (pre-1.0) version of HARK. I think we can make a lot of progress building towards 1.0 without taking on the full 2.0 scope.
I confused myself about this, but another point to clarify here is that the distributions in the current use case are specifically over the population of agents (i.e., the agent count at each CRRA level), as opposed to being probability distributions for exogenous shocks.
Summary of meeting about this with @llorracc:

- `__getitem__` and `__iter__` for 'time-varying' parameters. This has led to the implementation of an `IndexDistribution` in HARK for representing a time-varying distribution: https://github.com/econ-ark/HARK/blob/master/HARK/distribution.py#L33 This is not the ideal way to represent time-varying parameters because of its ambiguities when used across finite, infinite, and seasonal problems. Again, this is a case where core HARK improvements are required for "ideal" SHARKFin behavior, but SHARKFin can begin by supporting a limited set of agents/problems.
- Python has a system for creating data types that are not as heavyweight as classes: https://docs.python.org/3/library/typing.html These could be used for "time varying" parameters. There may be many ways to improve HARK with this set of language features.
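For example, a lightweight time-varying parameter type could be sketched with `typing.NamedTuple` (the `TimeVarying` name and the cycling behavior are illustrative assumptions; disambiguating finite, infinite, and seasonal problems would need real design work):

```python
from typing import List, NamedTuple, Union

class TimeVarying(NamedTuple):
    """A parameter with one value per period."""
    values: List[float]

    def at(self, t):
        # cycle for infinite-horizon problems; this is exactly the kind
        # of ambiguity a real design would need to resolve explicitly
        return self.values[t % len(self.values)]

# a parameter is either a constant or time-varying
Parameter = Union[float, TimeVarying]

disc_fac: Parameter = TimeVarying([0.96, 0.97])
```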
Earlier, I put a design document for this new class here: https://github.com/sbenthall/SHARKFin/blob/master/design/AgentPopulationDesignDocument.ipynb
Feel free to use that notebook for further work designing this AgentPopulation class.
I've been looking at this and I'm sketching an idea.
Going back to your typing suggestions, though: type aliases and new types seem to be intended for static linting, but can't type-check at run time, right? Is there something else on that page that I should be looking at?
This is a good guide to types in Python: https://realpython.com/python-type-checking/
Yes, Python is still dynamically typed even with type hints. The static check imposes clarity on the architecture. It is possible to use explicit type checks in the software itself if it's functionally important.
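A minimal example of that last point: the hint is not enforced, but an explicit check can be added where it is functionally important (`set_crra` is just an illustration):

```python
def set_crra(value: float) -> float:
    # the annotation alone does nothing at run time;
    # this explicit check is what actually enforces the type
    if not isinstance(value, (int, float)):
        raise TypeError(f"CRRA must be numeric, got {type(value).__name__}")
    return float(value)
```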
What's the status of this issue? We created the AgentList object as a temporary data structure to handle groups of different agents in the future, but it seems like much of the discussion has moved to @alanlujan91 's new agent population code. Should we close this issue?
@nicksawhney It's true issue #52 is proposed as a solution to this. #52 is still in progress.
I prefer to leave issues open as TODO items until they are settled by a final PR. PRs are options to settle issues. This is to some extent just a matter of convention/style.
Closed with #52
Currently all the agent objects in AgentPopulation are just in a list.
That makes:
This should be handled with a nicer data structure that makes these other operations cleaner.
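One candidate for such a structure, sketched here as a hypothetical `AgentGrid` that indexes agents by their ex-ante parameter values instead of a flat list:

```python
class AgentGrid:
    """Index agents by their ex-ante parameter tuple rather than
    position in a list, so selection by parameter value is clean."""

    def __init__(self):
        self._agents = {}  # (("CRRA", 2.0), ("DiscFac", 0.96), ...) -> agent

    def add(self, params, agent):
        # sort items so the key is order-independent
        self._agents[tuple(sorted(params.items()))] = agent

    def select(self, **criteria):
        """All agents whose parameters match the given criteria."""
        out = []
        for key, agent in self._agents.items():
            d = dict(key)
            if all(d.get(k) == v for k, v in criteria.items()):
                out.append(agent)
        return out
```

Usage would look like `grid.select(CRRA=2.0)` to pull every agent at a given risk-aversion level, regardless of the other parameters.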
Note that the current way things are done is partly because of how HARK's distribute_params method works.
`distribute()` in this repository is contorted around HARK's `distribute_params()` method: https://github.com/sbenthall/HARK_ABM_INTRO_public/blob/master/HARK/hark_portfolio_agents.py#L55
This is the underlying function in HARK that could well be rewritten:
https://github.com/econ-ark/HARK/blob/master/HARK/core.py#L1664