sbenthall / SHARKFin

Simulating Heterogeneous Agents with Finance

data structure for different subpopulations within the AgentPopulation class #42

Closed sbenthall closed 2 years ago

sbenthall commented 2 years ago

Currently all the agent objects in AgentPopulation are just in a list.

That makes:

This should be handled with a nicer data structure that makes these other operations cleaner.

Note that the current way things are done is partly a result of how HARK's distribute_params method works. distribute() in this repository is contorted around distribute_params():

https://github.com/sbenthall/HARK_ABM_INTRO_public/blob/master/HARK/hark_portfolio_agents.py#L55

This is the underlying function in HARK that could well be rewritten:

https://github.com/econ-ark/HARK/blob/master/HARK/core.py#L1664

sbenthall commented 2 years ago

Another thing this class could do, which would serve a much more general purpose, is help refactor this sort of realistic definition of a population:

https://github.com/econ-ark/DistributionOfWealthMPC/blob/master/Code/SetupParamsCSTW.py

sbenthall commented 2 years ago

See this related issue. In the current agent population code, I rescaled the parameters with ad hoc code. But this could be done with a more general utility added to HARK.

https://github.com/econ-ark/HARK/issues/995

alanlujan91 commented 2 years ago

I will discuss design here.

Current HARK agents are heterogeneous with respect to their states (cash-on-hand m, income p, assets a) but homogeneous with respect to their parameters: they are ex-ante identical, with the same CRRA, DiscFac, stock market expectations, etc.

What we need is an AgentPopulation class that allows for heterogeneity of preferences and/or beliefs (as a start, maybe others in the future).

Generically, this AgentPopulation takes as inputs which parameters are to be heterogeneous and what the distribution of each of those parameters is. For example, CRRA -> [bot, top, n] results in a uniform distribution of agents with respect to their CRRA preferences, discretized into n points between bot and top. Other distributions could be desirable, as well as different discretizations.
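A minimal sketch of what the [bot, top, n] specification could mean, assuming an equiprobable discretization; the function name is hypothetical and this is not the current API:

```python
import numpy as np

def discretize_uniform(bot, top, n):
    """Equiprobable discretization of Uniform(bot, top):
    the midpoints of n equal-width bins."""
    edges = np.linspace(bot, top, n + 1)
    return (edges[:-1] + edges[1:]) / 2

# CRRA -> [bot, top, n] as described above:
crra_values = discretize_uniform(bot=2.0, top=10.0, n=5)
# -> array([2.8, 4.4, 6. , 7.6, 9.2])
```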

For our purposes, we are thinking of varying [CRRA, DiscFac, RiskyAvg, RiskyStd]. AgentPopulation should create a grid of agents of size (CRRA_n, DiscFac_n, RiskyAvg_n, RiskyStd_n) where AgentPopulation.__sub_agent__[i,j,k,l] = PortfolioConsumerType(CRRA[i], DiscFac[j], RiskyAvg[k], RiskyStd[l]). The sub-agent classes in this case are just parameter holders which describe their contained models; they should not carry agents and simulations themselves. AgentPopulation should instead hold the agents and the simulation, where [CRRA, DiscFac, RiskyAvg, RiskyStd] become states themselves.
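A rough sketch of the grid construction just described, where the sub-agents are plain parameter holders; all names and grid values here are illustrative, and a real implementation would presumably build PortfolioConsumerType configurations rather than dicts:

```python
import itertools
import numpy as np

# Discretized values for each ex-ante heterogeneous parameter (illustrative).
param_values = {
    "CRRA": np.linspace(2.0, 10.0, 3),
    "DiscFac": np.linspace(0.90, 0.99, 3),
    "RiskyAvg": np.linspace(1.02, 1.10, 3),
    "RiskyStd": np.linspace(0.10, 0.30, 3),
}

# Each grid cell is just a parameter dictionary describing its model;
# AgentPopulation itself would own the simulated agents and their states.
shape = [len(v) for v in param_values.values()]
sub_agents = np.empty(shape, dtype=object)
names = list(param_values)
for idx in itertools.product(*(range(n) for n in shape)):
    sub_agents[idx] = {name: param_values[name][i] for name, i in zip(names, idx)}

# e.g. sub_agents[0, 1, 2, 0]
# -> {'CRRA': 2.0, 'DiscFac': 0.945, 'RiskyAvg': 1.1, 'RiskyStd': 0.1}
```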

Assuming we have created an AgentPopulationSolution object (which I will discuss below) an agent would transition by calling aNrm = mNrm - solution[t].cFunc(mNrm, CRRA, DiscFac, RiskyAvg, RiskyStd). If we are careful about compartmentalizing the parameters that will actually change during simulation (RiskyAvg, RiskyStd), this could even be reduced to aNrm = mNrm - solution[t].cFunc(mNrm, RiskyAvg, RiskyStd). In essence, RiskyAvg and RiskyStd also become state variables that evolve over the simulation.

Now to AgentPopulationSolution. In AgentPopulation we created a grid of sub-agents. Let's assume that our population is discrete along [CRRA, DiscFac] but continuous along [RiskyAvg, RiskyStd]. AgentPopulationSolution would traverse the grid and solve every single sub-agent, which gives (CRRA_n * DiscFac_n * RiskyAvg_n * RiskyStd_n) different solution objects. AgentPopulationSolution then has the task of "stitching" all of these solutions together to make a population solution.

Continuing with the example: for every [CRRA, DiscFac], which are discrete in the population, our solution depends on [RiskyAvg, RiskyStd] and mNrm. We already created cFunc(mNrm) at each grid point, so now we create an interpolator such that we can have cFunc(m, avg, std). Going back to what I wrote earlier, once this "stitching" is complete, the way to access the solution could be cNrm = solution[t, CRRA, DiscFac].cFunc(mNrm, RiskyAvg, RiskyStd), where [t, CRRA, DiscFac] are exogenous or deterministic states, and [mNrm, RiskyAvg, RiskyStd] are endogenous and evolving states.
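One way to picture the "stitching" step is with SciPy's RegularGridInterpolator; this is only a sketch of the idea with a placeholder cFunc, not the eventual implementation (HARK's own interpolation classes would more likely be used):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Grids along the continuous dimensions (illustrative values).
m_grid = np.linspace(0.0, 20.0, 50)
avg_grid = np.linspace(1.02, 1.10, 3)
std_grid = np.linspace(0.10, 0.30, 3)

# Pretend each (RiskyAvg, RiskyStd) sub-problem has been solved and its
# cFunc evaluated on m_grid; a placeholder formula stands in for the real
# per-sub-agent solutions here.
c_values = np.empty((len(m_grid), len(avg_grid), len(std_grid)))
for j, avg in enumerate(avg_grid):
    for k, std in enumerate(std_grid):
        c_values[:, j, k] = 0.5 * m_grid * (avg / (1.0 + std))  # placeholder

# "Stitch" the per-sub-agent solutions into one cFunc(m, avg, std).
cFunc = RegularGridInterpolator((m_grid, avg_grid, std_grid), c_values)
cNrm = cFunc([[5.0, 1.06, 0.20]])  # consumption at m=5, RiskyAvg=1.06, RiskyStd=0.2
```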

alanlujan91 commented 2 years ago

Another important source of ex-ante heterogeneity is income processes for different education classes, which is closer to what cstwMPC does.

sbenthall commented 2 years ago

I think all this is great.

One thing I'll add is that the AgentPopulation should be initialized with configurable Distributions for varying parameters. The current implementation assumes Uniform distributions.

Also, the parameters determining the shape of the distribution (top and bottom for Uniform, mean and std for Normal, etc.) should be separated from the approximation parameter (the n giving the number of values into which the distribution is discretized).
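A sketch of how the two kinds of parameters might be kept separate; the class and field names here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ParameterDistribution:
    """The 'true' cross-sectional distribution of one parameter, described
    separately from how finely it will be discretized."""
    name: str     # e.g. "CRRA"
    family: str   # e.g. "uniform" or "normal"
    shape: dict   # e.g. {"bot": 2.0, "top": 10.0} or {"mean": ..., "std": ...}

# The approximation parameter n lives outside the distribution description,
# so the same "true" population can be discretized at different resolutions.
crra_dist = ParameterDistribution("CRRA", "uniform", {"bot": 2.0, "top": 10.0})
approx_n = {"CRRA": 5}
```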

So the initial parameterization of the AgentPopulation should be:

It would actually make sense for there to be an AbstractAgentPopulation (or TrueAgentPopulation, or something) that takes only these parameters, which then generates a discretized or approximate AgentPopulation when given:

@nicksawhney could get started on the first part of this.

sbenthall commented 2 years ago

For context, this issue in HARK represents a design ideal that Chris feels strongly about. As long as we are writing new code and designs, it makes sense to model the new design on it:

https://github.com/econ-ark/HARK/issues/914

sbenthall commented 2 years ago

@llorracc has some draft work on functionality like this in his "2.0 pre-ALPHA" HARK PR:

https://github.com/econ-ark/HARK/blob/3ba91db642bd0394ef93b414cbd27f98fcdaf56f/HARK/ConsumptionSaving/ConsIndShockModel_AgentTypes.py

Note especially prmtv_par and aprox_lim as separate namespaces within the parameters of AgentTypePlus.

sbenthall commented 2 years ago

Note the very interesting part of the @llorracc implementation that uses progressively more granular approximations to accelerate discovery of the solution.

llorracc commented 2 years ago

I feel strongly that we need to refine our technology for defining models; I think a model is not well defined without some unambiguous specification of what idealized object the approximations are approximating. Seb's idea of taking the number of approximating points as an input seems like a sensible one.

I'd argue, though, for a somewhat more flexible approach than the structure Seb describes. In particular, I think that we should separate the machinery for describing the distribution from the machinery for organizing the information that the machinery needs.

That is, at each point where a distribution needs to be generated, the code's endpoint should be a call to some user-defined function (e.g., make_parameter_distribution(parameter_name, distribution_description, time_description)), and the distribution_description would contain the info needed to construct the approximation.

Like, distribution_description might contain (see the sketch after this list):

  1. The name of a class that describes the distribution (Say, DiscreteApproxToMeanOneLogNormalTruncated)
  2. The actual inputs that the class needs (variance, number of approximating points, method of approximation)
  3. The limiting characteristics if computational resources were infinite
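A sketch of what such a distribution_description could look like, following the three items above; the field names and the make_parameter_distribution stub are illustrative placeholders, not existing HARK API:

```python
# Hypothetical distribution_description, following the three items above.
distribution_description = {
    # 1. the class describing the distribution
    "dist_class": "DiscreteApproxToMeanOneLogNormalTruncated",
    # 2. the actual inputs that class needs
    "inputs": {"variance": 0.04, "n_points": 7, "method": "equiprobable"},
    # 3. the limiting characteristics if computational resources were infinite
    "limit": {"dist": "lognormal", "mean": 1.0, "n_points": "infinite"},
}

def make_parameter_distribution(parameter_name, distribution_description, time_description=None):
    """Placeholder endpoint: look up the named distribution class and build
    the discrete approximation from its inputs."""
    ...
```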

The upshot is that the first priority should be to improve and standardize our tools for describing any particular distribution. Only when that is done will we know what inputs we generically need to keep track of for the larger description.

PS. Another logically prior step is to settle any outstanding questions about how we want to keep track of time/date/epoch/age/subperiod.

sbenthall commented 2 years ago

Hi @llorracc. I'm not sure I entirely follow what you're saying. What do you mean by time_description?

sbenthall commented 2 years ago

Also, @llorracc I think that because of the timeline for development around SHARKFin, this repository is going to need to err on the side of imperfect but functioning implementations, as opposed to building off of perfect "HARK 2.0" implementations.

I know that for HARK 2.0 you want a lot of generality in problem representation which isn't in the current (pre-1.0) version of HARK. I think we can make a lot of progress building towards 1.0 without taking on the full 2.0 scope.

sbenthall commented 2 years ago

I confused myself about this, but another point to clarify here is that the distributions in the current use case are specifically over the population of agents (i.e., how many agents have each CRRA level), as opposed to being probability distributions for exogenous shocks.

sbenthall commented 2 years ago

Summary of meeting about this with @llorracc :+1:

sbenthall commented 2 years ago

Python has a system for creating data types that are not as heavyweight as classes: https://docs.python.org/3/library/typing.html

These could be used for "time varying" parameters. There may be many ways to improve HARK with this set of language features.
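For instance, a lightweight way to mark a parameter as time-varying might look like this; a sketch only, using standard typing constructs, with hypothetical names:

```python
from typing import List, Union

# A parameter is either a single value (time-invariant) or a list of values,
# one per period (time-varying); the alias documents this without a class.
TimeVarying = Union[float, List[float]]

def discount_factor_at(DiscFac: TimeVarying, t: int) -> float:
    """Return the period-t discount factor under either representation."""
    if isinstance(DiscFac, list):
        return DiscFac[t]
    return DiscFac
```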

sbenthall commented 2 years ago

Earlier, I put a design document for this new class here: https://github.com/sbenthall/SHARKFin/blob/master/design/AgentPopulationDesignDocument.ipynb

Feel free to use that notebook for further work designing this AgentPopulation class.

alanlujan91 commented 2 years ago

I've been looking at this and I'm sketching out an idea.

Going back to your typing suggestions, though: type aliases and new types seem to be intended for static linting, but they can't type-check at run time, right? Is there something else on that page that I should be looking at?

sbenthall commented 2 years ago

This is a good guide to types in Python: https://realpython.com/python-type-checking/

Yes, Python is still dynamically typed even with type hints. The static checks impose clarity on the architecture, and it is possible to use explicit type checks in the software itself where it's functionally important.
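For example, a runtime guard can back up the static hints where it matters; a trivial sketch with a hypothetical setter:

```python
def set_crra(value):
    # Type hints don't run at runtime, so guard explicitly when it matters.
    if not isinstance(value, (int, float)):
        raise TypeError(f"CRRA must be numeric, got {type(value).__name__}")
    return float(value)
```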

nicksawhney commented 2 years ago

What's the status of this issue? We created the AgentList object as a temporary data structure to handle groups of different agents in the future, but it seems like much of the discussion has moved to @alanlujan91's new agent population code. Should we close this issue?

sbenthall commented 2 years ago

@nicksawhney It's true that issue #52 is proposed as a solution to this; #52 is still in progress.

I prefer to leave issues open as TODO items until they are settled by a final PR. PRs are options to settle issues. This is to some extent just a matter of convention/style.

sbenthall commented 2 years ago

Closed with #52