Calibration/structural estimation of retail order participation

sbenthall commented 1 year ago

What are the stylized facts of retail order volume (30% ? ) that we are trying to match.

Match a mean -- DPHM, attention; Mean: Get from @mesalas
Match the variance? - attention? number of agents? SD: 5% -- see #195

We can then do a calibration exercise to figure out what we can tweak to get that as an output.

Calibrate to stylized facts.

We can do the calibration with the MockMarket.

Get in touch with @wjt5121 about this ...

mesalas commented 1 year ago

Ill try to find some reference to back up the 30% value.

mesalas commented 1 year ago

Have done some research and my conclusion is that retail volume seems to be around or slightly above 20%. I think we should try and target that number.

Here's a slide from virtu citing 605 reports that wholesalers have to produce: 2AF9A22E https://www.sec.gov/comments/265-28/26528-8901054-242178.pdf

Reuters has an article from 2021 citing 25+% in July and aug 2021

The head of research at NASDAQ estimates 20% in 2020 9866BECD https://www.nasdaq.com/articles/who-counts-as-a-retail-investor-2020-12-17

Forbes is citing JP Morgan that retail volume hit 23% in the beginning of 2023 https://www.forbes.com/sites/dereksaul/2023/02/03/retail-trading-just-hit-an-all-time-high-heres-what-stocks-are-the-most-popular/?sh=714a70b46664

CEBO has an interesting report that finds that retail volume initially was targeting "meme" stocks but now targets the broader market: The graph shows the actual volume and not percentage of total, but they note that the volume has been relative consistent since 2Q 2021 while the fraction of meme stocks has dropped. BB77633D

https://www.cboe.com/insights/posts/the-evolution-of-retail-investment-activity/

This might be consistent with this report of lower retail volume if a large fraction of the volume is traded in meme stocks outside of the index? https://www.reuters.com/business/retail-traders-account-10-us-stock-trading-volume-morgan-stanley-2021-06-30/

sbenthall commented 1 year ago

Awesome.

Our current levers for setting the retail investor involvement are the attention parameter and dollars_per_hark_money_unit.

I expect we might also get an interaction with broker fees once we include them: https://github.com/sbenthall/SHARKFin/issues/188

I suppose we can run some calibration tests to figure out the right ballpark for parameters to get this 'stylized fact' and then sweep within that ballpark to get a sense of the effects of variations?

mesalas commented 1 year ago

More stats https://www.nasdaq.com/articles/who-counts-as-a-retail-investor-2020-12-17

sbenthall commented 1 year ago

We can do the calibration step with the MockMarket.

sbenthall commented 1 year ago

@mesalas So, what are our targets exactly?

I believe you would like:

a mean daily trade volume of $\mu_v$
with some target standard deviation $\sigma_v$

These should be chosen relative to the trading volume of other agents on the market.

Essentially I need to write a program to optimize the parameter choices with respect to a loss function reflecting these two moments. That would be ideal.

But ... what are the ballpark numbers I should be aiming at? In case I don't have time to solve this in a general way.

sbenthall commented 1 year ago

Hitting this target will depend on #195 , since having more people in the population will increase the retail trade volume and make it easier to spread that activity evenly.

mesalas commented 1 year ago

But ... what are the ballpark numbers I should be aiming at? In case I don't have time to solve this in a general way.

I would go for a total volume of 40000 (on average 20000 buy and 20000 sell) and a std of about 5%

sbenthall commented 1 year ago

Hmmm. Ok.

Doing some testing with the original WHITESHARK population, getting up to this volume means ramping up the DPHM parameter to something like 500,000.

Recall that in our Whiteshark simulation, raising this value to 15,000 (much lower) got us consistent market crashes as the retail investors overpowered the institutional investors. But that was with memory-based expectations.

I assume that for G.G. Shark we will stick with UsualExpectations. That way, trade volume will be less correlated, as it will be responding to idiosyncratic labor shocks rather than market prices.

sbenthall commented 1 year ago

@mesalas I've been working on this, and it looks like there are several reasons why calibrating the macro agents to get this realistic distribution of retail investor activity is not going to work.

I can get the volume up. Thanks to @alanlujan91 's work, I can now easily make a population of 1000 macro-agent, with a low attention rate (0.05), getting on average 50 agents to pay attention per day. With DPHM at 830,000, that gets a mean buy volume per day of 18,600 over one quarter. All these parameters can be scaled up and down.

The problem, simply, is that with USUAL expectations (where the agents compute their expected return from the true dividend process statistics), the agents never want to sell. They just hold onto the asset, collect the dividend, and reinvest. So the mean sell value is 0, and the standard deviation on the buy side is 0.48.

Here's what the simulation of one quarter looks like:

I think that leaves the following options for G.G. Shark:

Switch to the memory-based FinanceModel for agent expectations. Which we know will crash the market with so much retail investor clout! So, not really an option.

...

Just run noise through the broker. I.e. create a Simulation type where the broker activity is drawn from a random distribution like what you want. We've ran simulations like this with Calibration before.

Shall we do that? If so, do you want the buy/sell side to be normally distributed?

We'll need to discuss the implications of the interaction between expectations and stylized facts of retail investor activity later.

sbenthall commented 1 year ago

@mesalas and I have decided that for G.G. SHARK, he can just sample the broker activity on the AMMPS side, simplifying the whole thing a lot.

In the future, we should figure out what it takes to get the macro agent's trading activity to match these stylized facts.

sbenthall commented 1 year ago

General question for @llorracc : Which 'stylized facts' (about price process, retail investor activity, etc.) and calibration information (population model, etc.) if any, are targets for FIRE SHARK?

We should enumerate these empirics and/or assumptions in a separate file, then refer to it when doing SE and/or configuration for the FIRE SHARK experiments.

(These could be different for Good? Shark and so doing this rigorously will keep the framework supple.)

llorracc commented 1 year ago

I'm a bit confused what you are asking. mNrmStE is from the solution stage. Some of the other things you mention are from the simulation stage.

For the results from the converged .solution[0] I can't see any reason to pare them down. The storage space occupied by these parameters will be trivial, and it would add extra work and complication if I removed one that later I realized we might need.

For the simulation stage, we will want to start by keeping the entire history of dates (I guess counting up from the first period after the burn-in period; we may as well call that period [0]. So, I guess the thing to do would be to store a snapshot of all the agent's state variables at every date at which the agent observes prices. If we need to pare that down, then I'd suggest a subtractive procedure: Make an automated list of the agent's state and parameter values, and then allow us to configure a method like excise([item],[simulation]).

On Tue, Mar 21, 2023 at 4:23 PM Sebastian Benthall @.***> wrote:

General question for @llorracc https://github.com/llorracc : Which 'stylized facts' (about price process, retail investor activity, etc.) and calibration information (population model, etc.) if any, are targets for FIRE SHARK?

We should enumerate these empirics and/or assumptions in a separate file, then refer to it when doing SE and/or configuration for the FIRE SHARK experiments.

(These could be different for Good? Shark and so doing this rigorously will keep the framework supple.)

— Reply to this email directly, view it on GitHub https://github.com/sbenthall/SHARKFin/issues/199#issuecomment-1478533238, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAKCK73ZKMMWVWULBVIXQDTW5IE4XANCNFSM6AAAAAAVOXR2TM . You are receiving this because you were mentioned.Message ID: @.***>

--

Chris Carroll

sbenthall commented 1 year ago

Sorry, the context shift may be confusing. The issue about mNrmStE is #205.

This is #199, which I expect is a totally separate topic.

SHARKFin already does a lot of what you describe here. I'm not asking you about how to get data out of the solution or simulation.

I'm asking what (if any) empirical facts (such as: no autocorrelation of stock returns; what percent of order volume is filled by retail orders; what ex ante heterogeneity is there in the population of the U.S.A.) are important for FIRE SHARK (specifically: the Economics publication we have been building towards).

My understanding is that we would use these facts in one of two ways:

Entering them as calibration to the model (as we do with ex ante population parameters), or
Estimate (through search, etc.) the parameters needed for the simulation to match the moments of these facts

But both of these approaches, important for validating the model, require a stock of facts that we care about matching. I imagine which facts matter depends on disciplinary taste.

So, I'm asking: which facts should we be using to calibrate and validate FIRE SHARK? It's not an urgent question. But answering it systematically would help us avoid what just happened in G.G. SHARK, which is a scramble to achieve a calibration which turned out not to be possible under our model assumptions.

sbenthall commented 1 year ago

CDC says:

For the first step, calibrate:

Variance and mean return of dividends match the variance and mean return of prices, from the S&P500
Lognormal dividend process.
To start with, just pure Lucas agents -- homogeneous, where there's no labor income.

What we are not matching in the first round:

prices are more volatile than fundamentals (dividends) (Campbell Schiller literature)
heteroskedasticity

sbenthall commented 1 year ago

@alanlujan91 makes the point that we should be attentive to the starting wealth levels. wealthier agents might sell more.

sbenthall / SHARKFin

Calibration/structural estimation of retail order participation #199