Implement trading experiment

sergey-a-berezin commented 1 year ago

Consider a trading strategy which attempts to make money on volatility but without prediction:

- Buy essentially at any moment, sell when the price goes above a certain threshold.
- Optionally, add a stop-loss, triggered either when the price goes too low or after a certain time period.
- For day trading style, buy at the open, sell above a threshold or at close.

The goal is to see if there is any strategy that would do better than buy-and-hold.

Study the distribution of log-profits for high/open and close/open and see if it may help pick good parameters.

sergey-a-berezin commented 1 year ago

This, in particular, requires adding OHL prices to the DB in addition to the closing prices. For consistency, add unadjusted OHL prices. For Sharadar, reconstruct those from the Close/CloseSplitAdjusted ratio. Use these ratios to extract various adjusted OHL prices later.

Or, to optimize for the common case, save the fully adjusted OHL prices and recover the unadjusted and split-only adjusted ones using the corresponding closing price ratios.

sergey-a-berezin commented 1 year ago

Add an option to plot the conditional distribution of log(close/open) when log(high/open) < T for some threshold T (target price). Since high constrains close, this conditional distribution will be biased towards lower values, and its mean will more accurately reflect the result of not hitting the target price.

sergey-a-berezin commented 1 year ago

To simplify, we can consider buying at the previous close and selling at a threshold T above the following open or at the next close. From this perspective, the high/close (X_h), open/close (X_o) and close/close (X_c) log-profits should follow nearly the same Student's t-distribution. The actual value of high = max(X_o, X_h, X_c), and the selling price is p = (high > T) ? T : X_c. Since X_c <= high, it follows that p <= ((high > T) ? T : high) = min(T, high).

So, we are effectively obtaining a new distribution of high with a possibly slightly higher mean than the original distributions of X_i's, but then we are cutting it from above and reducing the mean again. This is only useful if the reduced mean is still higher than the original mean of X_i.

sergey-a-berezin commented 1 year ago

And of course, preliminary experiments indicate that the potential gain when high > T is neatly offset by the expected loss when high < T...

Perhaps, to settle the issue, we should:

- See if OHLC prices can be reasonably modeled by Student's t-distribution relative to the previous close, taking the min and max for low and high as appropriate;
- Run the strategy of buying at open (or at the previous close), then selling either at the threshold T or at close, and plot the mean of this strategy as a function of T;
- Plot the same function for actual prices, e.g. for QQQ, TQQQ, etc.

sergey-a-berezin commented 1 year ago

More accurately, OHLC prices should be modeled by high-frequency intraday log-profits, say, minutely, with an appropriate distribution: Student's t with the "minutely" parameter a_m (denoted T(a=a_m)) such that, when compounded 24*60 times, it would produce a daily distribution similar to T(a=3).

Next, we can generate close from the previous close using T(a=3), and then generate the previous 7.5 hours using the minutely distribution T(a=a_m) walking backwards all the way to the open price by subtracting the log-profit samples from close. The generated intraday sequence is then summarized into the daily OHLC bar.

sergey-a-berezin commented 1 year ago

To be able to process minutely data, we need to extend db.Date to store the time of day in addition to the date. I'm thinking of adding Hour, Minute and Second fields of type uint8, leaving another byte for Milliseconds (why not?) and thereby doubling the size of the struct to 8 bytes. This shouldn't add much to the memory and storage requirements, certainly not as much as the OHL prices added already.
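
To illustrate the size estimate, here is a hypothetical layout in Go. The type name and all field names are assumptions and not the actual db.Date definition; the sketch only assumes that the existing date part occupies 4 bytes.

```go
package main

import "unsafe"

// DateTime is a hypothetical layout for a date extended with the time of day.
// The first three fields stand in for the existing 4-byte date part.
type DateTime struct {
	Year   uint16
	Month  uint8
	Day    uint8
	Hour   uint8
	Minute uint8
	Second uint8
	Msec   uint8 // spare byte; a single byte holds only 0..255, not the full 0..999 range
}

// DateTimeSize is the in-memory size of DateTime in bytes.
const DateTimeSize = unsafe.Sizeof(DateTime{})
```

With this field order there is no padding, so the struct occupies exactly 8 bytes. Storing full millisecond precision would need either a coarser unit in the spare byte or a wider field.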

sergey-a-berezin commented 1 year ago

Next, let's add a default header to the headless CSV tables for parfait-import. This allows reading headless CSV tickers and prices which I happen to have lying around for minutely data.

sergey-a-berezin commented 1 year ago

A test import of minutely QQQ data worked well. Now I need to import the ~100 most interesting stocks to get ~30M minutely samples and run them through the "distribution" experiment to derive a typical alpha of the Student's t-distribution. A preliminary run on QQQ gives a=~2.5.

sergey-a-berezin commented 1 year ago

Oh, and I also need to filter out overnight samples from the minutely data. Let's introduce an Intraday flag in the Source config which would indicate to skip log-profits that span two days.

sergey-a-berezin commented 1 year ago

Preliminary results with minutely data, 83 high-volume tickers, 2 years worth of data, ~20M price points total, ~15M in-session points (9:30am - 4pm). The bullets below are for in-session data only.

- Student's t-distribution with alpha = 2.5 (consistent with my previous experiments from 1-2 years ago).
- Average mean of the minutely log-profit is -1.9e-7, which corresponds to -2% APY. For reference, NASDAQ composite dropped 10% in these 2 years, so about -5% APY, which suggests that a large part of the growth (even if negative) happens between sessions.
  - TODO: compute the average mean of daily log-profits for the same 83 stocks for better comparison
- Mean stability is hard to assess with only 83 tickers, but it seems similar to the synthetic data.
- MAD stability is much tighter with synthetic data than with the real data.

The last two points repeat similar observations with daily data. This confirms my suspicion that the pattern is likely present at all timeframes.
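
For the APY figure above, the conversion from a mean minutely log-profit can be sketched as follows. The function name is hypothetical, and the 390 in-session minutes per day (9:30am - 4pm) and 252 trading days per year are my assumptions about the compounding period.

```go
package main

import "math"

// AnnualizedReturn converts a mean per-minute log-profit into an annual
// return, assuming it compounds over minutesPerDay*daysPerYear in-session
// minutes per year.
func AnnualizedReturn(meanLogProfit float64, minutesPerDay, daysPerYear int) float64 {
	n := float64(minutesPerDay * daysPerYear)
	return math.Exp(meanLogProfit*n) - 1
}
```

Under these assumptions, `AnnualizedReturn(-1.9e-7, 390, 252)` comes out at about -1.85%, consistent with the roughly -2% APY quoted.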

sergey-a-berezin commented 1 year ago

This, in fact, should be enough to generate synthetic OHL prices using a minutely generator: just use alpha=2.5, starting from close and walking backwards towards open. For simplicity we can assume mean=0 for minutely data, which seems to reflect reality.

sergey-a-berezin commented 1 year ago

I'm extracting the Source extension into a separate issue #130.

sergey-a-berezin commented 1 year ago

Some preliminary experiments with open[t+1]/close[t] vs. close[t+1]/close[t] log-profit distributions for NASDAQ Composite index:

- open's MAD is about half the close's;
- over the entire date range 1998-2023, the close's mean is much less than that of open's; however, this is biased by the early days of ~1998-2010; after that, the situation reverses (open's mean is less than close's), and closer to 2023 their means become similar, reflecting a much smoother growth over the entire day, even in-between sessions.

Normalized distributions (by close/close) over all the liquid stocks:

- close's alpha=2.8, MAD=1.0 (normalized)
- open's alpha=2.2, MAD=0.456 (normalized by close's MAD)

sergey-a-berezin commented 1 year ago

Next, implement a strategy simulator with the following strategies:

- Buy at open, sell either at a threshold open*T or at close; plot the expected gain for varying T, compare with buy-and-hold;
  - Variation: buy at close[t], sell the next day at close[t]*T or keep and sell at close[t+1]*T, and so on.
- Inverse: buy at open (or previous close) and sell at a stop-loss.
  - Variation: use a sliding stop-loss as a percentage of the highest price so far.
- Combination: set both a stop-loss and a target.

sergey-a-berezin commented 1 year ago

It may be a good idea to think of a relatively generic strategy config which independently sets conditions for buy and sell.

- Buy condition can be as simple as "buy at open" or "buy at close" (since we assume no historical dependency).
- Sell condition can be a list of order types, e.g. "market at open or close", "limit at price", "stop loss at price or percentage". All the conditions will be scanned at each intraday bar, and the first one that applies will be executed.
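
A possible shape for the sell side of such a config, sketched in Go. All type, constant and function names here are hypothetical. The sketch also simplifies two things: within a single bar the order of the list decides which condition wins (the bar alone does not say whether the high or the low came first), and gaps through a limit or stop price are ignored.

```go
package main

// Bar is a hypothetical intraday OHLC bar; the field names are assumptions.
type Bar struct {
	Open, High, Low, Close float64
}

// SellKind enumerates hypothetical sell order types.
type SellKind int

const (
	MarketAtClose SellKind = iota // sell at the bar's close unconditionally
	Limit                         // sell at Price once the bar's high reaches it
	StopLoss                      // sell at Price once the bar's low reaches it
)

// SellCondition is one entry in the list of sell conditions.
type SellCondition struct {
	Kind  SellKind
	Price float64 // absolute price for Limit and StopLoss; unused for MarketAtClose
}

// FirstSell scans the conditions in order against one bar and returns the
// execution price of the first condition that applies. ok is false when no
// condition applies and the position is kept.
func FirstSell(b Bar, conds []SellCondition) (price float64, ok bool) {
	for _, c := range conds {
		switch c.Kind {
		case Limit:
			if b.High >= c.Price {
				return c.Price, true
			}
		case StopLoss:
			if b.Low <= c.Price {
				return c.Price, true
			}
		case MarketAtClose:
			return b.Close, true
		}
	}
	return 0, false
}
```

A percentage-based or trailing stop-loss would first be converted into an absolute `Price` for the current bar before calling `FirstSell`.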

sergey-a-berezin commented 1 year ago

I'm going to implement a separate simulator experiment to test actual strategies - #132. I'm not yet sure what to do with this trading experiment as such, perhaps it should be folded into the distribution experiment to study the various intraday distributions and their modeling.

sergey-a-berezin commented 1 year ago

Testing the simulator on "buy-sell intraday" strategy on synthetic data with 0 mean (no inherent growth, only volatility) yields basically zero profit no matter how I set up the day trading strategy - buy & hold (for reference; it obviously won't do any good), target sell, stop-loss - fixed or trailing, selling at close or keeping overnight, etc. Any combination invariably leads to the same result: no profit.

I'm coming back to the same conclusion: any actual profit comes from the average, inherent growth of the stock value. The only question is, can we somehow protect the value from crashes and/or recover faster? This is really the fundamental question of a "safe haven" strategy.

stockparfait / experiments

Implement trading experiment #116