vicsharp-shibusa / kyna

Open source stock data collection and analysis.
MIT License

Backtest Performance Issues #72

Closed by vicsharp-shibusa 5 months ago

vicsharp-shibusa commented 7 months ago

Running a backtest just takes too long.

Using the following configuration, it completes in 4.64 minutes.

{
    "Type": "Candlestick Pattern",
    "Name": "Bullish Engulfing 1",
    "Source": "eodhd.com",
    "Description": "move: 0.1;prologue len: 15;trend desc: S21C;vol factor: 1",
    "Entry Price Point": "Close",
    "Target Up": {
        "Price Point": "High",
        "Value": 0.1
    },
    "Target Down": {
        "Price Point": "Low",
        "Value": 0.1
    },
    "Signal Names": [
        "Bullish Engulfing"
    ],
    "Length of Prologue": 15,
    "Max Parallelization": 10,
    "Only Signal With Market": false,
    "Volume Factor": 1,
    "Chart Configuration": {
        "Interval": "Daily",
        "Moving Averages": null,
        "Trend Configuration": [
            {
                "Trend": "S21C",
                "Weight": null
            }
        ]
    },
    "Market Configuration": null
}

One problem is the length of time it takes to complete this query:

        public string FetchCodesAndCounts => _dbDef.Engine switch
        {
            DatabaseEngine.PostgreSql => @"
SELECT P.code, E.industry, E.sector, COUNT(P.*)
FROM eod_adjusted_prices P
JOIN entities E ON P.source = E.source AND P.code = E.code
WHERE P.source = @Source
GROUP BY P.code, E.industry, E.sector
HAVING COUNT(P.*) > 500 AND
AVG(P.close) > 15",
            _ => ThrowSqlNotImplemented()
        };

This could be improved by adding some columns to the entities table, namely:

  1. eod_count - the number of eod records.
  2. avg_close - the average close on all eod records.
  3. avg_volume - the average volume on all eod records.

The migration routine will have to be adjusted to preserve these data points on each migration.
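A rough sketch of what that change could look like in PostgreSQL. The column types and the `volume` column on `eod_adjusted_prices` are assumptions, not taken from the actual schema; the refresh statement is the part the migration routine would need to rerun.

```sql
-- Assumed column types; adjust to match the existing schema.
ALTER TABLE entities
    ADD COLUMN IF NOT EXISTS eod_count  INTEGER,
    ADD COLUMN IF NOT EXISTS avg_close  NUMERIC,
    ADD COLUMN IF NOT EXISTS avg_volume NUMERIC;

-- Refresh the summary columns from the price table.
-- Assumes eod_adjusted_prices has a "volume" column.
UPDATE entities E
SET eod_count  = S.eod_count,
    avg_close  = S.avg_close,
    avg_volume = S.avg_volume
FROM (
    SELECT source, code,
           COUNT(*)    AS eod_count,
           AVG(close)  AS avg_close,
           AVG(volume) AS avg_volume
    FROM eod_adjusted_prices
    GROUP BY source, code
) S
WHERE E.source = S.source AND E.code = S.code;

-- FetchCodesAndCounts could then filter on entities alone,
-- with no join or aggregation at query time.
SELECT E.code, E.industry, E.sector, E.eod_count
FROM entities E
WHERE E.source = @Source
  AND E.eod_count > 500
  AND E.avg_close > 15;
```

The trade-off is that the summary columns go stale whenever new price records are loaded, so the refresh has to run after every import as well as after each migration.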


Another enhancement to the database might be to store some averages with each price record:

  1. The average height up to that point.
  2. The average body height up to that point.
  3. The average volume up to that point.
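One possible way to compute these running averages is with window functions. This sketch assumes "height" means `high - low`, "body height" means the absolute difference between `open` and `close`, and that the price table has `open`, `high`, `low`, `volume`, and a date column (called `date_eod` here as a placeholder); none of those names are confirmed by the schema shown above.

```sql
-- Running averages per code, up to and including each price record.
-- Column names other than source, code, and close are assumptions.
SELECT
    source,
    code,
    date_eod,
    AVG(high - low)         OVER w AS avg_height,       -- full candle range
    AVG(ABS(close - open))  OVER w AS avg_body_height,  -- candle body only
    AVG(volume)             OVER w AS avg_volume
FROM eod_adjusted_prices
WHERE source = @Source
WINDOW w AS (
    PARTITION BY source, code
    ORDER BY date_eod
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
);
```

The results could be written back to new columns on each price record, so the backtest reads them directly instead of recomputing them for every chart.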

The signal match algorithms could likely be improved by rearranging their conditions. Putting the Sentiment check at the top of the if statement should let the condition short-circuit early, so the remaining checks are evaluated less often.


Profiling the execution indicates that memory use continues to increase as the program runs; there might be an opportunity to release some objects explicitly instead of waiting for garbage collection to do it.

vicsharp-shibusa commented 7 months ago

Rearranging the order in which the if conditions are evaluated reduced the runtime from 4.64 to 3.66 minutes, a ~21% improvement.