karlwancl / Trady

Trady is a handy library for computing technical indicators, and it targets to be an automated trading system that provides stock data feeding, indicator computing, strategy building and automatic trading. It is built based on .NET Standard 2.0.
https://lppkarl.github.io/Trady
Apache License 2.0
545 stars 185 forks source link

candle transformation is very slow #60

Closed pavlexander closed 6 years ago

pavlexander commented 6 years ago

I have encountered a situation with a big set of data, that transform function is very slow and needs improvements.

Default time taken to execute: 150,000 ms Time taken to execute after optimizations: 29,000 ms

Code:

IEnumerable<IOhlcv> ienumCandles = tradyCandles.AsEnumerable();
var transformedTradyCandles = ienumCandles.Transform<PerMinute, BiHourly>();

Before:

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
    var candle = candles.Where(c => c.DateTime >= startTime && c.DateTime < endTime);
    if (candle.Any())
    {
        var dateTime = candle.First().DateTime;
        var open = candle.First().Open;
        var high = candle.Max(stick => stick.High);
        var low = candle.Min(stick => stick.Low);
        var close = candle.Last().Close;
        var volume = candle.Sum(stick => stick.Volume);
        return new Candle(dateTime, open, high, low, close, volume);
    }
    return null;
}

After:

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
    List<IOhlcv> inputCandles = candles.ToList();
    List<IOhlcv> candle = new List<IOhlcv>();
    bool started = false;
    for (int i = 0; i < candles.Count(); i++)
    {
        if (!started && inputCandles[i].DateTime >= startTime)
        {
            started = true;
            candle.Add(inputCandles[i]);
        }
        else if (started)
        {
            if (inputCandles[i].DateTime >= endTime)
            {
                break;
            }

            candle.Add(inputCandles[i]);
        }
    }

    if (candle.Any())
    {
        var dateTime = candle.First().DateTime;
        var open = candle.First().Open;
        var high = candle.Max(stick => stick.High);
        var low = candle.Min(stick => stick.Low);
        var close = candle.Last().Close;
        var volume = candle.Sum(stick => stick.Volume);
        return new Candle(dateTime, open, high, low, close, volume);
    }
    return null;
}

While I do see, that there is some difference in output result (I don't have time now, to study what exactly is wrong) - the improvement is seen with a naked eye.

The only requirement is that data needs to be ordered. It can be done as the first step using LINQ. In my set of data all candles were in correct order so I skipped that part.

Additionally, I think transformation can be performed in parallel for different date-time periods.

I would appreciate if you investigated this issue more deeply and came up with some code optimizations.

P.S. Yes, according to profiler ComputeIOhlcvData takes the most time to execute. I think that the reason for that is that LINQ is iterating over all records over and over again. In the best case scenario (if data is ordered) - we need to do it only once!!! (in default implementation the code does it at least 3,000 times) In my new implementation, code also does it 3,000 time, but, it quits iterating over candles as soon as first out-of range candle is encountered. It gave 5x performance boost. I imagine that completely-rewritten transformation will get the job done in at most 1/5 time than my result..

pavlexander commented 6 years ago

Here, I have made an example, which will ONLY work for 1Minute to 2Hours transformation.

Old code:

public static IReadOnlyList<IOhlcv> Transform<TSourcePeriod, TTargetPeriod>(this IEnumerable<IOhlcv> candles)
    where TSourcePeriod : IPeriod
    where TTargetPeriod : IPeriod
{
    if (typeof(TSourcePeriod).Equals(typeof(TTargetPeriod)))
        return candles.ToList();

    if (!IsTimeframesValid<TSourcePeriod>(candles, out IOhlcv err))
        throw new InvalidTimeframeException(err.DateTime);

    if (!IsTransformationValid<TSourcePeriod, TTargetPeriod>())
        throw new InvalidTransformationException(typeof(TSourcePeriod), typeof(TTargetPeriod));

    var outIOhlcvDatas = new List<IOhlcv>();
    var periodInstance = Activator.CreateInstance<TTargetPeriod>();

    var startTime = candles.First().DateTime;
    while (startTime <= candles.Last().DateTime)
    {
        var nextStartTime = periodInstance.NextTimestamp(startTime);
        if (periodInstance.IsTimestamp(startTime))
        {
            var outIOhlcvData = ComputeIOhlcvData(candles, startTime, nextStartTime);
            if (outIOhlcvData != null)
                outIOhlcvDatas.Add(outIOhlcvData);
        }
        startTime = nextStartTime;
    }

    return outIOhlcvDatas;
}

private static IOhlcv ComputeIOhlcvData(IEnumerable<IOhlcv> candles, DateTimeOffset startTime, DateTimeOffset endTime)
{
    var candle = candles.Where(c => c.DateTime >= startTime && c.DateTime < endTime);

    if (candle.Any())
    {
        var dateTime = candle.First().DateTime;
        var open = candle.First().Open;
        var high = candle.Max(stick => stick.High);
        var low = candle.Min(stick => stick.Low);
        var close = candle.Last().Close;
        var volume = candle.Sum(stick => stick.Volume);
        return new Candle(dateTime, open, high, low, close, volume);
    }
    return null;
}

New code (optimizations Nr. 2)

public static IReadOnlyList<IOhlcv> Transform<TSourcePeriod, TTargetPeriod>(this IEnumerable<IOhlcv> candles)
    where TSourcePeriod : IPeriod
    where TTargetPeriod : IPeriod
{
    if (typeof(TSourcePeriod).Equals(typeof(TTargetPeriod)))
        return candles.ToList();

    if (!IsTimeframesValid<TSourcePeriod>(candles, out IOhlcv err))
        throw new InvalidTimeframeException(err.DateTime);

    if (!IsTransformationValid<TSourcePeriod, TTargetPeriod>())
        throw new InvalidTransformationException(typeof(TSourcePeriod), typeof(TTargetPeriod));

    var outIOhlcvDatas = new List<IOhlcv>();

    var startTime = candles.First().DateTime;
    var periodStart = GetPeriodStart(startTime);
    var periodEnd = periodStart.AddHours(2);
    List<IOhlcv> inputCandles = candles.ToList();
    List<IOhlcv> candle = new List<IOhlcv>();
    for (int i = 0; i < inputCandles.Count(); i++)
    {
        var dt = inputCandles[i].DateTime;
        if (dt >= periodEnd)
        {
            periodStart = periodEnd;
            periodEnd = periodStart.AddHours(2);

            ComputeIOhlcvDataPreprocessed(outIOhlcvDatas, candle);
            candle = new List<IOhlcv>();
        }

        candle.Add(inputCandles[i]);
    }

    if (candle.Any())
    {
        ComputeIOhlcvDataPreprocessed(outIOhlcvDatas, candle);
        candle = null;
    }

    return outIOhlcvDatas;
}

private static DateTimeOffset GetPeriodStart(DateTimeOffset dt)
{
    return new DateTimeOffset(dt.Year, dt.Month, dt.Day, dt.Hour, 0, 0, dt.Offset);
}

private static void ComputeIOhlcvDataPreprocessed(List<IOhlcv> result, IEnumerable<IOhlcv> candle)
{
    if (candle.Any())
    {
        var dateTime = candle.First().DateTime;
        var open = candle.First().Open;
        var high = candle.Max(stick => stick.High);
        var low = candle.Min(stick => stick.Low);
        var close = candle.Last().Close;
        var volume = candle.Sum(stick => stick.Volume);

        result.Add(new Candle(dateTime, open, high, low, close, volume));
    }
}

And again, I will repeat myself - the code is done for demo purposes. It has some hard-coded periods. You are free to modify it to make a universal solution out of that.

It's just to show that there is a huge room for improvement.

karlwancl commented 6 years ago

@pavlexander Thanks for the investigation & the optimized code. You're right with the 1st optimization, LINQ is lazy-evaluated, the Where clause have been evaluated every Any(), First(), Last(), Sum(), Min(), Max() operation.

A traditional for-loop (your approach) to create a new list is also a solver to this problem. I think ToList() after the Where clause may also work like the traditional loop but i have to verify.

You're also right with the 2nd optimization, the Where clause in each while loop is too costly, using your approach (iterate candles through the list) is better.

The part will be updated maybe in this weekend. Again, thanks for your contribution :D

karlwancl commented 6 years ago

@pavlexander The code is patched. Please see if the performance issue has fixed 😃

pavlexander commented 6 years ago

@lppkarl yes, I can verify that now it's 437x faster than previous implementation :)

Thank you for taking care of this issue and of course, for Trady! You are doing a great job.