ta4j / ta4j

A Java library for technical analysis.
http://www.ta4j.com
Other
2.06k stars 715 forks source link

"constrained time" series vs "subseries" #125

Closed nimo23 closed 6 years ago

nimo23 commented 6 years ago

I have a timeseries s and want to create a new subseries s1 from it. s has 20 ticks with index 0 to 19. s1 should be a subseries from s containing only the last 10 ticks from the actual index backwards.

TimeSeries s1 = new BaseTimeSeries(s, index - 10, index);

However, after printing "s1.getTickData()" and "series.getTickData()", both timeseries has the same ticks included. s1 normally should have only 10 ticks included and not 20. What is wrong?

team172011 commented 6 years ago

Because it calls internally: this(defaultSeries.getName(), defaultSeries.getBarData(), seriesBeginIndex, seriesEndIndex, true);

All Ticks will be added to the subseries, just the begin and end index are modified

Javadoc of TimeSeries#getBarData() (getTickData() in master):

    /**
     * Warning: should be used carefully!
     * <p>
     * Returns the raw bar data.
     * It means that it returns the current List object used internally to store the {@link Bar bars}.
     * It may be:
     *   - a shortened bar list if a maximum bar count has been set
     *   - a extended bar list if it is a constrained time series
     * @return the raw bar data
     */
    List<Bar> getBarData();
nimo23 commented 6 years ago

So s1 contains all ticks of s but ONLY the ticks from "seriesBeginIndex" to "seriesEndIndex" will be considered by any other indicators (for example, highestValueIndicator). Am I right?

// returns the highest value within index - 10 to index
new HighestValueIndicator(s1);
// returns the highest value within the whole timeSeries.
new HighestValueIndicator(s);
team172011 commented 6 years ago

Yes, I think so. As far the indicators use getBeginIndex(). I am unsure what happens if you call s1.getBar(0).

team172011 commented 6 years ago

Many indicators use statements like int index = Math.max(0. value) to prevent negative values for timeFrame or so ... If working on a subseries it must be int index = Math.max(series.getBeginIndex, value)

Example in ArronUp indicator: int endIndex = Math.max(0,index - timeFrame);

nimo23 commented 6 years ago

So normally, such calls within indicators

int endIndex = Math.max(0,index - timeFrame);

should always be done this way:

int endIndex = Math.max(series.getBeginIndex, index - timeFrame)

Or?

It does not matter if subseries or not. If series is not a subseries, then "series.getBeginIndex" will also return "0" as a beginIndex. Or? But for subseries, it returns the correct beginIndex as declared.

So it would better to change all indicators which make use of int endIndex = Math.max(0, index - timeFrame) and change it to int endIndex = Math.max(series.getBeginIndex, index - timeFrame). Or?

team172011 commented 6 years ago

The begin index is not alway 0, especially for not for subseries. If you insert into TimeSeriesTest the additional test case:


    @Before
    public void setUp() {
        bars = new LinkedList<>();
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 13, 0, 0, 0, 0, ZoneId.systemDefault()), 1d));
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 14, 0, 0, 0, 0, ZoneId.systemDefault()), 2d));
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 15, 0, 0, 0, 0, ZoneId.systemDefault()), 3d));
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 20, 0, 0, 0, 0, ZoneId.systemDefault()), 4d));
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 25, 0, 0, 0, 0, ZoneId.systemDefault()), 5d));
        bars.add(new MockBar(ZonedDateTime.of(2014, 6, 30, 0, 0, 0, 0, ZoneId.systemDefault()), 6d));

        defaultName = "Series Name";

        defaultSeries = new BaseTimeSeries(defaultName, bars);
        constrainedSeries = new BaseTimeSeries(defaultSeries, 2, 4);
        emptySeries = new BaseTimeSeries();

        Strategy strategy = new BaseStrategy(new FixedRule(0, 2, 3, 6), new FixedRule(1, 4, 7, 8));
        strategy.setUnstablePeriod(2); // Strategy would need a real test class

    }

    @Test
    public void subSeriesVsSeries(){
        assertEquals(constrainedSeries.getBar(0), defaultSeries.getBar(0));  //should fail!! 
        assertEquals(constrainedSeries.getBar(5), defaultSeries.getBar(5)); //should fail!!
    }

Both tests should fail, because constrainedSeries should not have index 0, or the first Bar via getBar(0) should not be the same as from defaultSeries. Also the second test should fail, because the constrainedSeries should not have a fifth bar. But both tests passed!

All indicators that use absolute indices of the time series could deliver wrong (or out of scope) results on subseries.

nimo23 commented 6 years ago

Please read my previous comment again. I have said:

Actually it is so that If series is not a subseries, then it will also use "0" as a wrong beginindex within a indicator caluclation, because the indicator calculation uses '0' and not "series.getBeginIndex"- and this is wrong!

So it would better to change ALL indicators which make use of int endIndex = Math.max(0, index - timeFrame) and change it to int endIndex = Math.max(series.getBeginIndex, index - timeFrame).

Yes or no?

We should not use '0' but "series.getBeginIndex" within all indicators.

For example, HighestValueIndicator or LowestValueIndicator must use this kind of loop to be correct:

for (int i = Math.max(series.getBeginIndex(), index - timeFrame + 1); i <= index; i++) {
} 

Actually, I get a bunch of stackoverflow errors when using subseries with these indicators.

We must correct all indicators which make use of such absolute indices and use 'series.getBeginIndex()' instead of '0'.

team172011 commented 6 years ago

Yes, this is what I wanted to show with my post. Just to clarify:

It does not matter if subseries or not. If series is not a subseries, then "series.getBeginIndex" will also return "0" as a beginIndex. Or? But for subseries, it returns the correct beginIndex as declared

series.getBeginIndex() does not always return 0. It could be any other value, because the series is a subseries or because the series has reached its maximum tick count and the first ticks had been 'deleted'.

So we are on the save side if using getBeginIndex() instead of 0 and getBeginIndex() instead of getBarCount()

nimo23 commented 6 years ago

So we have to find all indicators making use of "0" or "getBarCount()" and change them. We should also add additional test cases within TimeSeriesTest.

nimo23 commented 6 years ago

Is series.getBeginIndex()always available for any type of indicator?

For example, if I have a indicator with a constructor providing only another indicator (without a timeSeries in the constructor).

public MyIndicator(Indicator<Decimal> ref, int timeFrame) {
super(ref);
}

then I need to use this for my loop:

for (int i = Math.max(ref.getTimeSeries().getBeginIndex(), index - timeFrame + 1); i <= index; i++) {
...
} 

Is ref.getTimeSeries().getBeginIndex() always available?

mdeverdelhan commented 6 years ago

@nimo23 All those implementing AbstractIndicator. See https://github.com/ta4j/ta4j/blob/587b1908644dc38804ba2b60332812075439d534/ta4j-core/src/main/java/org/ta4j/core/indicators/AbstractIndicator.java#L45 (All of them except ConstantIndicator as far as I remember).

By the way, why would you want to build a subseries using ta4j? I was planning to remove this feature as I was judging it irrelevant/useless.

nimo23 commented 6 years ago

I need to calculate the "highestValue" of different subseries. I want to implement https://www.mql5.com/en/articles/2930.

So I need something like this:

// calculate highest value of different subseries
Decimal h = new HighestValueIndicator(series, timeFrame);
BaseTimeSeries s1 = new BaseTimeSeries(series, index - timeFrame / 2 + 1, index);
Decimal h1 = new HighestValueIndicator(s1, timeFrame);
BaseTimeSeries s2 = new BaseTimeSeries(series, index - timeFrame + 1, index - timeFrame / 2);
Decimal h2 = new HighestValueIndicator(s2, timeFrame);
..

So the highest value of subseries s1 can be different than the highest value of subseries s2, which can also differ from series s. With actual ta4j I get stackoverflow error and it hangs when calculating the highest/lowest value of subseries.

mdeverdelhan commented 6 years ago

Ok. It is why I wanted to remove the subseries feature. According to me your use case does not justify for ta4j to provide it. It would be up-to-you to build the subseries by yourself (it would also be clearer).

To be more precise I wanted to keep only the indexes for the run() methods (for WalkForwardOptimization case). See: https://github.com/ta4j/ta4j/blob/master/ta4j-core/src/main/java/org/ta4j/core/TimeSeriesManager.java#L91

nimo23 commented 6 years ago

does not justify for ta4j to provide it

Other libs such as backtrader or the like all provide resampling so there must be common use cases for that which justify to provide it.

it would also be clearer

I dont think so. Actually it s a one-liner. Should I extract all the ticks again and again and put these ticks into new timeseries only to be able to use indicators based on these subseries? not very elegant and not very performant, dont you think so?

Actually, ta4j has constrained timeseries, which is a kind of subseries. However, constrained timeseries suffer from the bug described above.

(Maybe it can be easily solved by providing a static method within BaseTimeSeries which extracts the ticks and puts it into a new BaseTimeSeries?)

team172011 commented 6 years ago

(Maybe it can be easily solved by providing a static method within BaseTimeSeries which extracts the ticks and puts it into a new BaseTimeSeries?)

I have also thought about this solution. It is easy to maintenance and does exactly what the user would expected

nimo23 commented 6 years ago

the constrained constructor should also be changed or something else, users expect to behave what it says..

mdeverdelhan commented 6 years ago

Other libs such as backtrader or the like all provide resampling so there must be common use cases for that which justify to provide it.

Backtrader is a more complete framework than ta4j. For instance it provides plotting, broker simulation, and numerous other features I was not thinking to add here. If you add resampling you will soon be tempted to provide bar merging (i.e. merge 1-day bars into 1-week bars), etc. IMO it is not this lib's job.

I dont think so. Actually it s a one-liner. Should I extract all the ticks again and again and put these ticks into new timeseries only to be able to use indicators based on these subseries? not very elegant and not very performant, dont you think so?

I meant: clearer on the internal ta4j logic side. Of course it would be useful/simpler for the user.

(Maybe it can be easily solved by providing a static method within BaseTimeSeries which extracts the ticks and puts it into a new BaseTimeSeries?)

You can. But then, this limit case won't work anymore: https://github.com/ta4j/ta4j/blob/master/ta4j-core/src/main/java/org/ta4j/core/TimeSeriesManager.java#L157 To explain: it is the case where you backtest a strategy in a constrained way and the last trade is still open, today we check in the future bars if we can close it. Actually it is the only reason which made me keep the full list of bars instead of copying the relevant part.

nimo23 commented 6 years ago

Backtrader is a more complete framework than ta4j

yes, unfortunatley, it is not in java and in java no alternatives exists, besides ta4j;)

edlins commented 6 years ago

To explain: it is the case where you backtest a strategy in a constrained way and the last trade is still open, today we check in the future bars if we can close it. Actually it is the only reason which made me keep the full list of bars instead of copying the relevant part.

I don't understand this. I changed that bit of code to simply close the trade at runEndIndex

tradingRecord.operate(runEndIndex, timeSeries.getTick(runEndIndex).getClosePrice(), amount);

This way I get "paper" losses and gains included in the analysis. Previously, if I had large paper gains but the trade was open at end then those gains were lost in the analysis.

mdeverdelhan commented 6 years ago

I changed that bit of code to simply close the trade at runEndIndex

Then your last trade is dishonest. It was not closed by your strategy but only because you did not have new bars. You may detect big losses even if you have a good strategy (or the opposite).

Previously, if I had large paper gains but the trade was open at end then those gains were lost in the analysis.

No. If the trade is open at the end, we try to see if it can be closed using the underlying bars which are after the runEndIndex. If so, we also return this last trade, if not, we remove it. It is what is done there.

edlins commented 6 years ago

I see, thanks for explaining. I'm concerned with portfolio value as well as closed-trades strategy performance. But I should treat those separately. Buy and hold on the S&P 500 over the last 30 years is up > 900% but has a cash flow of 1.0, and as a strategy is only as "good" as "stuff money in your mattress" simply because the trade stays open.