joshuaulrich / TTR

Technical analysis and other functions to construct technical trading rules with R
GNU General Public License v2.0
326 stars 102 forks source link

ZigZag bug: Two inflection points on one day #67

Open nateaff opened 6 years ago

nateaff commented 6 years ago

Hi Joshua,

There appears to be a bug in the zigzag.c function called by ZigZag.

The zigzag.c function calculates both high and low inflection points and uses a single vector, zigzag, to record these inflection points. It's possible, however, for both high and low inflection points to occur on the same day. When this happens, zigzag.c overwrites the value of the first determined inflection point with the value of the second inflection point.

Here's code illustrating the problem. ZigZag finds two successive highs, with the second high occurring on December 9th. The problem date is highlighted in magenta in the plot this code produces.


library(TTR)

url <- "https://gist.githubusercontent.com/nateaff/f350a75db0da7f3cb19b88de8a5ec6ac/raw/cca16152d4d9024cf25062edb060d573531c8d37/SPY_2015_11_03__2015_12_31.csv"
SPY <- read.csv(url)

HL <- SPY[ , c("High", "Low")]

# Example of a percent threshold at which the issue occurs
percent = 1.25
res <- ZigZag(HL, percent, TRUE, FALSE, FALSE)

# Index of problem date
idx = 26

plot(SPY$Date, HL$High, col = "blue", ylim = c(min(SPY$Low)-1, max(SPY$High)+1))
points(HL$Low, col = "red", pch = 16)
lines(res, col = "green", lwd = 2)

# Mark the second successive high value
points(SPY$Date[idx], SPY$High[idx], col = "magenta", pch = 16)

Since the current zigzag.cpp function only returns a single vector, I don't think the problem could be fixed without changing the return type.

One option would be to create a separate ZigZag2() function that tracks high and low inflection points in separate vectors and returns a list of two vectors. This would only require changing a few lines, aside from variable initialization.

The main lines that would need updating are in the downtrend conditional statement:

/* Trend Reversal */
    if (high[i] >= extreme_max) {
       /* Record a high inflection point */
      zigzag_high[reference.index] = reference.price;

and in the uptrend conditional statement:

/* Trend Reversal */
   if (low[i] <= extreme_min) {
     /* Record a low inflection point */
     zigzag_low[reference.index] = reference.price;

The final lines after the main for loop would also need to be updated:

/* Record final two inflection points */
 if(signal > 0){
   zigzag_low[reference.index] = reference.price;
   zigzag_high[inflection.index] = inflection.price;
 } else {
   zigzag_high[reference.index] = reference.price;
   zigzag_low[inflection.index] = inflection.price;
 }

Hope this is helpful.

joshuaulrich commented 6 years ago

Wow, thanks for the detailed report! I think I understand the problem, but it's not clear to me how returning two vectors from C would help correct the result of the R function. Could you expand on that?

nateaff commented 6 years ago

This doesn't really fix the problem of returning a single series of interpolated values between the inflection points. I don't see a nice way of returning an interpolated series without more extensive fiddling.

In particular, since you can have two inflection points on a day, returning a [scalar-valued] time series of the same length as the original is not possible.

The alternative version I'm suggesting (ZigZag2) would have the C zigzag function return separate high and low numeric vectors. In the outer ZigZag function these could be transformed into two logical vectors indicating high and low inflection points.

This approach basically leaves it to the user to decide what to do with the inflection points. For example, this leaves the returned data the same length as the input series and it makes it easy to plot markers on at inflection points, rather than an interpolated series. I understand this isn't the way the zigzag output is usually plotted.

I tested a version of the code that follows what I'm suggesting and it didn't require that many modifications of the current code.

I haven't used the TTR API much, to be honest, so I also don't know if this fits in with the style of other functions in the package.

joshuaulrich commented 6 years ago

Thanks again for the detailed response! I would really appreciate it if you could send me a diff/patch of your changes that does what you suggest. That will help me understand fully and without any ambiguity, since computers are pedantic creatures.

I'm going to be busy with the R/Finance conference until next week, so I likely will not take a look until Sunday, at the earliest.

nateaff commented 6 years ago

OK. It will take me a couple of days anyway since the version I tested was a port to pure R (which I did for easier debugging of the initial issue).

nateaff commented 6 years ago

Hi Joshua,

I guess it was more than a couple of days before I could get back to this!

I just made a pull request that demonstrates changes that could be made to the ZigZag function in order to create a possible alternative ZigZag function that handles 'dual' inflection points.

As a reminder, pull request isn't intended to be merged to TTR but was done so you look at the diff to see the proposed changes. The version I pushed returns logical vectors, one for indicating the location of High inflection points and the other indicating the location of Low inflection points. The alternative ZigZag function could just as easily return numeric vectors.

Here's an updated code sample demonstrating how this alternative version might be used to mark the ZigZag inflection points. The plot indicates an example of a dual inflection point in magenta.

# ZigZag alternative version demonstration 

library(TTR)

url <- "https://gist.githubusercontent.com/nateaff/f350a75db0da7f3cb19b88de8a5ec6ac/raw/cca16152d4d9024cf25062edb060d573531c8d37/SPY_2015_11_03__2015_12_31.csv"
SPY <- readr::read_csv(url)

HL <- SPY[ , c("High", "Low")]

# Example of a percent threshold at which the issue occurs
percent = 1.25
res <- ZigZag(HL, percent, TRUE, FALSE, FALSE)

# Are there any dual points 
dual_idx <- which(res$Low & res$High)

if(length(dual_idx) > 0) {
  message("There is a dual inflection point at the following indices: ", dual_idx)
}

# Plot showing High and low inflections points
plot(SPY$Date[res$High], SPY$High[res$High], col="green", pch=15, ylim = c(min(SPY$Low)-1, max(SPY$High)+1))
points(SPY$Date, SPY$High)
points(SPY$Date[res$Low], SPY$Low[res$Low], col="red", pch=15)
points(SPY$Date, SPY$Low)

# Plot high and low of the dual inflection point
points(SPY$Date[dual_idx], SPY$High[dual_idx], col = "magenta", pch = 16)
points(SPY$Date[dual_idx], SPY$Low[dual_idx], col = "magenta", pch = 16)