twitter / BreakoutDetection

Breakout Detection via Robust E-Statistics
GNU General Public License v2.0

R code results do not match C++ code results (version: multi) #9

Open eamocanu opened 9 years ago

eamocanu commented 9 years ago

On the same scribe data input: 105.08333,90.90000,763.90000,83.36667,78.36667,80.58333,76.36667,210.98333,78.00000,77.51667,83.01667,89.23333,84.86667,653.16667,70.91667,72.83333,75.91667,73.53333,548.86667,66.23333,73.45000,66.96667,71.11667,68.31667,285.38333,317.20000,63.28333,64.08333,60.50000,550.88333,399.68333,75.90000,115.35000,78.93333,88.68333,475.53333,30.11667,31.51667,34.08333,39.55000,47.51667,423.63333,52.55000,50.21667,61.41667,56.61667,64.41667,742.30000,165.85000,122.88333,122.21667,114.66667,565.96667,134.70000,141.16667,160.78333,168.48333,458.65000,513.28333,154.36667,130.66667,125.93333,127.25000,615.58333,122.90000,97.45000,122.76667,115.10000,111.95000,442.78333,113.83333,116.11667,128.70000,135.03333,138.75000,153.38333,143.58333,161.50000,168.11667,152.25000,147.11667,163.91667,161.10000,146.95000,132.65000,127.28333,116.10000,92.28333,54.88333,111.35000,114.98333,110.98333,1015.35000,774.58333,232.65000,134.61667,130.25000,98.66667,102.40000,184.86667,258.76667,70.33333,81.38333,81.10000,89.21667,536.96667,85.83333,95.63333,76.10000,94.38333,73.25000,346.70000,65.38333,84.73333,140.56667,120.60000,121.38333,359.23333,55.28333,54.55000,52.18333,56.20000,112.11667,208.53333,49.40000,49.06667,56.06667,54.01667,63.51667,344.41667,42.06667,55.36667,55.96667,55.85000,56.30000,46.56667,49.25000,43.90000,357.61667,44.10000,44.68333,43.13333,40.55000,452.20000,47.06667,40.00000,42.35000,48.36667,44.86667,48.51667,244.01667,50.16667,48.73333,47.91667,51.96667,343.33333,35.25000,45.33333,46.86667,48.78333

The R version returns 47, 87 for res = breakout(Scribe, min.size=24, method='multi', beta=.001, degree=1, plot=TRUE) while the C++ multi version returns 26, 51, 76, 101 for edm_multi(scribeData, 24, 0.001, new Linear).

Shouldn't the C++ code return the same results as the R code?

putnam120 commented 9 years ago

How are you calling the C++ code? The last function argument should be of type int, so I'm not sure how you aren't getting a runtime error.

eamocanu commented 9 years ago

Hi, I changed the signature to use an array because it couldn't find IntegerVector, but that's not the point. In my call above I wrote Linear to show that it uses the linear function; passing a 1 is not that clear (at least to me). Anyway, my point is not that the code is not working. The point is that the C++ code gives different results than the R code when given the parameters shown in my original post. I was wondering why that is the case. Maybe the code is broken, so I'm just giving you a heads-up.

putnam120 commented 9 years ago

Thanks for the clarification.

I modified my version of the package and was able to obtain the results that you mentioned.

However, there is nothing wrong with the package. If you look at the R code, you will notice that before performing any analysis, all of the values are scaled to lie in the interval [0,1]. This is because R is able to handle arbitrarily large numbers while C++ can't (well, at least not with the standard libraries). If you perform the scaling and then apply the C++ method, you obtain the same results.

Hope that helps.
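For readers calling the C++ code directly, the scaling step described above could be sketched roughly as follows. This assumes the R wrapper uses plain min-max scaling, (x - min) / (max - min). The helper name scale_to_unit_interval is hypothetical and not part of the package, and mapping a constant series to zeros is an arbitrary choice made here to avoid dividing by zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of BreakoutDetection): min-max scale the
// observations into [0, 1] before handing them to the C++ routine.
std::vector<double> scale_to_unit_interval(const std::vector<double>& z) {
    std::vector<double> out(z.size(), 0.0);
    if (z.empty()) {
        return out;
    }

    const double lo = *std::min_element(z.begin(), z.end());
    const double hi = *std::max_element(z.begin(), z.end());
    const double range = hi - lo;

    // Assumption: a constant series is mapped to all zeros to avoid 0/0.
    if (range == 0.0) {
        return out;
    }

    for (std::size_t i = 0; i < z.size(); ++i) {
        out[i] = (z[i] - lo) / range;
    }
    return out;
}
```

With a helper like this, the scaled vector would be passed to edm_multi in place of the raw scribe data, keeping the other arguments (24, 0.001, and the degree) unchanged. The exact argument types depend on how the signature was modified locally.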

eamocanu commented 9 years ago

Thanks for the explanation.