jonathancornelissen / highfrequency

The highfrequency package contains an extensive toolkit for the use of highfrequency financial data in R. It contains functionality to manage, clean and match highfrequency trades and quotes data. Furthermore, it enables users to: calculate easily various liquidity measures, estimate and forecast volatility, and investigate microstructure noise and intraday periodicity.

Lee-Mykland jump test #68

Closed waynelapierre closed 2 years ago

waynelapierre commented 3 years ago

If I want to use the Lee-Mykland jump test on 1-min returns, I need to adjust the value of K. I am not sure whether the function intradayJumpTest adjusts K automatically. For 5-min returns, I saw this code in the manual:

LMtest <- intradayJumpTest(pData = sampleTData[, list(DT, PRICE)],
                           volEstimator = "RM", driftEstimator = "none",
                           RM = "bipower", lookBackPeriod = 20,
                           alignBy = "minutes", alignPeriod = 5, marketOpen = "09:30:00",
                           marketClose = "16:00:00")

If I change the alignPeriod to 1, should I also change the lookBackPeriod?

emilsjoerup commented 3 years ago

That code uses 20 five-minute returns to estimate the spot volatility. If you set alignPeriod to 1, it would use 20 one-minute returns instead.

lookBackPeriod is essentially a tuning parameter, so you can set it to whatever you want. It isn't adjusted automatically; you have to decide what it should be.
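To make the point concrete, here is a small base-R sketch of how much clock time a fixed lookBackPeriod covers at each sampling frequency. The helper windowMinutes is hypothetical and not part of highfrequency; it only does the arithmetic.

```r
# Hypothetical helper (not part of highfrequency): clock time, in minutes,
# spanned by `lookBackPeriod` returns sampled every `alignPeriod` minutes.
windowMinutes <- function(lookBackPeriod, alignPeriod) {
  lookBackPeriod * alignPeriod
}

windowMinutes(20, 5)  # 20 five-minute returns
windowMinutes(20, 1)  # 20 one-minute returns, a much shorter window

# Assumption for illustration: to keep the same clock-time window at
# 1-minute sampling, the lookback would have to be raised by hand.
lookBack1min <- windowMinutes(20, 5) / 1
```

Whether keeping the clock-time window constant is desirable is a modelling choice; the sketch only shows that the window shrinks if alignPeriod alone is changed.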

Did something in the documentation suggest that it is changed automatically? If so, I will make it clearer.

Best, Emil

waynelapierre commented 3 years ago

The Lee-Mykland jump test requires the lookback period to be adjusted together with the sampling frequency of prices. See Lee_Mykland_2008 and Theodosiou_Zikes_2009wp.

If I only change the sampling frequency of prices, then it is no longer a Lee-Mykland jump test.

emilsjoerup commented 3 years ago

Can you please elaborate? The references suggest a lookback period of sqrt(252 * M), which for e.g. M = 79 (5-minute sampling) is impossible, as sqrt(79 * 252) ≈ 141.1.
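The arithmetic in that reply can be checked with a few lines of base R. This is only a sketch of the formula as quoted in the thread; the variable names are illustrative.

```r
# M = number of intraday returns per day, 252 = trading days per year.
M <- 79
K <- sqrt(252 * M)

K      # suggested lookback, in number of returns
K > M  # TRUE: the window is longer than one day's worth of returns
```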

waynelapierre commented 3 years ago

You are correct. Something is wrong.

waynelapierre commented 3 years ago

When I run the example code from the intradayJumpTest manual, I also get these warnings:

Warning messages:
1: timezone of object (EST) is different than current timezone ().
2: timezone of object (EST) is different than current timezone ().

Not sure what this is about.

emilsjoerup commented 3 years ago

> When I run the example code of intradayJumpTest manual, I also got the warning message: Warning messages: 1: timezone of object (EST) is different than current timezone (). 2: timezone of object (EST) is different than current timezone ().
>
> Not sure what this is about.

This is a warning from the xts package that your xts object has a different timezone than your machine.

waynelapierre commented 3 years ago

In my limited experience, most market microstructure studies use the GMT time zone, so this warning will probably pop up in most cases. I don't think the warning is necessary.

emilsjoerup commented 3 years ago

The dataset is in the exchange's timezone. If you want to use GMT, change the timezone of the sample data.

BTW, it's not our warning message.
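As a rough illustration of what changing the timezone means, here is a base-R sketch on a single POSIXct timestamp. The date is made up, and the sketch does not touch the package's sample data or any xts object; it only shows that changing the "tzone" attribute alters the displayed clock time, not the instant.

```r
# A made-up timestamp stored in the exchange's time zone (EST, as in the warning).
dt <- as.POSIXct("2018-01-02 09:30:00", tz = "EST")

# Changing the "tzone" attribute changes how the time is displayed,
# not the instant it represents.
dtGMT <- dt
attr(dtGMT, "tzone") <- "GMT"

format(dt, "%H:%M:%S")               # clock time in EST
format(dtGMT, "%H:%M:%S")            # same instant, shown as GMT clock time
as.numeric(dt) == as.numeric(dtGMT)  # TRUE: underlying instant is unchanged
```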

EVIMAEL commented 3 years ago

1) In: https://github.com/jonathancornelissen/highfrequency/commit/33da46f669337d8ff6a8866cbebc530c03a8cf4a

a) Line 710 defines Sn <- 1/sqrt(const * 2 * log(n)). Is that correct? Should it not be Sn <- 1/(const * sqrt(2 * log(n))), as in Lee and Mykland (2008)?

b) Line 712 defines criticalValue <- Cn + Sn * betastar

Note: Lee and Mykland (2008) use (|L(i)| − Cn)/Sn; when this value exceeds B* = 4.6001 (for alpha = 0.01), we reject the hypothesis of no jump at ti. Calculating B* for alpha = 0.05 gives B* = −log(−log(0.95)) = 2.97020.

Given that, is the calculation of the critical value in line 712 correct?

Congratulations on the highfrequency project,

Brazilian greetings.

Evimael

EVIMAEL commented 3 years ago

I am performing an event study for jumps using the LM test. I would like to use 156 past 15-min returns (window K), which corresponds to almost 6 days of 15-min returns prior to the test day, and then run the LM test for a single day starting from market opening. Could you please give me some tips on how to parameterize the LM test in the highfrequency package?

nblbmra commented 3 years ago

(a)

Great catch! Thanks! We removed the constant in our script.

The constant is coming from the definition of Bipower Variation (Barndorff-Nielsen and Shephard, 2004) and is very specific to the way that Lee and Mykland (2008) define their jump test (see their Equation 8). This is the reason that their jump test statistic is normally distributed with mean 0 and variance 1/c^2.

We generalise this jump test statistic to allow for different estimators of the spot volatility (not only the bipower variation), and we make sure that our generalised jump test statistic is standard normally distributed. The constant is already included in our spotvol function. For example, the pi/2 (which is not included in Eq. 8 of Lee and Mykland) is directly included in our definition of RBPVar. So we can drop the constant.

(b)

Your second point relates to the paragraph in Lee & Mykland (2008, p. 2543). We're talking about two different thresholds. Let's say alpha = 0.01, as in their example. Then β* = −log(−log(0.99)) = 4.600149. This is the threshold for the test statistic after it has been scaled by Cn and Sn. We do not rescale our L statistic by Cn and Sn; we rescale the threshold instead. That is, we move Cn and Sn to the other side of the inequality. So, for example, if M = 78 and alpha = 0.01, then criticalValue = 4.067058.
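The numbers in this reply can be reproduced with a short base-R sketch. It assumes the Cn and Sn formulas of Lee and Mykland (2008) with the constant set to 1, as described in (a); it is not a copy of the package source, and the value of L is hypothetical.

```r
# Assumption: Cn and Sn as in Lee and Mykland (2008), with the constant c = 1.
n     <- 78    # number of intraday returns (M in the reply)
alpha <- 0.01

betastar <- -log(-log(1 - alpha))  # threshold for the scaled statistic
Cn <- sqrt(2 * log(n)) - (log(pi) + log(log(n))) / (2 * sqrt(2 * log(n)))
Sn <- 1 / sqrt(2 * log(n))

# Rescaled threshold for the unscaled statistic |L|.
criticalValue <- Cn + Sn * betastar

# Both formulations give the same decision for a hypothetical statistic L.
L <- 4.5
sameDecision <- ((abs(L) - Cn) / Sn > betastar) == (abs(L) > criticalValue)
```

Because Sn is positive, comparing (|L| − Cn)/Sn with betastar and comparing |L| with Cn + Sn * betastar are the same test.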

nblbmra commented 3 years ago

> I am performing an event-adapted study for jumps using the LM test where I would like to use 156 past returns (window K) of 15 min which would correspond to aggregate returns at the frequency of 15min for almost 6 days prior to the test day. So I would like to perform the LM test for a single day from market opening. Could you please provide me with some tips on how to proceed to parameterize the LM test of the highfrequency package?

Exactly. If you would do it like in Lee and Mykland (2008) and you sample the data at a 15-minute frequency, you'll need a burn-in of k = 156 15-minute intervals. In that case, you'll be using data (in the denominator of the test statistic) across a few days.

It depends on your data. How long is your sample period?

EVIMAEL commented 3 years ago

There are multiple events for several companies. That is, I want to analyze, at a frequency of 15 min, whether there are jumps over the course of a day, from market open to market close, after information is disclosed in the overnight period. How should I adjust the LM test parameters?

nblbmra commented 3 years ago

Okay. Do you have an example of the number of days you have for one stock?

EVIMAEL commented 3 years ago

I collected approx. 1200 dates/times of information disclosures made in the overnight period. For each one, I would like to perform the LM test at a 15-min frequency from the market opening after the disclosure until the end of that day. Following Lee and Mykland (2008), this needs a window of K = 156, which is approximately 6 days prior to the disclosure, given 420 min of continuous daily market operation. So I believe I have enough data for what the test requires. Am I right? How should I make the adjustments to perform the test?
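The "approximately 6 days" figure follows from simple arithmetic, sketched here in base R using only the numbers given in the comment (K = 156, 15-min returns, 420 trading minutes per day). The variable names are illustrative.

```r
K             <- 156  # lookback window, in 15-minute returns
minutesPerDay <- 420  # continuous trading minutes per day, as stated above
returnsPerDay <- minutesPerDay / 15

returnsPerDay      # 15-minute returns per trading day
K / returnsPerDay  # trading days of history needed for the window
```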

nblbmra commented 3 years ago

Okay that's more than enough!

You're right, you'll need a burn-in and your spot volatility estimate will use data that spans a few days.

The implementation we consider in the package follows Boudt, Croux and Laurent (2011) in J. Empirical Finance. You could also look at Lahaye, Laurent and Neely (2011, J. Applied Econometrics). These papers use a Lee-Mykland statistic without using data across a few days. We calculate the bipower variation across the entire day and then calculate the spot volatility in each interval. Each day is treated in isolation. This technique needs at least 50 days for each stock to calculate the intra-day periodicity.

An example that would generate 15-minute test statistics is the following:

library(highfrequency)
sampleOneMinuteData  # print the sample data
pdata <- sampleOneMinuteData[, list(DT, MARKET)]
colnames(pdata) <- c("DT", "PRICE")  # it needs a column named "PRICE"
LMtest <- intradayJumpTest(pData = pdata,
                           volEstimator = "detPer",  # approach of Boudt et al.
                           driftEstimator = "none",
                           RM = "rBPCov",            # bipower
                           alignBy = "minutes",
                           alignPeriod = 15,         # 15-minute jumps
                           marketOpen = "09:30:00",
                           marketClose = "16:00:00",
                           periodicVol = "WSD",
                           alpha = 0.05)