jonathancornelissen / highfrequency

The highfrequency package contains an extensive toolkit for the use of highfrequency financial data in R. It contains functionality to manage, clean and match highfrequency trades and quotes data. Furthermore, it enables users to easily calculate various liquidity measures, estimate and forecast volatility, and investigate microstructure noise and intraday periodicity.

Performance issue with exchangeHoursOnly and alternative #94

Closed. MislavSag closed this issue 1 year ago.

MislavSag commented 1 year ago

Hi,

I have downloaded quotes data for SPY from Dukascopy for the test.

The first steps, according to the SSRN paper, are the noZeroQuotes and exchangeHoursOnly functions.
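For reference, a minimal sketch of those two cleaning steps; the marketOpen/marketClose argument names for exchangeHoursOnly are an assumption here and may differ across package versions:

```r
library(highfrequency)

# drop quotes with a zero bid or ask price
quotes <- noZeroQuotes(quotes)

# keep only observations inside regular trading hours; the argument
# names below are assumed and may vary by package version
quotes <- exchangeHoursOnly(quotes,
                            marketOpen  = "09:30:00",
                            marketClose = "16:00:00")
```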

While noZeroQuotes is fast enough, exchangeHoursOnly is pretty slow. I have 119,304,432 rows of SPY quotes data from 2017. The call ran for more than 15 minutes before I terminated it.

I then tried simple filtering of exchange times using the data.table and nanotime packages:

```r
# floor each timestamp to the day to get the unique dates
dates_ <- as.nanotime(unique(nano_floor(quotes$DT, nanoperiod(days = 1), tz = "UTC")))
# build per-day start/end bounds and the corresponding intervals
dates_start <- dates_ + as.nanoduration("19:30:00")
dates_end   <- dates_ + as.nanoduration("20:00:00")
intervals_  <- nanoival(dates_start, dates_end)
# keep only quotes whose timestamp falls inside one of the intervals
system.time(quotes[DT %in% intervals_])
```

where quotes has the same structure as sampleQDataRaw, with the only difference that DT is a nanotime object rather than a POSIXct object.

This takes about 2.5 seconds.

My question is: can I use this simple approach, or does your exchangeHoursOnly function do additional filtering?

An important note is that nanotime requires UTC time.
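A minimal illustration of why this matters (the timestamps are hypothetical): nanotime stores instants on a UTC clock, so fixed UTC offsets like the 19:30/20:00 bounds above do not track daylight saving time:

```r
library(nanotime)

# NYSE opens at 09:30 local time; the corresponding UTC instant shifts with DST
as.nanotime(as.POSIXct("2017-03-01 09:30:00", tz = "America/New_York"))
# -> 14:30 UTC (winter, EST)
as.nanotime(as.POSIXct("2017-07-03 09:30:00", tz = "America/New_York"))
# -> 13:30 UTC (summer, EDT)
```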

onnokleen commented 1 year ago

That's indeed unfortunate. Does the problem still exist? We use the data.table::between function, and data.table operations otherwise perform really well. There is nothing special about this function, so your workaround with nanotime is fine.
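A sketch of what such a between-style filter can look like in plain data.table, comparable in spirit to the above (assuming DT is a POSIXct column in the exchange's local time zone; the bounds are illustrative):

```r
library(data.table)

# filter on the time-of-day component only; as.ITime extracts it cheaply
open_  <- as.ITime("09:30:00")
close_ <- as.ITime("16:00:00")
quotes[between(as.ITime(DT), open_, close_)]
```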

onnokleen commented 1 year ago

I can't replicate this issue with ~1.2 million rows; that takes only 260 ms on my computer. Hence, I propose to close this issue. @kboudt Can you close it?
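For anyone who wants to check on their own machine, a rough timing sketch against the package's bundled sample data (note the sample is far smaller than the 119 million rows above, so timings are not directly comparable):

```r
library(highfrequency)

# time the exchange-hours filter on the bundled raw quote sample
system.time(exchangeHoursOnly(sampleQDataRaw))
```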

MislavSag commented 1 year ago

I will try again in a few days with some new data I have, but I think this can be closed for now. I will reopen it if I run into a similar problem.