joshuaulrich / quantmod

Quantitative Financial Modelling Framework
http://www.quantmod.com/
GNU General Public License v3.0
809 stars 223 forks source link

Read intraday data with getSymbols.csv #52

Open joshuaulrich opened 9 years ago

joshuaulrich commented 9 years ago

Currently getSymbols.csv only handles date-based indexes (it calls as.Date on the first column in the csv file). It would be nice to be able to read intraday data from CSV.

The first step would be to allow the user to call a function other than as.Date on the first column of the CSV. Then it might be nice to support CSV layouts where the date and time are in different columns (e.g. this question stackoverflow).

Note that read.zoo might provide some useful insight how to handle both of these cases.

MislavSag commented 3 years ago

I have the same problem with MySQL. If use intraday data, getSymbols converts datetime object to date object. I saw the problem is in this line:

            fr <- xts(as.matrix(fr[,-1]),
                      order.by=as.Date(fr[,1],origin='1970-01-01'),
                      src=dbname,updated=Sys.time())

Probbably it should be as.POSIXct.

braverock commented 3 years ago

note that getSymbols.FI in the FinancialInstrument package handles intraday data. This is what it was designed to do. It integrates normally as a getSymbols method.

braverock commented 3 years ago

I don't think changing all the other as.Date -only getSymbols methods makes a lot of sense. Intraday data requires different storage and indexing than daily data, and it is certainly easier for users to figure out how to format thier input data for daily in the unlikely event that they don't have a compatible format readily available. All intraday data sources will require more work to parse and store for later use. FinancialInstrument already has an extensive collection of parsers designed to demonstrate exactly how that should be done to make things work smoothly with R.

MislavSag commented 3 years ago

@braverock I have the getSymbol function from FinancialInstrument package. It doesn't work with my MySQL specification:

directory  /AAPL  does not exist, skipping
done.
NULL
Warning message:
In FinancialInstrument::getSymbols.FI("AAPL", src = "MySQL", return.class = "xts",  :
  No data found.

I think I will write my own function in the end.

braverock commented 3 years ago

@braverock I have the getSymbol function from FinancialInstrument package. It doesn't work with my MySQL specification:

directory  /AAPL  does not exist, skipping
done.
NULL
Warning message:
In FinancialInstrument::getSymbols.FI("AAPL", src = "MySQL", return.class = "xts",  :
  No data found.

I think I will write my own function in the end.

getSymbols.FI uses RData or rda files, not MySQL. for tick data, this is far more efficient both in disk storage and in read times. That's why we don't use a database for L1 tick data (believe me, we've tried all of them over decades).

This ticket was about getSymbols.csv. In our experience, you don't want to parse intraday data from CSV every time you need it. That's why we settled on storing in R's native binary data format as xts objects (HDF5 would also be a reasonable solution).

MislavSag commented 3 years ago

You have much more experience than I do. I have following approach for market data:

  1. download minute data from interactive brokers for all available history. Save file locally in HDF5 and to MySQL database.
  2. When I call import_SQL function in R it: a) imports data from MySQL if the data doesn't exists locally . Save data locally as fst file which allows very fast imports later. b) import file form local cached fts file (very fast)
  3. do other calculations after imports (only market hours, price adjustments etc.).

I was thinking about saving everything to flat files (to M-files), but I though SQL will be better in the long run.

I run backtest in the quantconnect in the end, R is just for research. In general R lacks very robust backtest platforms (like backtrader or quantconnect).

joshuaulrich commented 3 years ago

It would be easier to make getSymbols.MySQL() handle date-time columns the MySQL driver should return Date or POSIXct depending on the type of the column in the table.

So the order.by=as.Date(fr[,1],origin='1970-01-01') line could probably be order.by = fr[,1], assuming the first column is a date or datetime in the data.frame. That said, I don't have any way to test this quickly, so someone would have to help.

braverock commented 3 years ago

It would be easier to make getSymbols.MySQL() handle date-time columns the MySQL driver should return Date or POSIXct depending on the type of the column in the table.

So the order.by=as.Date(fr[,1],origin='1970-01-01') line could probably be order.by = fr[,1], assuming the first column is a date or datetime in the data.frame. That said, I don't have any way to test this quickly, so someone would have to help.

The tickets was originally about getSymbols.csv, not getSymbols.MySQL, but I agree that the MySQL method should be able to pick up the class of the index column from the database.