robcarver17 / pysystemtrade

Systematic Trading in python
GNU General Public License v3.0
2.62k stars 825 forks source link

Collision between end-of-day and intra-day data #381

Closed mjuhanne closed 3 years ago

mjuhanne commented 3 years ago

Hi,

I noticed that you save both EOD and ID data on the same MongoDB table, but the former price with special timestamp '23:00:'00'. Maybe in your timezone this hasn't yet induced problems but in my (UTC+3) there are collisions because the intra-day timestamp is converted to computers local time zone.

For example AUD_20210900:

...
2021-07-14 23:00:00  0.74500  0.74530  0.74450  0.74475   3128.0
2021-07-15 01:00:00  0.74795  0.74815  0.74760  0.74815   1932.0
2021-07-15 02:00:00  0.74810  0.74860  0.74805  0.74835   1073.0
2021-07-15 19:30:00  0.74405  0.74475  0.74380  0.74440   2389.0
2021-07-15 20:00:00  0.74440  0.74515  0.74350  0.74385   4854.0
2021-07-15 21:00:00  0.74390  0.74450  0.74350  0.74365   3141.0
2021-07-15 22:00:00  0.74365  0.74380  0.74245  0.74260   4966.0
2021-07-15 23:00:00  0.74780  0.74880  0.74770  0.74785   2402.0
2021-07-16 01:00:00  0.74190  0.74240  0.74180  0.74230   2785.0
2021-07-16 02:00:00  0.74230  0.74255  0.74200  0.74255    886.0
2021-07-16 19:30:00  0.74220  0.74250  0.74035  0.74140   3925.0 <-- 08:30
2021-07-16 20:00:00  0.74145  0.74165  0.73995  0.74090   8230.0 <-- 09:00
2021-07-16 21:00:00  0.74095  0.74140  0.74030  0.74105   5092.0 <-- 10:00
2021-07-16 22:00:00  0.74110  0.74180  0.74105  0.74165   1969.0 <-- 11:00
2021-07-16 23:00:00  0.74145  0.74215  0.74135  0.74185   3524.0
2021-07-17 01:00:00  0.74020  0.74025  0.73940  0.73970   2595.0 <-- 14:00
2021-07-17 02:00:00  0.73970  0.73995  0.73945  0.73945   1360.0 <-- 15:00

These all seem to be intra-day values, but there's something funky going on. 1) The 2021-07-16 23:00:00 does not seem to be EOD value (I haven't found similar value anywhere).
2) 13:00 price data is missing 3) Even the timestamp conversion doesn't seem to be correct, because exchange time is UTC-6 and 08:30 should be converted then to 17:30 (UTC+3) and not 19:30! I double-checked that my TZ is indeed set to EEST (UTC+3).

I think I need to scrap the Arctic tables and start over in case there's been some corruption along my experimentations (among others a script that converts from multiple prices back to individual contract prices so that provided historical data can be appended with fresh data from for example Barchart or IB).

Anyhow, regardless of the timestamp conversion discrepancy, there would still be collision (exchange 14:00 to local 23:00) with EOD/intraday timestamps, so could I suggest that the EOD 'special' timestamp be changed to '23:59:59' since there won't be any price data with frequency of a second? Maybe more robust thing would be to separate EOD/intraday to separate Arctic tables, but I think that would need more extensive rework. Also, IMHO wouldn't it be more 'standard way' to save data to MongoDB with UTC+0 and then do the local TZ conversion if data needs to be displayed? (Now there's a incoherency on the timestamps when there's a change between standard and daylight saving time)

BTW, what is the use of intra-day data in pysystemtrade anyway?

Secondly, theres another weird discrepancy of FINAL values. For example again AUD_20210900: From TWS using 1 Month/Daily candles chart:

Date H/L/C 
2021-07-14 0.74885 0.74670 0.74795
2021-07-15 0.74515 0.74120 0.74190
2021-07-16 0.74250 0.73940 0.74020

From seed_from_IB but saved to CSV instead of Arctic:

DATETIME,OPEN,HIGH,LOW,FINAL,VOLUME
....
2021-07-14 23:00:00,0.7486,0.74885,0.7467,0.74835,26588
2021-07-15 23:00:00,0.74405,0.74515,0.7412,0.74255,26663
2021-07-16 23:00:00,0.7422,0.7425,0.7394,0.73945,27990

The HIGH/LOW values match, but FINAL values don't! Do you have any idea why is this? Does either TWS or IB API incorporate after-hours prices to final values or something?

robcarver17 commented 3 years ago

"Maybe more robust thing would be to separate EOD/intraday to separate Arctic tables,"

I think ultimately this is the best solution, albeit the most painful. I'd need to duplicate all the pricing tables for both intraday and daily (individual contracts, multiple and adjusted prices). I need to write code to extract the intraday data from existing tables (which will be easy for me, but perhaps a pain for you?).

I've timetabled this for September, as I'm a bit busy until then.

"BTW, what is the use of intra-day data in pysystemtrade anyway?"

I plan to implement some intra-day systems in the near future.

"The HIGH/LOW values match, but FINAL values don't! Do you have any idea why is this?"

Sorry no idea.

mjuhanne commented 3 years ago

Thank you for your reply. It's not a critical issue and I can work my way around it.

Anyway, the price variations are so small so in practice they won't effect on the forecasting values.

I'm looking forward to seeing your intra-day related stuff :)

mjuhanne commented 3 years ago

I found the reason for the weird intra-day timestamps:
ib_price_client.py : _ib_timestamp_to_datetime() converts the timestamp from IB API to local time. The function assumes that the timestamp is in UTC+0. However _ib_get_historical_data_of_duration_and_barSize calls self.ib.reqHistoricalData with parameter formatDate=1 which returns data that is already converted to the local time used by TWS/gateway.

So now there was a double conversion. Maybe this didn't cause any problems for you because you are already in UTC+0?

Also adjust_timestamp_to_include_notional_close_and_time_offset() mangled the timestamp even more (00:00:00 gets converted to 23:00:00) so that's why there wasn't price data for the faux "midnight"

From documentation:

           formatDate: For an intraday request setting to 2 will cause
                the returned date fields to be timezone-aware
                datetime.datetime with UTC timezone, instead of local timezone
                as used by TWS. 

I tested with formatDate=2 (though needed 1 additional modification to _adjust_ib_time_to_local because the timestamp already has TZ). Now it returns data as UTC+0 and there's no actual overlap with AUD or other US/Eastern exchange futures anymore. UTC-5 closing time 15:00 gets converted to 23:00 (UTC+3), so for my timezone there's no problem (of course because it's the last hour bar that gets saved and not the daily bar, the open/high/low values will differ from actual daily values but those aren't used by the code anyway yet).