Open josephcclin opened 5 years ago
After I changed the trading calendar to 'NYSE', the issue went away. However, I am still puzzled, because the timestamps of my minute-frequency data all fall within the range of the CME calendar.
Hi, I got the same issue ingesting minute data for a 24/7 calendar.
Can you try configuring minutes_per_day, which defaults to 390 (for stocks), as follows? minutes_per_day=1440, calendar_name='CME', start_session=None, end_session=None
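For reference, the two values in that suggestion are plain arithmetic: 390 is the number of minutes in a regular NYSE session (09:30 to 16:00), and 1440 is a full 24-hour day. A minimal stdlib-only sketch; `minutes_in_session` is a hypothetical helper for illustration, not part of zipline:

```python
from datetime import date, datetime, time


def minutes_in_session(open_time: time, close_time: time) -> int:
    """Count the one-minute bars between open and close on the same day."""
    day = date(2020, 1, 1)  # arbitrary date; only the times matter
    delta = datetime.combine(day, close_time) - datetime.combine(day, open_time)
    return int(delta.total_seconds() // 60)


# Regular NYSE hours, 09:30 to 16:00.
nyse_minutes = minutes_in_session(time(9, 30), time(16, 0))

# A calendar that never closes has one bar for every minute of the day.
around_the_clock = 24 * 60

print(nyse_minutes, around_the_clock)
```

If the bundle's calendar is open longer than the minutes_per_day value allows for, the writer has fewer slots per day than the data has minutes, which is consistent with the errors reported in this thread.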
I'm not confident this is the best fix but it seemed like it had something to do with the calculation of the last possible index in the range. I fixed it in my installation by changing the following line in:
zipline/data/minute_bars.py
from:
latest_min_count = all_minutes.get_loc(last_minute_to_write)
to
latest_min_count = all_minutes.get_loc(last_minute_to_write, 'backfill')
so that, if the minute it's looking for is not found, it falls back to the position of the next available minute after it.
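To make the difference between the two lookups concrete, here is a stdlib-only sketch of "exact" versus "backfill" semantics on a sorted list of minutes. It imitates the behaviour described above using `bisect`; it is not pandas' implementation, and `exact_loc` / `backfill_loc` are hypothetical names. The sample minutes are illustrative.

```python
from bisect import bisect_left
from datetime import datetime

# Sorted calendar minutes with 08:02 missing (illustrative values).
all_minutes = [
    datetime(2020, 3, 25, 8, 0),
    datetime(2020, 3, 25, 8, 1),
    datetime(2020, 3, 25, 8, 3),
]


def exact_loc(minutes, key):
    """Position of key; raises KeyError when that minute is absent."""
    pos = bisect_left(minutes, key)
    if pos == len(minutes) or minutes[pos] != key:
        raise KeyError(key)
    return pos


def backfill_loc(minutes, key):
    """Position of key, or of the next later minute if key is absent."""
    pos = bisect_left(minutes, key)
    if pos == len(minutes):
        raise KeyError(key)  # nothing at or after key
    return pos


missing = datetime(2020, 3, 25, 8, 2)

try:
    exact_loc(all_minutes, missing)
    exact_found = True
except KeyError:
    exact_found = False

backfilled = backfill_loc(all_minutes, missing)
print(exact_found, backfilled, all_minutes[backfilled])
```

Note that the positional `'backfill'` argument to `get_loc` only works on older pandas releases; on versions where `get_loc` no longer accepts a method, `all_minutes.get_indexer([last_minute_to_write], method='backfill')[0]` should be the equivalent call.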
Similar issue:
Traceback (most recent call last):
  File "pandas/_libs/tslib.pyx", line 1702, in pandas._libs.tslib.convert_str_to_tsobject
  File "pandas/_libs/src/datetime.pxd", line 119, in datetime._string_to_dts
ValueError: Error parsing datetime string "ASTC.csv" at position 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "pandas/_libs/tslib.pyx", line 1732, in pandas._libs.tslib.convert_str_to_tsobject
  File "pandas/_libs/tslibs/parsing.pyx", line 99, in pandas._libs.tslibs.parsing.parse_datetime_string
  File "/home/x777/anaconda3/envs/env_zipline/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/x777/anaconda3/envs/env_zipline/lib/python3.5/site-packages/dateutil/parser/_parser.py", line 649, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: ASTC.csv
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/x777/anaconda3/envs/env_zipline/bin/zipline", line 11, in
File format (example part):
date | open | high | low | close | volume
2020-03-25 08:00 | 1.2 | 1.88 | 1.2 | 1.88 | 1229
2020-03-25 08:01 | 2.25 | 2.25 | 2.25 | 2.25 | 2
2020-03-25 08:03 | 2.25 | 2.25 | 2.25 | 2.25 | 198
2020-03-25 08:04 | 2 | 2.3899 | 2 | 2.32 | 964
2020-03-25 08:05 | 2.25 | 2.61 | 2.25 | 2.5 | 3997
2020-03-25 08:06 | 2.4499 | 2.48 | 2.44 | 2.48 | 1100
2020-03-25 08:07 | 2.48 | 2.4899 | 2.44 | 2.44 | 1727
2020-03-25 08:09 | 2.3 | 2.4012 | 2.3 | 2.39 | 1520
2020-03-25 08:10 | 2.38 | 2.38 | 2.1 | 2.1 | 1121
2020-03-25 08:11 | 2.05 | 2.1 | 1.78 | 1.78 | 2217
2020-03-25 08:12 | 1.78 | 1.88 | 1.7 | 1.88 | 1408
2020-03-25 08:13 | 1.88 | 2.03 | 1.88 | 2 | 2657
2020-03-25 08:14 | 2.03 | 2.34 | 2.03 | 2.34 | 8467
2020-03-25 08:15 | 2.34 | 2.47 | 2.27 | 2.47 | 12406
2020-03-25 08:16 | 2.4 | 2.55 | 2.21 | 2.27 | 8549
2020-03-25 08:17 | 2.3 | 2.7 | 2.27 | 2.7 | 22131
2020-03-25 08:18 | 2.75 | 2.9 | 2.73 | 2.76 | 26921
2020-03-25 08:19 | 2.76 | 3.1 | 2.65 | 3.01 | 17288
2020-03-25 08:20 | 3.09 | 3.19 | 2.86 | 3.19 | 31333
2020-03-25 08:21 | 3.15 | 3.3 | 3.02 | 3.11 | 39337
2020-03-25 08:22 | 3.06 | 3.09 | 2.87 | 2.89 | 40277
2020-03-25 08:23 | 2.895 | 3.02 | 2.79 | 2.9 | 15370
2020-03-25 08:24 | 2.9 | 3.27 | 2.9 | 3.14 | 22064
2020-03-25 08:25 | 3.16 | 3.16 | 2.91 | 3 | 16245
2020-03-25 08:26 | 2.9999 | 3.08 | 2.9999 | 3 | 8341
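One thing worth noticing in the sample above: it has no bars for 08:02 and 08:08. Whether such gaps are related to the ingest error is not established in this thread, but they are easy to list before ingesting. A stdlib-only sketch over the first few timestamps of the sample; `missing_minutes` is a hypothetical helper:

```python
from datetime import datetime, timedelta

# Timestamps of the first rows of the sample above.
stamps = [
    "2020-03-25 08:00", "2020-03-25 08:01", "2020-03-25 08:03",
    "2020-03-25 08:04", "2020-03-25 08:05", "2020-03-25 08:06",
    "2020-03-25 08:07", "2020-03-25 08:09", "2020-03-25 08:10",
]


def missing_minutes(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return every minute absent between the first and last timestamp."""
    seen = {datetime.strptime(s, fmt) for s in timestamps}
    current, last = min(seen), max(seen)
    found_gaps = []
    while current < last:
        if current not in seen:
            found_gaps.append(current)
        current += timedelta(minutes=1)
    return found_gaps


gaps = [g.strftime("%H:%M") for g in missing_minutes(stamps)]
print(gaps)
```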
@netshade, just be careful: some strategies might be affected. If you place a BUY at market_open(), you might get the price from the previous day rather than the current day's open price!
Great call, thank you. :)
Hi @lobobruno, how are you? Do you have a working example of an ingest function for minute-level data that you'd be willing to share? I've been trying to run minute-level backtests and ran into some issues. I've got it to work now, but my output has a strange quality. Even though I have minute-level data:
2020-05-08 09:44:00+00:00
2020-05-08 09:45:00+00:00
2020-05-08 09:46:00+00:00
my output zeros out everything but the day, discarding the hour and minute. So, for a given trading day, I get a series of 400+ lines of results that all share the same timestamp (that day's date). Is this an issue that you encountered? What part of the process could lead to this? Many thanks for your insight. Output:
2020-05-08 00:00:00+00:00
2020-05-08 00:00:00+00:00
2020-05-08 00:00:00+00:00
Hi guys!
I solved my problem by setting minutes_per_day to the correct value when registering my bundle for ingestion, in the register function. So, to fix the problem, register your bundle with both the correct TradingCalendar and the correct minutes_per_day value.
For example, if you want a 24/7 calendar, your register call should look like this:
register_calendar(
    'always_open',
    AlwaysOpenCalendar(),
)
register(
    'test_bundle',
    csvdir_equities(
        ['minute'],
        'path_to_your_csv_file',
    ),
    calendar_name='always_open',
    minutes_per_day=1440,
    start_session=start_session,
    end_session=end_session
)
Check these files to see what is happening: zipline/data/bundles/core.py line 408, zipline/data/minute_bars.py line 468, zipline/data/minute_bars.py line 810
@h4ppysmile, would you know how to solve #2700 ?
I have the same issue with the Timestamp error when I use the always_open calendar. However, I get the error on a daily timeframe, and I have tried several solutions, such as:
start_date = pd.Timestamp('2022-07-29',).tz_localize('UTC')
start_date = pd.Timestamp('2022-07-29')
The problem still occurs, and the timestamp in the error appears to be 20 years before the day I ingested the data bundle via zipline.
KeyError                                  Traceback (most recent call last)
File pandas\_libs\index.pyx:444, in pandas._libs.index.DatetimeEngine.get_loc()
File pandas\_libs\hashtable_class_helper.pxi:1625, in pandas._libs.hashtable.Int64HashTable.get_item()
File pandas\_libs\hashtable_class_helper.pxi:1632, in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1061596800000000000
During handling of the above exception, another exception occurred:
KeyError                                  Traceback (most recent call last)
File c:\Users\henry\miniconda3\envs\ml4t\lib\site-packages\pandas\core\indexes\base.py:3081, in Index.get_loc(self, key, method, tolerance)
   3080 try:
-> 3081     return self._engine.get_loc(casted_key)
   3082 except KeyError as err:
File pandas\_libs\index.pyx:413, in pandas._libs.index.DatetimeEngine.get_loc()
File pandas\_libs\index.pyx:446, in pandas._libs.index.DatetimeEngine.get_loc()
KeyError: Timestamp('2003-08-23 00:00:00+0000', tz='UTC')
The above exception was the direct cause of the following exception:
...
    686     return Index.get_loc(self, key, method, tolerance)
    687 except KeyError as err:
--> 688     raise KeyError(orig_key) from err
KeyError: Timestamp('2003-08-23 00:00:00+0000', tz='UTC')
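The integer in the first KeyError and the Timestamp in the later ones refer to the same instant. A quick stdlib check, assuming the integer is nanoseconds since the Unix epoch (which is how pandas stores datetime64[ns] values internally):

```python
from datetime import datetime, timezone

raw_key = 1061596800000000000  # integer from the first KeyError above

# Assumption: the key is nanoseconds since 1970-01-01 00:00:00 UTC.
seconds, nanos = divmod(raw_key, 10**9)
decoded = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(decoded.isoformat(), nanos)
```

So all three errors are one failed lookup of midnight UTC on 2003-08-23, re-raised at successive layers.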
This is probably due to a long-standing hard-coded limit in exchange_calendars.
This manual patch might work around the issue (well, at least back to 1970): https://pypi.org/project/zipline-norgatedata/#patch-to-allow-backtesting-before-20-years-ago
Zipline itself, in calendar_utils, also has a hard-coded limit of 1990. See this patch too: https://pypi.org/project/zipline-norgatedata/#additional-patch-to-allow-backtesting-before-1990
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
Here is how you can reproduce this issue on your machine:
Reproduction Steps
import pandas as pd
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities
start_session = pd.Timestamp('2009-08-24', tz='UTC')
end_session = pd.Timestamp('2010-08-24', tz='UTC')
register(
    'futures-bundle-min',
    csvdir_equities(
        ['minute'],
        r'C:\Users\user\zipTest',  # raw string, so the backslashes are not read as escape sequences
    ),
    calendar_name='CME',
    start_session=start_session,
    end_session=end_session
)