Open brettelliot opened 5 years ago
@brettelliot I can take a look at this over the weekend
Thank you!
-- Brett Elliot
Hi @freddiev4 , did you find anything wrong with my data, or the way I was using CSVDIR? Thanks.
Hey @brettelliot sorry I forgot to follow up here.
tl;dr: you need to make two separate bundles (e.g. `my-iex-csvdir-daily` and `my-iex-csvdir-minutely`).
The reason for this is in `zipline/data/bundles/csvdir.py`, line 134: we're just looping over the `tframes` variable, which contains strings saying what type of data you have (minutely and/or daily). We don't create a new metadata table (in SQL) for each of the tframes, so when we try to write the metadata, we get `sqlite3.IntegrityError: UNIQUE constraint failed: equities.sid` or even `sqlite3.IntegrityError: UNIQUE constraint failed: exchanges.exchange`.
In SQL, a `UNIQUE` integrity constraint means that every value in a column or set of columns (key) must be unique. For example, each row in the `equities` table has a column called `sid`, and the sid for, let's say, `A.csv` is mapped to 1.
Before:
equities Table
| | sid | other_data | more_data | symbol |
|---|-----|------------|-----------|--------|
| | 3 | | | AAL |
| | | | | |
| | | | | |
This gets inserted into the table when we first build the daily data.
After
equities Table:
| | sid | other_data | more_data | symbol |
|---|-----|------------|-----------|--------|
| | 3 | | | AAL |
| | 1 | | | A |
| | | | | |
Then we try to build the minutely data, but we also have a file called `A.csv` for minutely data, for which we insert a `sid` of 1 into the `equities` table. We already have an entry that looks exactly like that, which is where sqlite throws an exception.
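The failure described above can be reproduced outside zipline. The following is a minimal sketch using only Python's standard `sqlite3` module. The two-column `equities` table and the `write_equities` helper are simplified stand-ins invented for illustration, not zipline's real schema or writer.

```python
import sqlite3

# Toy stand-in for the asset database: only the columns needed for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE equities (sid INTEGER NOT NULL UNIQUE, symbol TEXT)")


def write_equities(conn, rows):
    """Hypothetical helper: insert (sid, symbol) rows in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO equities (sid, symbol) VALUES (?, ?)", rows
        )


# The same symbols (and therefore the same sids) exist for both timeframes.
daily_rows = [(1, "A"), (3, "AAL")]
minute_rows = [(1, "A"), (3, "AAL")]

write_equities(conn, daily_rows)  # first write succeeds

try:
    write_equities(conn, minute_rows)  # second write repeats sid 1
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

print(error_message)
row_count = conn.execute("SELECT COUNT(*) FROM equities").fetchone()[0]
```

The second write is rejected as a whole, so the table still holds only the rows from the first write.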
The workaround for this is to make two separate data bundles, one for your daily data and one for your minutely data. A single bundle cannot contain both sets of data.
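As a rough sketch, the two-bundle workaround might look like this in `extension.py`. It reuses the `register` and `csvdir_equities` calls from the original report with the example bundle names suggested above. It assumes the CSV directory contains `daily/` and `minute/` subdirectories.

```python
# extension.py (sketch): one csvdir bundle per timeframe.
from zipline.data.bundles import register
from zipline.data.bundles.csvdir import csvdir_equities

# Daily bars only; reads <CSVDIR>/daily/*.csv at ingest time.
register('my-iex-csvdir-daily', csvdir_equities(['daily']))

# Minute bars only; reads <CSVDIR>/minute/*.csv at ingest time.
register('my-iex-csvdir-minutely', csvdir_equities(['minute']))
```

Each bundle would then be ingested on its own, for example `CSVDIR=./iex/ zipline ingest -b my-iex-csvdir-daily`, followed by the same command with `my-iex-csvdir-minutely`.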
If any of that is unclear, let me know!
Thanks Freddie,
OK, I can easily make two bundles... but can zipline load two bundles at the same time? I guess the use case is... use the history methods to access both daily and minute data.
Thanks!
-- Brett Elliot
Hi Freddie,
The problem with only loading minute data is that zipline doesn't seem to be able to handle data.history calls when the frequency is "1d". For example this simple call crashes zipline:
daily_history = data.history(context.asset, 'price', bar_count=5, frequency="1d")
Is there a way to load two bundles at once? Is there a way to downsample minute data into daily data for the history function? (I actually downsampled the minute data to generate my daily data so I know it's easy to do. Just not sure how to do it in zipline).
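For context, downsampling minute bars to daily bars outside zipline can be sketched in plain Python as below. This is only an illustration of the idea, not a zipline API: the `downsample_to_daily` function, the tuple layout, and the sample prices are all hypothetical, and it assumes bars are sorted by timestamp and that timestamps are in exchange-local time so that grouping by calendar date matches trading sessions.

```python
from datetime import datetime


def downsample_to_daily(minute_bars):
    """Collapse (timestamp, open, high, low, close, volume) minute bars,
    assumed sorted by timestamp, into one OHLCV bar per calendar date."""
    daily = {}
    for ts, open_, high, low, close, volume in minute_bars:
        day = ts.date()
        if day not in daily:
            # First bar of the day sets the open and seeds the other fields.
            daily[day] = {"open": open_, "high": high, "low": low,
                          "close": close, "volume": volume}
        else:
            bar = daily[day]
            bar["high"] = max(bar["high"], high)
            bar["low"] = min(bar["low"], low)
            bar["close"] = close        # last bar of the day wins
            bar["volume"] += volume
    return daily


# Made-up sample data, for illustration only.
minute_bars = [
    (datetime(2018, 11, 28, 9, 31), 10.0, 10.5, 9.9, 10.2, 100),
    (datetime(2018, 11, 28, 9, 32), 10.2, 10.8, 10.1, 10.7, 150),
    (datetime(2018, 11, 29, 9, 31), 10.7, 10.9, 10.4, 10.5, 200),
]
daily = downsample_to_daily(minute_bars)
```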
If none of those are possible... I've posted a question to the zipline Google group asking how to get data into zipline if I have a data API already (which I do). Perhaps you know how I can skip bundles entirely. If so, please respond either here or on the Google group message. I'm really excited to backtest locally but I just can't seem to get my data in!
https://groups.google.com/d/msg/zipline/EkUf095bWs4/tMB7ejTQBgAJ
Thanks, Brett
zipline is dead
Hi guys, any update on this? How can we have both daily and minute data in a backtest?
@brettelliot To ingest both daily and minute data, you need to edit zipline/data/bundles/csvdir.py as follows:
    for i, tframe in enumerate(tframes):
        ...
        if i == 0:
            # Hardcode the exchange to "CSVDIR" for all assets and (elsewhere)
            # register "CSVDIR" to resolve to the NYSE calendar, because these
            # are all equities and thus can use the NYSE calendar.
            metadata['exchange'] = "CSVDIR"
            asset_db_writer.write(equities=metadata)
        if tframe == 'daily':
            divs_splits['divs']['sid'] = divs_splits['divs']['sid'].astype(int)
            divs_splits['splits']['sid'] = divs_splits['splits']['sid'].astype(int)
            adjustment_writer.write(splits=divs_splits['splits'],
                                    dividends=divs_splits['divs'])
The SQL error is caused by multiple calls to `asset_db_writer.write()` when you ingest both daily and minute data. There is no need to register the metadata twice.
There is also no need to register dividends and splits twice. Since there is another error ingesting dividends from minute data and splits are ignored from minute data, we only need to register them from daily data.
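The effect of the `if i == 0` guard can be illustrated without zipline. The sketch below simulates the per-timeframe loop against an in-memory SQLite table. The `new_db` and `ingest` functions and the toy schema are hypothetical stand-ins, not the real `csvdir.py` code.

```python
import sqlite3


def new_db():
    """Create a toy asset database with a UNIQUE sid column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE equities (sid INTEGER NOT NULL UNIQUE, symbol TEXT)"
    )
    return conn


def ingest(conn, tframes, symbols, write_metadata_once):
    """Simulate the per-timeframe loop; return the number of metadata writes."""
    writes = 0
    for i, tframe in enumerate(tframes):
        # (pricing data for `tframe` would be written here)
        if write_metadata_once and i > 0:
            continue  # metadata was already written on the first pass
        rows = list(enumerate(symbols))  # (sid, symbol) pairs
        with conn:
            conn.executemany(
                "INSERT INTO equities (sid, symbol) VALUES (?, ?)", rows
            )
        writes += 1
    return writes


symbols = ["A", "AAL"]

# With the guard: metadata is written once and ingestion completes.
guarded = new_db()
guarded_writes = ingest(guarded, ["daily", "minute"], symbols, True)

# Without the guard: the second write repeats the sids and fails.
unguarded = new_db()
try:
    ingest(unguarded, ["daily", "minute"], symbols, False)
    unguarded_failed = False
except sqlite3.IntegrityError:
    unguarded_failed = True
```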
@freddiev4 Please help fix the csvdir bundle.
Thanks for this! I can confirm that this fix allows ingestion and access of both minute and daily data for the same asset in a single bundle.
Dear Zipline Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
Now that you know a little about me, let me tell you about the issue I am having:
Description of Issue
sqlite3.IntegrityError: UNIQUE constraint failed: equities.sid
Here is how you can reproduce this issue on your machine:
Reproduction Steps
CSVDIR=./iex/ zipline ingest -b my-iex-csvdir
...What steps have you taken to resolve this already?
I can successfully ingest the daily data by itself with this extension.py:
register('my-iex-csvdir', csvdir.csvdir_equities(['daily']))
and I can successfully ingest the minute data by itself with this extension.py:
register('my-iex-csvdir', csvdir.csvdir_equities(['minute']))
But those changes don't fix anything. They just prove my data can be ingested individually. There seems to be a problem with the csvdir bundle when trying to ingest daily and minute data together, OR there's a problem with my data, process, extension.py, etc. that I just don't see. ...
Anything else?
Thanks for the support!
Here's the whole call stack:
...
Sincerely,
$ whoami