marketneutral closed this issue 5 years ago.
To reduce storage size and improve compression, we actually store prices as unsigned int32 values. For every field except volume, we multiply by 1000 and then round. This is sufficient precision for US equities and futures. You can see that the ctable shows the fields as u4:
ctable((25920,), [('open', '<u4'), ('high', '<u4'), ('low', '<u4'), ('close', '<u4'), ('volume', '<u4')])
Integers and unsigned integers do not have a native missing value, so we have reserved 0 to be the missing value for the data. The reader does this conversion here: https://github.com/quantopian/zipline/blob/master/zipline/data/minute_bars.py#L1141. The assumption is that no asset could have a price of 0, but that might not actually be correct here. Was this just to test the ingestion, or do these prices actually hit 0?
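As an illustration of the storage convention described above (scale by 1000, round, store as uint32, and treat 0 as missing), here is a minimal NumPy sketch of the round trip. The function names and the OHLC_RATIO constant are hypothetical. This is not zipline's API; it only mirrors the behavior the comment describes, including the reader turning 0 back into NaN.

```python
import numpy as np

OHLC_RATIO = 1000  # scale factor described above for open/high/low/close


def encode_prices(prices):
    """Hypothetical encoder: scale float prices by 1000, round, store as uint32.

    NaN (missing) is written as 0, the reserved missing-value sentinel.
    Note that uint32 caps the storable price at 4294967295 / 1000.
    """
    prices = np.asarray(prices, dtype=np.float64)
    scaled = np.where(np.isnan(prices), 0, np.round(prices * OHLC_RATIO))
    return scaled.astype(np.uint32)


def decode_prices(raw):
    """Hypothetical decoder: undo the scaling; the reserved 0 comes back as NaN."""
    raw = np.asarray(raw, dtype=np.uint32)
    out = raw.astype(np.float64) / OHLC_RATIO
    out[raw == 0] = np.nan
    return out


raw = encode_prices([105.625, float("nan"), 0.0])
print(raw)                  # [105625      0      0]
print(decode_prices(raw))   # [105.625     nan     nan]
```

The third value shows the pitfall discussed in this thread. A real price of 0.0 is stored exactly like a missing value, so it can only come back as NaN.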
We should probably add a guard in the writer that prevents these values from being set to 0. We expect users to provide NaN when a value is missing, and we will do the conversion ourselves.
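The guard suggested above could look roughly like the following. This is a hypothetical standalone check run on a DataFrame before writing, not something that exists in zipline. It assumes OHLC columns named open/high/low/close and the same x1000 scaling. It rejects any present (non-NaN) price that would be stored as 0 and silently read back as NaN.

```python
import numpy as np
import pandas as pd

PRICE_COLUMNS = ["open", "high", "low", "close"]  # assumed column names
OHLC_RATIO = 1000  # assumed scale factor, as described above


def check_no_zero_prices(df):
    """Hypothetical pre-write guard.

    Missing prices must be NaN; only NaN is allowed to become the stored 0.
    Raises ValueError if a real price would be stored as 0 after scaling.
    """
    prices = df[PRICE_COLUMNS].to_numpy(dtype=np.float64)
    scaled = np.round(prices * OHLC_RATIO)
    # A present price that rounds to 0 is indistinguishable from "missing".
    bad = ~np.isnan(prices) & (scaled == 0)
    if bad.any():
        rows = df.index[bad.any(axis=1)]
        raise ValueError(
            f"{len(rows)} bar(s) have a price that would be stored as 0 "
            f"(first at {rows[0]}); use NaN for missing prices instead"
        )
    return df
```

Raising an error is one design choice. A gentler variant could log a warning and replace the offending values with NaN, which makes the loss explicit rather than silent.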
tl;dr: a price of 0 is translated to NaN by the reader.
Thank you @llllllllll ... that was indeed the issue!!! 🤕 I was not intentionally writing zeros; I am looking at that now.
tz issue...fixed. Works!!! Thank you.
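The thread doesn't say what the tz issue actually was. One common cause, assumed here, is CSV timestamps that are naive exchange-local times. Our understanding is that zipline's minute bars are indexed by UTC minutes. In that case, one fix is to localize and convert the index before writing. The helper name and the default zone below are hypothetical. America/New_York matches the -04:00 offsets in the log.

```python
import pandas as pd


def to_utc_minutes(naive_index, source_tz="America/New_York"):
    """Hypothetical helper: treat naive timestamps as source_tz local time, return UTC."""
    return pd.DatetimeIndex(naive_index).tz_localize(source_tz).tz_convert("UTC")


idx = to_utc_minutes(["2018-09-04 06:31", "2018-09-04 06:32"])
print(idx[0])  # 2018-09-04 10:31:00+00:00
```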
[20:57:44.587906]: INFO: initialize: Future(1 [TUZ2018])
[20:57:44.601528]: INFO: handle_data: 2018-09-04 06:31:00-04:00
[20:57:44.616197]: INFO: handle_data: 105.625
[20:57:44.616372]: INFO: handle_data: 2018-09-04 06:32:00-04:00
[20:57:44.616570]: INFO: handle_data: 105.625
[20:57:44.616724]: INFO: handle_data: 2018-09-04 06:33:00-04:00
[20:57:44.616909]: INFO: handle_data: 105.625
Can you provide some high level guidance and any "gotchas" you may be aware of for the ingestion of minute bar data?
TL;DR: stepping through ingest, all the bcolz steps look good, but data.current(...) produces nan.
Input Data
I have .csv data in the form
Ingest Function
I have a working ingest function, which is registered as:
It includes:
Inspecting it in pdb confirms that it does indeed yield the proper ticker and asset table:
Don't worry that there are zeros; the key point is that there is data and it is not NaN.
Write the bcolz Table
The bcolz writer here is getting a valid generator,
and the generator yields good data (note that the sid is 1).
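For reference, the minute writer's write method consumes an iterable of (sid, DataFrame) pairs. Each frame holds one asset's OHLCV minutes on a UTC DatetimeIndex. Below is a minimal sketch of such a generator. The frames_by_sid input dict is hypothetical. Sorting the index and restricting to the OHLCV columns are precautions we assume help, not documented requirements.

```python
import pandas as pd

OHLCV = ["open", "high", "low", "close", "volume"]


def minute_bar_frames(frames_by_sid):
    """Yield (sid, DataFrame) pairs for a minute-bar writer.

    frames_by_sid: hypothetical dict of integer sid -> OHLCV frame
    indexed by UTC minute timestamps.
    """
    for sid, df in sorted(frames_by_sid.items()):
        # Ascending minutes and only the OHLCV columns, as a precaution.
        yield int(sid), df.sort_index()[OHLCV]
```

The result would be passed as the writer's data argument, e.g. writer.write(minute_bar_frames(frames)), where writer and frames come from your ingest function.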
write_sid --> _write_cols
Writing the bcolz files here looks good: the output matches, the table looks good, and the writing completes without error.
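To find where a given sid's table ends up on disk, the sketch below reproduces the layout we believe the minute writer uses. The sid is zero-padded to six digits and nested under directories named by its first two digit pairs, so sid 1 lands in 00/00/000001.bcolz. This is consistent with looking inside .../00/00 for sid 1. The function name is hypothetical and the layout is an assumption.

```python
import os


def sid_table_path(root, sid):
    """Hypothetical helper: per-sid bcolz table path under a minute bar root.

    Assumed layout: pad sid to six digits, nest by digits [0:2] and [2:4].
    """
    padded = format(sid, "06")
    return os.path.join(root, padded[0:2], padded[2:4], padded + ".bcolz")


print(sid_table_path("minute_equities.bcolz", 1))
# minute_equities.bcolz/00/00/000001.bcolz (on POSIX)
```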
Bundle Inspection
The security master looks good:
The metadata.json in zipline_root/data/minute/2018-10-25T20;02;45.580551/minute_equities.bcolz looks fine, and it looks like there is a table for each sid; an ls in zipline_root/data/minute/2018-10-25T20;02;45.580551/minute_equities.bcolz/00/00 gives:
Accessing Minute Data in an Algo
Running the bare-minimum algo with:
produces:
So, I am getting NaN for all prices, even though it seems like, at the least, 0.0 is there for every minute in the session. Any pointers/guidance at all would be greatly appreciated. Thank you. 😃 📈