matthewgilbert / strategy

Simulations for Futures and Equities
MIT License

Implementing from_hdf5 for Exposures #2

Open matthewgilbert opened 6 years ago

matthewgilbert commented 6 years ago

It would be nice to implement a from_hdf5 method similar to the existing from_folder method. Alongside it, it would make sense to add a util.py file with a function of the form folder_to_hdf5(path_to_prices, path_to_contract_dates, path_to_meta_data) that converts a folder containing the appropriate meta data, price and expiry files into an HDF5 file in the format from_hdf5 expects.

One possibility is to store only the price data in the HDF5 file, since it is the largest and slowest data to read. The expiry and meta data files take only milliseconds or less:

In [1]: import pandas as pd
   ...: from strategy.strategy import Exposures
   ...: instr_types = pd.Series(["equity", "future", "future"], index=["XIV", "ES", "TY"])

In [2]: %timeit Exposures.parse_folder('tests/marketdata', instr_types)
1 loop, best of 3: 724 ms per loop

In [3]: %timeit Exposures.read_expiries('tests/marketdata/contract_dates.csv', set(["ES", "TY"]))
1000 loops, best of 3: 1.9 ms per loop

In [4]: %timeit Exposures.parse_meta('tests/marketdata/instrument_meta.json')
1000 loops, best of 3: 708 µs per loop

The benefit of also including the expiry data and instrument meta data in this file is that it simplifies the API from something like

from_hdf5(meta_data_file, expiry_file, price_hdf5)

to simply

from_hdf5(hdf5_data)
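As a rough illustration of the single-file variant, the sketch below writes prices, expiries and instrument meta data into one HDF5 file with pandas and reads them back. Everything about the layout is an assumption rather than the package's actual format: the names write_hdf5 and read_hdf5, the /prices/<instrument> and /expiries keys, storing the meta data as a JSON string attribute on the /expiries node, and the sample data are all made up for the example. It also assumes PyTables is installed, which pandas needs for HDFStore.

```python
import json
import os
import tempfile

import pandas as pd


def write_hdf5(path, prices, expiries, meta):
    """Write all three inputs into one HDF5 file (hypothetical layout)."""
    with pd.HDFStore(path, mode="w") as store:
        # One table per instrument under /prices.
        for name, frame in prices.items():
            store.put("prices/" + name, frame)
        store.put("expiries", expiries)
        # Small meta data travels as a JSON string attribute on the node.
        store.get_storer("expiries").attrs.meta = json.dumps(meta)


def read_hdf5(path):
    """Read back what write_hdf5 stored; stand-in for a from_hdf5 method."""
    with pd.HDFStore(path, mode="r") as store:
        prices = {
            key.split("/")[-1]: store[key]
            for key in store.keys()
            if key.startswith("/prices/")
        }
        expiries = store["expiries"]
        meta = json.loads(store.get_storer("expiries").attrs.meta)
    return prices, expiries, meta


# Round trip on a small invented data set.
dates = pd.date_range("2017-01-02", periods=3)
prices = {
    "ES": pd.DataFrame({"close": [2250.0, 2262.5, 2264.25]}, index=dates),
    "XIV": pd.DataFrame({"close": [48.1, 49.3, 50.0]}, index=dates),
}
expiries = pd.DataFrame(
    {"last_trade": pd.to_datetime(["2017-03-17", "2017-06-16"])},
    index=["ESH2017", "ESM2017"],
)
meta = {"ES": {"multiplier": 50}, "XIV": {"multiplier": 1}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "marketdata.h5")
    write_hdf5(path, prices, expiries, meta)
    loaded_prices, loaded_expiries, loaded_meta = read_hdf5(path)
```

With this shape the caller passes a single path, and the reader decides where each piece lives inside the file.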

matthewgilbert commented 6 years ago

More generally, it is likely a good idea to separate the on-disk reading of data into a helper module. This reduces class bloat and provides a cleaner interface for adding new on-disk data types as they arise in the future.