matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License
242 stars 67 forks

BDH with multiple start and end dates #20

Closed: jxg852 closed this 6 years ago

jxg852 commented 6 years ago

I'm looking to pull a time series of prices for a vector of securities, but I would like to specify different start and end dates for each security. Is that possible with BDH?

I tried something like this, but obviously it didn't work:

con.bdh(['VIX Index', 'UX1 Index'], ['PX_LAST'], ['20170101', '20170201'], ['20170831', '20171031'])

matthewgilbert commented 6 years ago

No, this is not possible; you should make a separate call for each date range. The bdh() function wraps the underlying HistoricalDataRequest service, which does not support multiple date ranges per request, as discussed in Section 13.1 of the Developer's Guide:

The HistoricalDataRequest Request object enables the retrieval of end-of-day data for a set of securities and fields over a specified period, which can be set to daily, monthly, quarterly, semiannually or annually. At least one security and one field are required along with start and end dates.
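
For the two date ranges in your example, something like this should work (a minimal sketch; it assumes a running Bloomberg session and combines the results with pandas.concat):

import pandas as pd
import pdblp

con = pdblp.BCon()  # assumes a running Bloomberg terminal session
con.start()

# one (ticker, start, end) request per date range
requests = [('VIX Index', '20170101', '20170831'),
            ('UX1 Index', '20170201', '20171031')]

frames = [con.bdh(ticker, ['PX_LAST'], start, end)
          for ticker, start, end in requests]

# outer-join on the date index; columns are (ticker, field) pairs
df = pd.concat(frames, axis=1)
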
jxg852 commented 6 years ago

In that case, do you have any advice for working with large data sets, e.g. pulling a week's worth of prices for thousands of securities over the course of 5 years? Currently I'm simply pulling daily prices for all the securities for 5 years, but that results in a DataFrame with over a million rows. Is there a better/faster way to do this?

matthewgilbert commented 6 years ago

Any functionality that allowed this would ultimately fall back on making iterative HistoricalDataRequest calls. One way to do this would be to group your securities into subsets for each period, loop through the groups making calls to bdh yourself, and merge the results afterwards, as sketched below.
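
For example (a rough sketch: the per-period ticker groups are hypothetical, and con is an established pdblp connection as above):

import pandas as pd

# hypothetical grouping of tickers by the date range each one needs
groups = {('20130101', '20171231'): ['AAA US Equity', 'BBB US Equity'],
          ('20160101', '20171231'): ['CCC US Equity']}

frames = []
for (start, end), tickers in groups.items():
    frames.append(con.bdh(tickers, ['PX_LAST'], start, end))

# merge afterwards, outer-joining on the date index
df = pd.concat(frames, axis=1)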

Alternatively, if data limits are what you are concerned about (although I'm not sure whether restricting the time horizon of requests affects Bloomberg data limits), you could just make the bulk request as you are doing but do some local caching. As discussed in the tutorial, something like this:

import shutil
from tempfile import mkdtemp

import joblib

# cache bdh() results on disk so identical requests are served locally
temp_dir = mkdtemp()
cacher = joblib.Memory(temp_dir)

# con is an established pdblp.BCon connection; ignore 'self' so the
# connection object itself is not hashed as part of the cache key
bdh = cacher.cache(con.bdh, ignore=['self'])
bdh('SPY US Equity', 'PX_LAST', '20150629', '20150630')

# when finished, clean up with shutil.rmtree(temp_dir)

Caching like this is convenient if you want to rerun some analysis without throttling your Bloomberg connection; it can also provide some speedups for repeated calls.
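
For instance, reusing the cached bdh wrapper from above, a repeated identical call is served from the on-disk cache rather than re-querying Bloomberg (illustrative only):

# first call queries Bloomberg and writes the result to the cache
df1 = bdh('SPY US Equity', 'PX_LAST', '20150629', '20150630')

# an identical second call is read back from disk
df2 = bdh('SPY US Equity', 'PX_LAST', '20150629', '20150630')
assert df1.equals(df2)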

jxg852 commented 6 years ago

Great, thanks!