matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License

Wrong order of the tickers in the request answers #17

Closed jcrichard closed 6 years ago

jcrichard commented 6 years ago

Hi

This is the example I use to show the problem in my case. The order of the columns in DF is not the same as the order of the tickers.

I think it is a big issue when it happens. I do not know where it comes from.

import datetime

StartYear = 2015
s = "USSWAP"
ticker = [s + str(x + 1) + " Comdty" for x in range(10)]
DF = con.bdh(ticker, "PX_LAST",
             datetime.datetime(StartYear, 1, 1).strftime("%Y%m%d"),
             datetime.datetime.now().strftime("%Y%m%d"))

matthewgilbert commented 6 years ago

I don't currently have access to a Bloomberg connection, so I cannot reproduce this. Could you please set con.debug = True and add the resulting output to your post, as well as the output from DF.head()?

NoahKauffman commented 6 years ago

Basically, what is happening is the following. His ticker list is built as follows:

s = "USSWAP"
ticker = [s + str(x + 1) + " Comdty" for x in range(10)]

When the DF is returned, its columns are in the following order:

reorder = sorted(ticker)
print(reorder)

So rather than ticker 1, ticker 2, ticker 3, ..., the columns are sorted as ticker 1, ticker 10, ticker 2, and so on.
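
The ordering described above is plain lexicographic string sorting: "USSWAP10 Comdty" lands before "USSWAP2 Comdty" because strings are compared character by character. A minimal sketch in pure Python, with no Bloomberg connection needed; the natural_key helper is hypothetical and only there to illustrate a numeric-aware alternative:

```python
import re

tickers = ["USSWAP" + str(x + 1) + " Comdty" for x in range(10)]

# Lexicographic sort compares character by character, so "10" lands before "2".
lexsorted_tickers = sorted(tickers)
print(lexsorted_tickers[:3])


def natural_key(ticker):
    """Hypothetical helper: split into text and integer chunks so that 2 < 10."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", ticker)]


# Sorting with the numeric-aware key recovers the requested order.
natural_tickers = sorted(tickers, key=natural_key)
print(natural_tickers == tickers)
```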

I'm not sure of the syntax to reorder the multi-index dataframe, but I think there should be a simple way to just re-sort the columns into the original order.

NoahKauffman commented 6 years ago

OK - it's actually very simple to re-sort back to the original order ... just do the following:

import datetime
import pdblp

con = pdblp.BCon(debug=False, port=8194)
con.start()

StartYear = 2015
s = "USSWAP"
ticker = [s + str(x + 1) + " Comdty" for x in range(10)]
DF = con.bdh(ticker, "PX_LAST",
             datetime.datetime(StartYear, 1, 1).strftime("%Y%m%d"),
             datetime.datetime.now().strftime("%Y%m%d"))

# re-sort back to the original order
DF = DF[ticker]
DF.head()
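
The reordering step can also be tried without a Bloomberg connection. The sketch below assumes the columns are a (ticker, field) MultiIndex, as in the output shown in this thread; the reorder_by_ticker helper and the numbers in the frame are hypothetical placeholders, not pdblp code or real prices.

```python
import pandas as pd


def reorder_by_ticker(df, tickers):
    """Hypothetical helper: put columns back in the requested ticker order.

    Assumes df.columns is a MultiIndex whose first level holds the tickers.
    """
    level0 = list(df.columns.get_level_values(0))
    # Collect column positions ticker by ticker, in the requested order.
    positions = [i for t in tickers for i, name in enumerate(level0) if name == t]
    return df.iloc[:, positions]


tickers = ["USSWAP1 Comdty", "USSWAP2 Comdty", "USSWAP10 Comdty"]
returned = ["USSWAP1 Comdty", "USSWAP10 Comdty", "USSWAP2 Comdty"]
cols = pd.MultiIndex.from_product([returned, ["PX_LAST"]], names=["ticker", "field"])
# Placeholder values chosen so each column is easy to recognise.
df = pd.DataFrame([[1.0, 10.0, 2.0], [1.5, 10.5, 2.5]],
                  index=pd.to_datetime(["2015-01-01", "2015-01-02"]),
                  columns=cols)

ordered = reorder_by_ticker(df, tickers)
print(list(ordered.columns.get_level_values("ticker")))
```

Selecting by position avoids depending on how a particular pandas version handles label-based selection on an unsorted MultiIndex.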

jcrichard commented 6 years ago

ticker      USSWAP1 Comdty  USSWAP9 Comdty  USSWAP2 Comdty  USSWAP10 Comdty  USSWAP3 Comdty  USSWAP4 Comdty  USSWAP5 Comdty  USSWAP6 Comdty  USSWAP7 Comdty  USSWAP8 Comdty
field              PX_LAST         PX_LAST         PX_LAST          PX_LAST         PX_LAST         PX_LAST         PX_LAST         PX_LAST         PX_LAST         PX_LAST
date
2015-01-01          0.4430          2.2180          0.9000           2.2825          1.2990          1.5775          1.7715          1.9240          2.0430          2.1390
2015-01-02          0.4540          2.1430          0.8920           2.2240          1.2780          1.5458          1.7343          1.8790          1.9945          2.0840
2015-01-05          0.4480          2.0834          0.8855           2.1424          1.2543          1.5115          1.6933          1.8272          1.9298          2.0142
2015-01-06          0.4330          1.9868          0.8520           2.0455          1.1955          1.4379          1.6071          1.7358          1.8357          1.9180
2015-01-07          0.4318          2.0218          0.8425           2.0838          1.1902          1.4390          1.6187          1.7538          1.8619          1.9488

DEBUG:root:Sending Request:
HistoricalDataRequest = {
    securities[] = {
        "USSWAP1 Comdty", "USSWAP2 Comdty", "USSWAP3 Comdty", "USSWAP4 Comdty", "USSWAP5 Comdty",
        "USSWAP6 Comdty", "USSWAP7 Comdty", "USSWAP8 Comdty", "USSWAP9 Comdty", "USSWAP10 Comdty"
    }
    fields[] = {
        "PX_LAST"
    }
    startDate = "20150101"
    endDate = "20170913"
    overrides[] = {
    }
}

jcrichard commented 6 years ago

Thank you. Yes this is what I have done but it should be done in the function because it can yield a lot of mistakes.

matthewgilbert commented 6 years ago

As @NoahKauffman pointed out, the issue is just in the order of how the results are returned. If you look at _bdh_list() (which bdh() calls), this is because the response is just parsed in the order it is returned from Bloomberg, without any reordering.

The documentation for bdh() should be updated: as it currently stands it is wrong, because it was not appropriately updated after 94d38ffdc2b0860ff328dc0fa5f924239e7818bd.

I am confused when you mention

"Yes this is what I have done but it should be done in the function because it can yield a lot of mistakes."

The data as it is returned is correct for each column, just the order of the columns is rearranged. What mistakes are you referring to?

matthewgilbert commented 6 years ago

The docstrings for bdh() have been fixed to be representative of the new behaviour in 18fa2a2f37218d935d9309ecd4fce1a0479f6d32

jcrichard commented 6 years ago

"The data as it is returned is correct for each column, just the order of the columns is rearranged. What mistakes are you referring to?"

I mean that you expect the data to come back in the same order as you asked for it. Let's say you are working with a matrix; then the column order is very important. In my case I got some wrong results because I assumed the order was the same (which is the logical way, in my opinion).
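
The risk described here can be illustrated with the column order and first row (2015-01-01) posted earlier in this thread. A minimal sketch in pure Python: pairing the values with the requested list by position mislabels them, while pairing them with the labels that came back with the data does not.

```python
requested = ["USSWAP" + str(x + 1) + " Comdty" for x in range(10)]

# Column order and first row (2015-01-01) as posted earlier in this thread.
returned = ["USSWAP1 Comdty", "USSWAP9 Comdty", "USSWAP2 Comdty", "USSWAP10 Comdty",
            "USSWAP3 Comdty", "USSWAP4 Comdty", "USSWAP5 Comdty", "USSWAP6 Comdty",
            "USSWAP7 Comdty", "USSWAP8 Comdty"]
row = [0.4430, 2.2180, 0.9000, 2.2825, 1.2990, 1.5775, 1.7715, 1.9240, 2.0430, 2.1390]

# Wrong: assumes the values come back in the requested order.
by_position = dict(zip(requested, row))
# Right: uses the labels that actually came back with the data.
by_label = dict(zip(returned, row))

print(by_position["USSWAP2 Comdty"])  # actually the USSWAP9 value
print(by_label["USSWAP2 Comdty"])
```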

matthewgilbert commented 6 years ago

I agree this is somewhat unintuitive. If I recall correctly, the rationale for this was that pandas.MultiIndex needs to be lexsorted. For example, if you were to request

tickers = ['Stock2', 'Stock1']
fields = ['VOLUME', 'PX_LAST']

this causes issues with the associated DataFrame returned

import pandas as pd

mcols = pd.MultiIndex.from_product([tickers, fields])
idx = pd.date_range("2015-01-01", "2015-01-02")
data = [[100, 100000, 20, 353], [101, 5786, 19, 501]]
df = pd.DataFrame(data, index=idx, columns=mcols)
df.loc[:, tickers]

UnsortedIndexError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (1), lexsort depth (0)'
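
For reference, sorting the column axis is what produces a lexsorted MultiIndex, and it is also what changes the column order relative to the request. The sketch below reuses the Stock1/Stock2 example from this comment; whether the unsorted slice itself raises may depend on the pandas version, so the sketch only inspects the sortedness of the columns.

```python
import pandas as pd

tickers = ["Stock2", "Stock1"]
fields = ["VOLUME", "PX_LAST"]
mcols = pd.MultiIndex.from_product([tickers, fields])
idx = pd.date_range("2015-01-01", "2015-01-02")
data = [[100, 100000, 20, 353], [101, 5786, 19, 501]]
df = pd.DataFrame(data, index=idx, columns=mcols)

# The columns are in request order, which is not sorted.
print(df.columns.is_monotonic_increasing)

# Sorting the column axis yields a lexsorted MultiIndex;
# each value stays attached to its (ticker, field) label.
sorted_df = df.sort_index(axis=1)
print(sorted_df.columns.is_monotonic_increasing)
print(list(sorted_df.columns))
```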

For now I would recommend following @NoahKauffman's suggestion if you are planning to reference the underlying numpy.ndarray, e.g. DF.loc[:, tickers].values

matthewgilbert commented 6 years ago

Due to the constraints imposed by pandas.MultiIndex, this will not be changed, so I'm marking this as closed.