matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License

Return numpy nan value if does not exist #6

Closed ryanshrott closed 7 years ago

ryanshrott commented 7 years ago

I usually get this error when the value does not exist:

ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

Why not just return np.nan?

matthewgilbert commented 7 years ago

Could you post a reproducible example with the debug flag set to True?

ryanshrott commented 7 years ago

import pdblp

con = pdblp.BCon(debug=True)  # Instantiate the Bloomberg connection object
con.start()  # Start connection to the Bloomberg API
con.ref('US0378331005 ISIN', 'CRNCY')

ryanshrott commented 7 years ago

In Excel, I could write =BDP("US0378331005 ISIN", "CRNCY") and the result would be 'USD'.

tschm commented 7 years ago

This is easy to fix:

    def ref(self, tickers, flds, ovrds=[]):
        """
        Make a reference data request, get tickers and fields, return long
        pandas DataFrame with columns [ticker, field, value]

        Parameters
        ----------
        tickers: {list, string}
            String or list of strings corresponding to tickers
        flds: {list, string}
            String or list of strings corresponding to FLDS
        ovrds: list of tuples
            List of tuples where each tuple corresponds to the override
            field and value
        """
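        # Passing the column names to the constructor means an empty result from
        # _ref yields an empty DataFrame instead of a "Length mismatch" ValueError.
        return DataFrame(data=self._ref(tickers, flds, ovrds),
                         columns=["ticker", "field", "value"])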

Matthew's code assumes that self._ref returns valid data with three columns, which is often not the case; you can reproduce this with any nonexistent ticker. The code above should replace the original ref function.
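The ValueError in the original report comes from building the DataFrame first and assigning column names afterwards: with no rows there are zero columns, so assigning three names fails. Passing the names to the constructor, as tschm's version does, yields an empty frame instead. A minimal pandas-only sketch of the two patterns; the helper names and the `COLUMNS` constant are hypothetical, and the empty list stands in for what `_ref` returns when there is no data:

```python
import pandas as pd

COLUMNS = ["ticker", "field", "value"]


def rename_after_construction(rows):
    # Pattern that fails on empty input: an empty list gives a frame with
    # 0 columns, so assigning 3 column names raises ValueError.
    df = pd.DataFrame(rows)
    df.columns = COLUMNS
    return df


def columns_in_constructor(rows):
    # tschm's pattern: an empty list gives an empty frame with named columns.
    return pd.DataFrame(rows, columns=COLUMNS)


rows = []  # stand-in for an empty result from BCon._ref
try:
    rename_after_construction(rows)
except ValueError as exc:
    print("rename failed:", exc)

empty = columns_in_constructor(rows)
print(empty.empty, list(empty.columns))
```

With non-empty rows both patterns produce the same frame; they only differ when nothing comes back.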

matthewgilbert commented 7 years ago

I don't currently have a Bloomberg connection, but just to be clear, @tschm: the idea behind the above is that ref would return an empty DataFrame when invalid tickers are requested? Does the Bloomberg ReferenceDataResponse include a ResponseError or just an empty SecurityData? If the former, I'm not convinced that instantiating an empty DataFrame is the best solution; failing in a more transparent way might be better. Thoughts?

Robuko commented 7 years ago

I also need to return an empty DataFrame when the fieldData is blank. tschm's solution does not work for me, e.g. security = "003240 KS Equity", fld_list = ['BEST_EPS', 'BEST_PX_SALES_RATIO'].

My solution is to return an empty DataFrame this way, e.g. for bdh:

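# Method of BCon; assumes `import pandas as pd` and
# `from pandas import DataFrame` at module level.
def bdh(self, tickers, flds, start_date, end_date, elms=[],
        ovrds=[], longdata=False):

    elms = list(elms)

    data = self._bdh_list(tickers, flds, start_date, end_date,
                          elms, ovrds)

    df = DataFrame(data)
    try:
        # Raises ValueError ("Length mismatch") when no data came back
        df.columns = ["date", "ticker", "field", "value"]
        df["date"] = pd.to_datetime(df["date"])
        if not longdata:
            # Pivot to wide format: one column per (ticker, field) pair
            cols = ['ticker', 'field']
            df = df.set_index(['date'] + cols).unstack(cols)
            df.columns = df.columns.droplevel(0)
    except ValueError:
        # No usable data (e.g. blank fieldData): fall back to an empty frame
        df = DataFrame()

    return df
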
matthewgilbert commented 7 years ago

Running the example above, I see the following output:

import pdblp
con = pdblp.BCon(debug=True)
con.start()
con.ref('US0378331005 ISIN', 'CRNCY')
DEBUG:root:Sending Request:
 ReferenceDataRequest = {
    securities[] = {
        "US0378331005 ISIN"
    }
    fields[] = {
        "CRNCY"
    }
    overrides[] = {
    }
}

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "US0378331005 ISIN"
            eidData[] = {
            }
            securityError = {
                source = "236::bbdbl8"
                code = 15
                category = "BAD_SEC"
                message = "Unknown/Invalid security [nid:236] "
                subcategory = "INVALID_SECURITY"
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
            }
        }
    }
}

I don't think silently ignoring this and instantiating an empty DataFrame is the best default behaviour; what would the use cases for this be? As mentioned above, BCon._ref returns an empty list when there are errors, so if you want to ignore these types of errors, one workaround is to instantiate DataFrames from _ref() calls yourself.
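The workaround suggested here might look like the sketch below. It assumes, based on tschm's snippet above, that `BCon._ref(tickers, flds, ovrds)` returns a list of `(ticker, field, value)` tuples, and an empty list on errors. `_ref` is a private method, so this may break between pdblp versions, and `ref_or_empty` is a hypothetical name, not part of pdblp.

```python
import pandas as pd


def ref_or_empty(con, tickers, flds, ovrds=None):
    """Hypothetical helper: like con.ref, but returns an empty DataFrame
    (with the usual columns) instead of raising when _ref yields no rows."""
    # Assumes the private BCon._ref returns a list of (ticker, field, value)
    # tuples, and an empty list when Bloomberg reports an error.
    data = con._ref(tickers, flds, ovrds if ovrds is not None else [])
    return pd.DataFrame(data, columns=["ticker", "field", "value"])
```

With a started connection this would be called as `ref_or_empty(con, 'US0378331005 ISIN', 'CRNCY')`.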

Robuko commented 7 years ago

Code that calls ref expects a DataFrame back, so returning a list creates unnecessary handling complications. My use case is iterating through a valid list of stocks with a list of fields (in both ref and bdh calls), where not every field is populated for every stock. I want the unpopulated fields to be empty in the resulting DataFrame, so an empty DataFrame is the correct response type for me.
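For this use case, one alternative to returning an empty DataFrame is to post-process whatever rows do come back into a full ticker-by-field grid, so that unpopulated fields show up as NaN. A pandas-only sketch, assuming the long `[ticker, field, value]` layout that `ref` returns and at most one row per (ticker, field) pair (`pivot` raises on duplicates); `fill_missing_fields` is a hypothetical name:

```python
import pandas as pd


def fill_missing_fields(long_df, tickers, fields):
    """Hypothetical helper: pivot long [ticker, field, value] rows into a
    ticker-by-field table, with NaN wherever a pair has no row."""
    if long_df.empty:
        # Nothing came back at all: every requested cell is missing.
        return pd.DataFrame(index=tickers, columns=fields, dtype=float)
    wide = long_df.pivot(index="ticker", columns="field", values="value")
    # reindex adds missing tickers/fields as NaN and fixes the ordering.
    return wide.reindex(index=tickers, columns=fields)
```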

matthewgilbert commented 7 years ago

I am going to close this, since commit 659c0d2a8af7f8d24cf53adeb5f3790774160cdc partially addresses it.

Fields that are NOT_APPLICABLE_TO_REF_DATA for a specific security are returned as NaN:

>>> tickers = ["EJ8189684 Corp", "EI862261 Corp", "EJ319090 Corp"]
>>> fields = ["MATURITY", "NXT_CALL_DT"]
>>> df = con.ref(tickers, fields)
>>> df
           ticker        field       value
0  EJ8189684 Corp     MATURITY  2025-09-18
1  EJ8189684 Corp  NXT_CALL_DT  2020-09-18
2   EI862261 Corp     MATURITY         NaN
3   EI862261 Corp  NXT_CALL_DT  2017-06-29
4   EJ319090 Corp     MATURITY  2022-08-17
5   EJ319090 Corp  NXT_CALL_DT         NaN
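Because non-applicable fields now come back as NaN rather than raising, callers can find them with ordinary pandas missing-value tools. A short sketch that rebuilds the frame shown above by hand (dates kept as strings, no Bloomberg connection needed) and lists the empty pairs:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the con.ref output shown above.
df = pd.DataFrame(
    {
        "ticker": ["EJ8189684 Corp", "EJ8189684 Corp", "EI862261 Corp",
                   "EI862261 Corp", "EJ319090 Corp", "EJ319090 Corp"],
        "field": ["MATURITY", "NXT_CALL_DT"] * 3,
        "value": ["2025-09-18", "2020-09-18", np.nan,
                  "2017-06-29", "2022-08-17", np.nan],
    }
)

# (ticker, field) pairs that came back as NaN, i.e. not applicable.
missing = df.loc[df["value"].isna(), ["ticker", "field"]]
print(missing)

# Number of NaN values per field.
counts = df["value"].isna().groupby(df["field"]).sum()
print(counts)
```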

Erroneous fields, i.e. fields that don't exist for any security, will continue to raise an error:

>>> con.ref('EJ8189684 Corp', 'not_a_field')
ValueError: not_a_field: INVALID_FIELD

and invalid securities will raise an error:

>>> con.ref('US0378331005 ISIN', 'CRNCY')
ValueError: Unknow security US0378331005 ISIN

If there are use cases for passing invalid securities that I am overlooking, please let me know.
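Since invalid securities raise ValueError, a caller that iterates over a list of securities (as in Robuko's use case) can decide per ticker what to skip. A sketch under the assumption, taken from the examples above, that `con.ref` raises ValueError for an unknown security; `ref_skip_invalid` is a hypothetical helper, not part of pdblp:

```python
import pandas as pd


def ref_skip_invalid(con, tickers, flds):
    """Hypothetical helper: request one ticker at a time so that the
    ValueError raised for an invalid security drops only that ticker.

    Returns (DataFrame of successful rows, list of (ticker, error message)).
    """
    frames, failures = [], []
    for ticker in tickers:
        try:
            frames.append(con.ref(ticker, flds))
        except ValueError as exc:
            # An invalid *field* also raises ValueError, so it would be
            # recorded here for every ticker rather than propagated.
            failures.append((ticker, str(exc)))
    if frames:
        df = pd.concat(frames, ignore_index=True)
    else:
        df = pd.DataFrame(columns=["ticker", "field", "value"])
    return df, failures
```

Sending one request per ticker trades throughput for isolation; a single batched call is cheaper when the tickers are known to be valid.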

matthewgilbert commented 6 years ago

Posting this here to document some edge cases. It seems that for a field with no data, Bloomberg can return either an empty element or a NOT_APPLICABLE_TO_REF_DATA error. I'm not sure of the criteria for when each is returned, but in the current implementation both result in a NaN value, which I think is the desired behaviour.

import pdblp
con = pdblp.BCon(debug=True)
con.start()
con.ref("BCOM Index", ["INDX_GWEIGHT"])
DEBUG:root:Sending Request:
 ReferenceDataRequest = {
    securities[] = {
        "BCOM Index"
    }
    fields[] = {
        "INDX_GWEIGHT"
    }
    overrides[] = {
    }
}

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "BCOM Index"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
            }
        }
    }
}

con.ref("BCOM Index", ["INDX_MWEIGHT_PX2"])
DEBUG:root:Sending Request:
 ReferenceDataRequest = {
    securities[] = {
        "BCOM Index"
    }
    fields[] = {
        "INDX_MWEIGHT_PX2"
    }
    overrides[] = {
    }
}

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "BCOM Index"
            eidData[] = {
            }
            fieldExceptions[] = {
                fieldExceptions = {
                    fieldId = "INDX_MWEIGHT_PX2"
                    errorInfo = {
                        source = "215::bbdbd10"
                        code = 9
                        category = "BAD_FLD"
                        message = "Field not applicable to security"
                        subcategory = "NOT_APPLICABLE_TO_REF_DATA"
                    }
                }
            }
            sequenceNumber = 0
            fieldData = {
            }
        }
    }
}
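The two responses above differ only in whether `fieldExceptions` is populated. The rule described in these comments (empty `fieldData` or NOT_APPLICABLE_TO_REF_DATA gives NaN; an invalid security or invalid field raises) can be sketched as follows. The plain-dict input is an assumption made so the example runs without a Bloomberg connection; it is not the blpapi message API, and `field_values` is a hypothetical name, not pdblp's implementation.

```python
import math


def field_values(security_data, fields):
    """Hypothetical sketch: turn one securityData entry, written as a plain
    dict mirroring the debug output above, into (security, field, value) rows.
    This is not the blpapi Element API and not pdblp's actual code."""
    security = security_data["security"]
    if "securityError" in security_data:
        # Invalid security: fail loudly.
        raise ValueError("Unknown security " + security)
    for exc in security_data.get("fieldExceptions", []):
        info = exc["errorInfo"]
        if info["subcategory"] != "NOT_APPLICABLE_TO_REF_DATA":
            # e.g. INVALID_FIELD: fail loudly.
            raise ValueError(exc["fieldId"] + ": " + info["subcategory"])
    field_data = security_data.get("fieldData", {})
    # Both edge cases end up here: a field absent from an empty fieldData
    # element and a NOT_APPLICABLE_TO_REF_DATA field both become NaN.
    return [(security, fld, field_data.get(fld, math.nan)) for fld in fields]
```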