matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License

NotFoundException when a field is missing #13

Closed MarekOzana closed 7 years ago

MarekOzana commented 7 years ago

I am trying to produce a chart of z-spread against next call / maturity for a list of all outstanding Contingent Convertible bonds. This requires getting the MATURITY and/or NXT_CALL_DT fields for a list of bonds, where I do not know beforehand which bond is missing which field. The problem is that pdblp.ref() raises an exception whenever a field does not exist for a given security (e.g. MATURITY for a perpetual bond). Example (3 bonds: the first has both fields, the second is missing MATURITY and the last is missing NXT_CALL_DT):

tickers = ["EJ8189684 Corp", "EI862261 Corp", "EJ319090 Corp"]
fields = ["MATURITY", "NXT_CALL_DT"]
df = con.ref(tickers, fields)

raises NotFoundException: Attempt to access unavailable sub-element 'MATURITY' of element 'fieldData'. (0x0006000d)

Shouldn't pdblp.ref just ignore a missing field (or return NaN)? This is exactly what Excel and Rblpapi do:

bdp(securities=c("EJ8189684 Corp", "EI862261 Corp", "EJ319090 Corp"), 
        fields=c("MATURITY", "NXT_CALL_DT"))
                 MATURITY NXT_CALL_DT
EJ8189684 Corp 2025-09-18  2020-09-18
EI862261 Corp        <NA>  2017-06-29
EJ319090 Corp  2022-08-17        <NA>

For reference, see the debug printout below:

DEBUG:root:Sending Request:
 ReferenceDataRequest = {
    securities[] = {
        "EJ8189684 Corp", "EI862261 Corp", "EJ319090 Corp"
    }
    fields[] = {
        "MATURITY", "NXT_CALL_DT"
    }
    overrides[] = {
    }
}
DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "EJ8189684 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
                MATURITY = 2025-09-18
                NXT_CALL_DT = 2020-09-18
            }
        }
        securityData = {
            security = "EI862261 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
                fieldExceptions = {
                    fieldId = "MATURITY"
                    errorInfo = {
                        source = "144::bbdbd4"
                        code = 9
                        category = "BAD_FLD"
                        message = "Field not applicable to security"
                        subcategory = "NOT_APPLICABLE_TO_REF_DATA"
                    }
                }
            }
            sequenceNumber = 1
            fieldData = {
                NXT_CALL_DT = 2017-06-29
            }
        }
        securityData = {
            security = "EJ319090 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
                fieldExceptions = {
                    fieldId = "NXT_CALL_DT"
                    errorInfo = {
                        source = "144::bbdbd4"
                        code = 9
                        category = "BAD_FLD"
                        message = "Field not applicable to security"
                        subcategory = "NOT_APPLICABLE_TO_REF_DATA"
                    }
                }
            }
            sequenceNumber = 2
            fieldData = {
                MATURITY = 2022-08-17
            }
        }
    }
}
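
The response above already separates what was returned (fieldData) from what was not (fieldExceptions), so filling the gaps is mostly a matter of defaulting absent fields. A minimal sketch, assuming the message has been converted to plain Python dicts; the dict layout and the helper name `to_wide_table` are hypothetical and are not pdblp internals:

```python
import math

# Hypothetical plain-dict mirror of the securityData entries in the debug
# printout above (only the keys this sketch needs).
security_data = [
    {
        "security": "EJ8189684 Corp",
        "fieldData": {"MATURITY": "2025-09-18", "NXT_CALL_DT": "2020-09-18"},
    },
    {
        "security": "EI862261 Corp",
        "fieldData": {"NXT_CALL_DT": "2017-06-29"},
    },
    {
        "security": "EJ319090 Corp",
        "fieldData": {"MATURITY": "2022-08-17"},
    },
]


def to_wide_table(security_data, fields):
    """Map each ticker to {field: value}, using NaN for absent fields."""
    return {
        sec["security"]: {f: sec["fieldData"].get(f, math.nan) for f in fields}
        for sec in security_data
    }


table = to_wide_table(security_data, ["MATURITY", "NXT_CALL_DT"])
for ticker, row in table.items():
    print(ticker, row)
```

Passing `table` to `pandas.DataFrame.from_dict(table, orient="index")` should give one row per ticker and one column per field, which is the layout of the Rblpapi output.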

Note that this issue is related to #6 but is not the same: the problem here is not that we are naming the columns of an empty DataFrame, as in issue #6.

matthewgilbert commented 7 years ago

This is a use case I had not considered. I will take a look and see what is involved to allow this.

matthewgilbert commented 7 years ago

I'm wondering what your thoughts are on the expected behaviour in a couple of scenarios. When the field is truly nonexistent, not just not applicable to a given security, I would expect to get an error, e.g.

con.ref(["EJ8189684 Corp"], ["not_a_real_field"])

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "EJ8189684 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
                fieldExceptions = {
                    fieldId = "not_a_real_field"
                    errorInfo = {
                        source = "161::bbdbd18"
                        code = 9
                        category = "BAD_FLD"
                        message = "Field not valid"
                        subcategory = "INVALID_FIELD"
                    }
                }
            }
            sequenceNumber = 0
            fieldData = {
            }
        }
    }
}

whereas when the field is not applicable I agree returning a NaN is reasonable, e.g.

con.ref(["EI862261 Corp"], ["MATURITY"])

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "EI862261 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
                fieldExceptions = {
                    fieldId = "MATURITY"
                    errorInfo = {
                        source = "161::bbdbd2"
                        code = 9
                        category = "BAD_FLD"
                        message = "Field not applicable to security"
                        subcategory = "NOT_APPLICABLE_TO_REF_DATA"
                    }
                }
            }
            sequenceNumber = 0
            fieldData = {
            }
        }
    }
}

These behaviours are both consistent with Rblpapi. When given a nonexistent ticker, my preference would be to throw an error. Currently the behaviour is buggy in the sense that it ignores the error and returns a DataFrame without the erroneous ticker, unless all tickers are erroneous, in which case it fails later when attempting to assign column names to an empty DataFrame. I can't think of any use cases for wanting to pass in erroneous tickers. @tschm you had suggested a fix for this in #6, I'm wondering if you have any example use cases for this in mind?

con.ref(["EJ8189684 Corp", "not_a_ticker"], ["MATURITY", "NXT_CALL_DT"])

DEBUG:root:Message Received:
 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "EJ8189684 Corp"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
                MATURITY = 2025-09-18
                NXT_CALL_DT = 2020-09-18
            }
        }
        securityData = {
            security = "not_a_ticker"
            eidData[] = {
            }
            securityError = {
                source = "161::bbdbd10"
                code = 15
                category = "BAD_SEC"
                message = "Unknown/Invalid security [nid:161] "
                subcategory = "INVALID_SECURITY"
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 1
            fieldData = {
            }
        }
    }
}

Out[29]: 
           ticker        field       value
0  EJ8189684 Corp     MATURITY  2025-09-18
1  EJ8189684 Corp  NXT_CALL_DT  2020-09-18

MarekOzana commented 7 years ago

I agree with you, Matthew: I would expect ref() to either pass or return NaN for NOT_APPLICABLE_TO_REF_DATA and raise an exception in case of a nonexistent ticker or field.
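
The behaviour agreed on here can be sketched as a small check over the error subcategories seen in the debug output. This is only an illustration on plain dicts: `resolve_fields` and the dict layout are hypothetical, and the use of ValueError is an assumption, not a description of what pdblp does.

```python
import math


def resolve_fields(sec):
    """Apply the proposed policy to one securityData entry (a plain dict)."""
    # A nonexistent ticker is treated as an error.
    if "securityError" in sec:
        raise ValueError(
            f"{sec['security']}: {sec['securityError']['subcategory']}"
        )
    values = dict(sec["fieldData"])
    for exc in sec["fieldExceptions"]:
        subcategory = exc["errorInfo"]["subcategory"]
        if subcategory == "NOT_APPLICABLE_TO_REF_DATA":
            # Valid field that does not apply to this security: NaN.
            values[exc["fieldId"]] = math.nan
        else:
            # Anything else (e.g. INVALID_FIELD) is treated as an error.
            raise ValueError(
                f"{sec['security']}: {exc['fieldId']}: {subcategory}"
            )
    return values


perpetual = {
    "security": "EI862261 Corp",
    "fieldExceptions": [
        {"fieldId": "MATURITY",
         "errorInfo": {"subcategory": "NOT_APPLICABLE_TO_REF_DATA"}},
    ],
    "fieldData": {"NXT_CALL_DT": "2017-06-29"},
}
bad_field = {
    "security": "EJ8189684 Corp",
    "fieldExceptions": [
        {"fieldId": "not_a_real_field",
         "errorInfo": {"subcategory": "INVALID_FIELD"}},
    ],
    "fieldData": {},
}

result = resolve_fields(perpetual)
try:
    resolve_fields(bad_field)
    bad_field_error = None
except ValueError as err:
    bad_field_error = str(err)
```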

matthewgilbert commented 7 years ago

This is fixed in 659c0d2a8af7f8d24cf53adeb5f3790774160cdc and related to https://github.com/matthewgilbert/pdblp/issues/6

gflores87 commented 5 years ago

I agree with you, Matthew: I would expect ref() to either pass or return NaN for NOT_APPLICABLE_TO_REF_DATA and raise an exception in case of a nonexistent ticker or field.

I would have to slightly disagree. While it is good to reference Excel for the expected behaviour, the batch nature of pdblp's requests encourages some deviations from Excel. For example, you might have a large list of tickers, say 200, and if even 1 of them is not supported (a bad strike for an option chain, for example), the request won't resolve. I believe the ideal behaviour would be to handle bad tickers internally and return everything else. I might be wrong, but I think it's worth a quick discussion :)

ashishgupta85 commented 4 years ago

Just to follow up: I would also like to see behaviour consistent with Excel, so that if an invalid ticker is sent, it returns NaN for that one while the rest come back fine. The use case, as mentioned, is a large array of tickers that is not maintained as well as we would like due to changes; we would still like the request to return the tickers that are valid. Maybe an optional parameter such as ignore_invalid_tickers would provide full flexibility?
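
A rough sketch of how such an opt-in flag could behave, written against plain dicts rather than real Bloomberg messages. The function `collect_rows` and the dict layout are hypothetical, and `ignore_invalid_tickers` is only the parameter name suggested above, not an existing pdblp argument.

```python
import math


def collect_rows(security_data, fields, ignore_invalid_tickers=False):
    """Build (ticker, field, value) rows from securityData-like dicts."""
    rows = []
    for sec in security_data:
        if "securityError" in sec:
            if not ignore_invalid_tickers:
                raise ValueError(f"invalid security: {sec['security']}")
            # Excel-like behaviour: keep the ticker, fill its fields with NaN.
            rows.extend((sec["security"], f, math.nan) for f in fields)
            continue
        for f in fields:
            rows.append((sec["security"], f, sec["fieldData"].get(f, math.nan)))
    return rows


security_data = [
    {
        "security": "EJ8189684 Corp",
        "fieldData": {"MATURITY": "2025-09-18", "NXT_CALL_DT": "2020-09-18"},
    },
    {
        "security": "not_a_ticker",
        "securityError": {"subcategory": "INVALID_SECURITY"},
        "fieldData": {},
    },
]

rows = collect_rows(
    security_data, ["MATURITY", "NXT_CALL_DT"], ignore_invalid_tickers=True
)
```

Leaving the flag off by default keeps the stricter behaviour discussed earlier in the thread, while callers with large, loosely maintained ticker lists can opt in.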