matthewgilbert / pdblp

pandas wrapper for Bloomberg Open API
MIT License
241 stars 67 forks source link

Update ref() function to enable each field has its own overrides #24

Closed jasonlocal closed 6 years ago

jasonlocal commented 6 years ago

added functions and updated ref() function to support overrides values for each field. Modified ref() return data_frame to have better structured data_frame. Tweaked original ref() test cases to incorporate new ref() returned data_frame, and have all the ref() tests passed. Next step: update relevant functions that calls create_req(), for the same purpose

matthewgilbert commented 6 years ago

Thanks for the commit, I have a couple questions / concerns. It appears what this code is doing is modifying ref() to send and parse multiple requests in order to allow the functionality discussed in #22. I have a couple issues with this.

  1. It is unclear what the advantage of handling iterative ReferenceDataRequests internally is relative to allowing the user to do this? Why not just allow the user to make iterative calls to ref() since these changes are just doing that under the hood?

  2. Several unit tests seem to be modifying the columns for the expected data indicating the format of the data returned has changed?

If the underlying service ReferenceDataRequest does not support a single requests with multiple overrides than my preference would be to just handle this at the user level. This can be checked using the blpapi library outside of pdblp.

jasonlocal commented 6 years ago

Hey Matthew, Thanks for the comment on this. The advantage of handling iterative ReferenceDataRequests internally is to give clients options to pass in different overrides that correspond to different fields. If ref() function gives users the flexibility of passing list of fields in a single ref() call, then it should also allow them to pass in different overrides value. blapi doesn't support passing in different overrides for different fields in a single request, which is why I chose to iterate requests in ref() function. To answer your second question, yes it is essentially the same test cases that you had earlier, except I tweaked columns of the returned data. To me, this new data structure looks more clean, concise, and easier to work with from the clients perspective.

One more thing, when I was testing unit tests in test_pdblp.py, I noticed test_bdib() is failing. I guess this is more of a separate issue relative to this pull request, but it is something need to be investigated.

matthewgilbert commented 6 years ago

If ref() function gives users the flexibility of passing list of fields in a single ref() call, then it should also allow them to pass in different overrides value.

The difference here is that the underlying Bloomberg provided service ReferenceDataRequest supports multiple field requests, hence why pdblp exposes this functionality. My preference would be not to include something like this since ultimately it complicates the code base and does not provide the user with any added functionality above what they could accomplish with a for loop.

The reason something like ref_hist() exists which does something similar is because it makes use of correlationIDs to be able to send multiple requests at once which speeds up the response time from the Bloomberg Service. That being said, I have reservations about the ref_hist() method since I believe it complicates / bloats the library. Arguably there is a more succinct way of dealing with a multi request paradigm using correlationIDs that is more general than what is done in ref_hist.

jasonlocal commented 6 years ago

for ref() function, would it be better to restrict clients to pass in list of fields parameters if you want to leave the function as it is? To me, it doesn't make much of sense that ReferenceDataRequest support passing in list of fields but not supporting passing in different overrides.

As far as ref__hist(), looks like it is passing different correlationIDs for each request. Under the hood, is there difference in terms of speed between creating n different requests then calling _parse_ref() n number of times(like the approach I did in ref() function) and using different correlationIDs to call par_ref() once? Also, just wondering what part in ref_hist() makes you think complicates the libraries?

matthewgilbert commented 6 years ago

for ref() function, would it be better to restrict clients to pass in list of fields parameters if you want to leave the function as it is? To me, it doesn't make much of sense that ReferenceDataRequest support passing in list of fields but not supporting passing in different overrides.

Restricting the user to not access the full functionality of the library does not seem like a good idea. If you print a response from a Bloomberg ReferenceDataRequest you will see something like the following (although not in json, in a string dump to the console)

 ReferenceDataResponse = {
    securityData[] = {
        securityData = {
            security = "AUD Curncy"
            eidData[] = {
            }
            fieldExceptions[] = {
            }
            sequenceNumber = 0
            fieldData = {
                TIME = "18:33:47"
                LAST_PRICE_TIME_TODAY = 18:33:47.000
            }
        }
    }
}

The Open Bloomberg API natural supports sendings lists of fields, it does not naturally support sending lists of overrides. If you set multiple overrides for the same field, to the best of my understanding this will override the same value, e.g. in the follow REFERENCE_DATE would just be overrides if you naively attempted to set multiple overrides. Providing a convenience method for the user to do this which amounts to doing a for loop seems out of scope

     ReferenceDataRequest = {
        securities[] = {
            "AUD Curncy"
        }
        fields[] = {
            "SETTLE_DT"
        }
        overrides[] = {
            overrides = {
                fieldId = "REFERENCE_DATE"
                value = "20161010"
            }
        }
     }

Under the hood, is there difference in terms of speed between creating n different requests then calling _parse_ref() n number of times(like the approach I did in ref() function) and using different correlationIDs to call par_ref() once?

Yes there is a large difference in speed from my recollections, I believe this is why ref_hist() exists in the first place instead of just using a for loop around ref()

Also, just wondering what part in ref_hist() makes you think complicates the libraries?

This is extending the Bloomberg Open API instead of just wrapping it in a convenient DataFrame access. I have mixed feelings about this since I believe this bloats the library and complicates the API. However because of performance related issues without correlationIDs it seemed like the best decision at the time.

matthewgilbert commented 6 years ago

I appreciate the effort to extend the functionality of ReferenceDataRequest but as discussed in #22 I believe this is out of scope so am going to close this.