Add support to BQL queries

avantgardeam commented 1 year ago

Hello!

In line with our discussions on issue #18, I've opened this PR that brings in BQL query support.

I've had to tweak BlpParser due to the unique nature of this endpoint, adding a few new methods to manage its different behavior.

Since there's no formal documentation for this endpoint, I decided to pass through all the data we get from Bloomberg - ensuring we don't miss anything important.

Because the response might vary per query (like optional secondary columns), I melted some dataframes. This ensures that we can safely aggregate different fields, catering for varied response formats.

It would be great to hear any feedback on this!

Thanks, @DS-London and @matthewgilbert!

Slik300 commented 1 year ago

Hi Everyone,

Awesome work! This has been light years better than the stock Python API or sandboxed excel add-in.

I'm still very new to both using BBG and Python, and have no one to speak about this niche within a niche.

If one of you could lightly explain the differences between the blp and xbbg wrappers that would be much appreciated.

Looking forward to trying out bql.

Thanks again.

avantgardeam commented 1 year ago

Hi Everyone,

Awesome work! This has been light years better than the stock Python API or sandboxed excel add-in.

I'm still very new to both using BBG and Python, and have no one to speak about this niche within a niche.

If one of you could lightly explain the differences between the blp and xbbg wrappers that would be much appreciated.

Looking forward to trying out bql.

Thanks again.

Hello,

Thanks for the enthusiasm!

@matthewgilbert can better elaborate on this, but in short, I would say that blp's architecture is more robust and its error handling is clearer than xbbg. However, if you're used to Excel, you might find xbbg's syntax more familiar. By the way, for questions like these, it's better to open an Issue rather than commenting on a PR, especially when it's not directly related to the PR.

Regarding BQL support, the PR hasn't been merged into the main blp branch yet. It'd be best to wait for that before diving into this functionality.

Keep in mind, this is an undocumented API, so we can't make any hard promises about its behavior. Therefore, it's a good idea to stick with the usual methods (bdp, bdh, etc.) when you can.

matthewgilbert commented 1 year ago

Thanks @avantgardeam, will try to look at this over the next week and leave feedback.

avantgardeam commented 1 year ago

Thanks @avantgardeam, will try to look at this over the next week and leave feedback.

Hello, @matthewgilbert! Have you had any time to check this?

Thanks!

matthewgilbert commented 1 year ago

Thanks @avantgardeam, will try to look at this over the next week and leave feedback.

Hello, @matthewgilbert! Have you had any time to check this?

Thanks!

Apologies for the delay. Mostly looks good, just left a few comments

avantgardeam commented 1 year ago

Thanks @avantgardeam, will try to look at this over the next week and leave feedback.

Hello, @matthewgilbert! Have you had any time to check this? Thanks!

Apologies for the delay. Mostly looks good, just left a few comments

Thanks a lot for the review!

A somewhat unrelated question: don't you think that it would be better to hide the methods that collect data? _collect_to_bdh instead of collect_to_bdh, for example.

matthewgilbert commented 1 year ago

A somewhat unrelated question: don't you think that it would be better to hide the methods that collect data? _collect_to_bdh instead of collect_to_bdh, for example.

Yes that probably makes sense. I can't recall if I original had a use case for calling these independently but if I did I can't recall it now.

matthewgilbert commented 1 year ago

Merged, thanks @avantgardeam !

matthewgilbert commented 1 year ago

One question looking at the example data returned

                     security      field  secondary_name       secondary_value        value
            0  AAPL US Equity  px_last()        CURRENCY                   USD   192.755005
            1  IBM US Equity   px_last()        CURRENCY                   USD   139.289993
            2  AAPL US Equity  px_last()            DATE  2023-07-24T00:00:00Z   192.755005
            3  IBM US Equity   px_last()            DATE  2023-07-24T00:00:00Z   139.289993

It seems like there is some duplication of data that is caused? Do you have an example of what the raw responses passed into collect_to_bql look like?

avantgardeam commented 1 year ago

One question looking at the example data returned

                     security      field  secondary_name       secondary_value        value
            0  AAPL US Equity  px_last()        CURRENCY                   USD   192.755005
            1  IBM US Equity   px_last()        CURRENCY                   USD   139.289993
            2  AAPL US Equity  px_last()            DATE  2023-07-24T00:00:00Z   192.755005
            3  IBM US Equity   px_last()            DATE  2023-07-24T00:00:00Z   139.289993

It seems like there is some duplication of data that is caused? Do you have an example of what the raw responses passed into collect_to_bql look like?

This is the response for the query in the docstring:

{
  'field': 'px_last()',
  'id': ['AAPL US Equity', 'IBM US Equity'],
  'value': [193.22, 142.97],
  'DATE': ['2023-07-28T00:00:00Z', '2023-07-28T00:00:00Z'],
  'CURRENCY': ['USD', 'USD']
}

One could propose that this data be presented more efficiently by simply transforming each list into a column. But, this alternative does have significant drawbacks. For instance, let's consider the query below:

bquery = BlpQuery().start()

bquery.bql(expression="get(px_to_book_ratio(as_of_date=range(2021-01-01, 2023-06-30, frq=CY)), gics_sector_name()) for('AAPL US Equity')")

The resulting response from the parser (which yields a field at a time) looks like this:

{
  'field': 'px_to_book_ratio(as_of_date=range(2021-01-01,2023-06-30,frq=CY))',
  'id': ['AAPL US Equity', 'AAPL US Equity', 'AAPL US Equity', 'AAPL US Equity'],
  'value': ['NaN', 46.234021081312406, 'NaN', 49.06639630972683],
  'REVISION_DATE': ['2020-10-30T00:00:00Z', '2021-10-29T00:00:00Z', '2022-10-28T00:00:00Z', '2023-05-05T00:00:00Z'],
  'AS_OF_DATE': ['2021-01-01T00:00:00Z', '2021-12-31T00:00:00Z', '2022-12-31T00:00:00Z', '2023-06-30T00:00:00Z'],
  'PERIOD_END_DATE': ['2020-09-26T00:00:00Z', '2021-09-25T00:00:00Z', '2022-09-24T00:00:00Z', '2023-04-01T00:00:00Z']
}
{
  'field': 'gics_sector_name()',
  'id': ['AAPL US Equity'],
  'value': ['Information Technology']
}

Here, we encounter two fields returning notably diverse responses. To my mind, there isn't a straightforward method to deliver this data to the user without converting these two tables from wide to long format to standardize their structure.

The crux of this explanation is that the appearance of duplicates is a necessary side effect of table pivoting. Admittedly, this is a substantial disadvantage. Nonetheless, as this is an undocumented API, I've yet to identify a superior method for data delivery that ensures no information is lost.

If you have any ideas, I would be happy to implement them!

avantgardeam commented 1 year ago

A somewhat unrelated question: don't you think that it would be better to hide the methods that collect data? _collect_to_bdh instead of collect_to_bdh, for example.

Yes that probably makes sense. I can't recall if I original had a use case for calling these independently but if I did I can't recall it now.

What are your thoughts on a simple PR to hide these methods?

matthewgilbert commented 1 year ago

If you have any ideas, I would be happy to implement them!

I don't have a strong opinion since I'm not actively using this feature (or the library more generally). It strikes me that this problem stems for the fact that the data is inherently non tabular, so I think your approach seems fine.

matthewgilbert commented 1 year ago

A somewhat unrelated question: don't you think that it would be better to hide the methods that collect data? _collect_to_bdh instead of collect_to_bdh, for example.

Yes that probably makes sense. I can't recall if I original had a use case for calling these independently but if I did I can't recall it now.

What are your thoughts on a simple PR to hide these methods?

I don't have much bandwidth for maintenance these days so for more minor cleanups like this I don't have much appetite since I don't want people to waste their time submitting PRs that I don't end up reviewing.

avantgardeam commented 1 year ago

A somewhat unrelated question: don't you think that it would be better to hide the methods that collect data? _collect_to_bdh instead of collect_to_bdh, for example.

Yes that probably makes sense. I can't recall if I original had a use case for calling these independently but if I did I can't recall it now.

What are your thoughts on a simple PR to hide these methods?

I don't have much bandwidth for maintenance these days so for more minor cleanups like this I don't have much appetite since I don't want people to waste their time submitting PRs that I don't end up reviewing.

I understand! Thanks a lot for reviewing this PR.

gncosta commented 9 months ago

Is it possible to release a new version containing the changes in pypi?

matthewgilbert commented 9 months ago

@gncosta this should be released now.

gncosta commented 9 months ago

Tks! On 22 Oct 2023, at 13:19, Matthew Gilbert @.***> wrote: @gncosta this should be released now.

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>

matthewgilbert / blp

Add support to BQL queries #28