Closed rayanbenredjeb closed 7 months ago
Maybe @avantgardeam has thoughts on this since they are the author of the feature.
Original PR and discussion of the feature is here https://github.com/matthewgilbert/blp/pull/28
Thank you for raising this issue!
I acknowledge that the current parsing method may not be ideal, as was mentioned in the initial pull request (PR). However, we are dealing with an undocumented endpoint, which means its behavior cannot be fully guaranteed.
In response to your observations, it is essential to recognize that the data frame is structured in a long format. This context explains why the appearance of duplicates in "secondary_names" is not only predictable but also intentional.
Moreover, aligning secondary values with primary ones should not pose any significant challenges. Transforming the data frame into a wide format would further clarify this point.
Attempting to replicate the behavior of the Excel API could introduce complications, such as the potential loss of important information like "REVISION_DATE" in your case, and may also lead to issues with edge cases where the parser might not handle certain queries accurately.
With all this in mind, I am open to reviewing any PR aimed at enhancing the parser's functionality. I would be willing to implement changes myself, provided they do not result in data loss or overlook any edge cases. As of now, I am not aware of any alternative solutions that meet these criteria.
Hello @matthewgilbert,
Upon further review and consideration of @rayanbenredjeb's concerns, I believe our current implementation might benefit from some revisions.
As an example, take a query similar to the one presented by @rayanbenredjeb: get(is_eps(fa_period_offset=1), gics_sector_name()) for('AAPL US EQUITY') with(dates=range(-1D,0d))
. In response, Bloomberg provides two distinct dataframes, as depicted in the linked image.
For user presentation, we have a couple of options: delivering the data as either a list of dataframes or as a dictionary with field names as keys and dataframes as values. Another method, as suggested by @rayanbenredjeb, is to merge these dataframes based on their ID, akin to Excel's functionality.
Considering the separate nature of these datasets, my initial preference is towards using a list or dictionary format for clarity. However, I am open to suggestions and would greatly appreciate your feedback on this matter.
What would the data returned from similar query in Excel look like for reference?
My initial thought is that these seem like a chained set of queries which are unrelated so returning as a list of dataframes would make sense, instead of trying to shoe horn them into a larger dataframe.
Also what does the current implementation return for your above query @avantgardeam ?
@matthewgilbert, Image 1 shows the query result from Excel, and Image 2 from blp.
The original idea was to melt various fields into a unified structure, followed by the concatenation of these dataframes. However, if you attempt to pivot the dataframe returned by blp
you will get ValueError: Index contains duplicate entries, cannot reshape
.
Finally, I agree with you that forcing totally different datasets into a single one is not ideal. Based on that, should we return a list of dataframes (as blpapi
does), or use a dict with fields as keys and dataframes as values?
Image 1:
Image 2:
I think either is fine, although I would probably lean towards a list.
Problem description
The current behavior yields a df with this format
There are 2 problems:
Expected Output
Thank you for considering these changes, I believe it is the only two problems that make it less accurate than its excel counterpart.