matthewgilbert / blp

Pythonic interface for Bloomberg Open API
Apache License 2.0
112 stars 24 forks source link

`blp.bds` wrongly parses the field `ID_EXCH_SYMBOL` as a sequence #42

Open slmg opened 4 months ago

slmg commented 4 months ago

Problem description

The value of the field ID_EXCH_SYMBOL is "ES", but it gets parsed as if it was sequence of distinct values instead: ["E", "S"]

Code to reproduce

> from blp import __version__ as blp_version
> from blp import blp

> blp_version
0.0.3

> bquery = blp.BlpQuery().start()
> bquery.bds(["ESM4 Index"], "ID_EXCH_SYMBOL")
    0
0  E
1  S

Expected Output

    0
0  ES
loicdiridollou commented 2 months ago

I was bumping into the same issue recently for a research project. It seems like the code is using extend instead of append (in collect_to_bds) so it adds all the elements on the string one-by-one. For ES it would add E and S instead of adding the string in full.

    raise ValueError(f"responses contain different fields, {field} and {keys[0]}")
field = keys[0]
data = response["data"][field]
try:
    rows.extend(data)

I have experimented with switching the extend to append and it seems to solve the issue. @matthewgilbert is there a use case where the extend is need, I have tried a couple of fields and would not be able to find one that required extend. Please let me know, happy to work on a PR to switch it.

matthewgilbert commented 2 months ago

Hmm, I seem to recall this being something you would want to call with bdp instead of bds but it's been awhile since I've looked at this. I do agree that breaking up the string field is not desired behavior though.

What would be the impact of that change on this call bquery.bds("BCOM Index", "INDX_MWEIGHT")? Would it result in a nested list?

matthewgilbert commented 2 months ago

Looking in more detail I think this is probably more a case of the error not appropriately catching this because a string is an iterable, i.e.

            try:
                rows.extend(data)
            except TypeError:
                raise TypeError(f"response data must be bulk reference data, received {response['data']}")
loicdiridollou commented 2 months ago

Hmm I see what you mean. I tried this over and your usecase would break my suggestion. I think the better approach is to check what kind of data is returned, INDX_MWEIGHT returns a list of dictionary whereas my usecase just returned a string.

image

vs

image

Maybe the wisest option is to check if the data is a list of items in which case you keep it as is, and if it is a string then you make it a list of that single item (like this is done in other parts of the API wrapper). Open to better options, I don't use BDS very often to know if that check would break other use cases.

matthewgilbert commented 2 months ago

I think in this case you should just be using bdp instead of bds and we should be raising an error in bds if the result is not a non string iterable.

loicdiridollou commented 2 months ago

Indeed bdp seems to work accordingly for the current usecase.