vfilimonov / pydatastream

Python interface to the Refinitiv Datastream (formerly Thomson Reuters Datastream)
MIT License

DS.fetch() output #29

Closed bjleend closed 2 years ago

bjleend commented 2 years ago

I would like to extract data with the pydatastream module using Datastream ISIN codes, as follows:

data = DS.fetch(['US91835J2078','KR7114630007'], ['X(UP)~U$','X(P)~U$','VO'], date_from='2021-09-27')

I was able to obtain data, but the result is sorted with KRxxxx before USxxxx. I don't know why it is sorted, but I would like to have the data in the original order, ['USxxxx', 'KRxxxx'].

I have an extensive ISIN list, so it is important to get the data in the order requested. There must be a simple on/off switch for this, but I cannot find it. Can someone help me get the result as intended?

Thanks.

vfilimonov commented 2 years ago

Hello @bjleend

The API of Datastream does not guarantee the order of the response.

However, you can rearrange the output using standard pandas functionality. Your response should have a multi-index on (ISIN, date), so to rearrange the ISINs you can do something like:

isins = ['US91835J2078','KR7114630007']
data = DS.fetch(isins, ['X(UP)~U$','X(P)~U$','VO'], date_from='2021-09-27')
data = data.reindex(isins, level=0)
bjleend commented 2 years ago

Thank you very much. This solves my problem. There is one more quirk in the DS.fetch() API that I would like to understand: is it due to an API restriction or a Datastream restriction? It appears that (# of ISINs) x (# of fields) is limited to 100, and anything bigger than that (like 200x1) returns an error. For example, 100x1 or 10x10 works. Can you confirm whether this size limit comes from the API wrapper or from Datastream itself? Thanks.

vfilimonov commented 2 years ago

There is indeed a limit on the number of requested series (and, as far as I know, data points) per request in the Datastream API, and 100 sounds about right. pydatastream does not introduce any restrictions of its own.

If you confirm the exact limits with Datastream support, please post a note here as well - I will add it to the readme.
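Given such a per-request cap, a common workaround is to split a long ticker list into batches and concatenate the results. The sketch below assumes an initialized pydatastream client (`DS` in the thread above); `fetch_in_batches` is a hypothetical helper, not part of pydatastream, and the cap of 100 is the Datastream-side limit discussed here:

```python
import pandas as pd

def fetch_in_batches(ds, isins, fields, chunk_size=None, **kwargs):
    """Fetch `isins` in batches so (# of tickers) x (# of fields) <= 100.

    `ds` is assumed to be an initialized pydatastream Datastream client.
    """
    if chunk_size is None:
        # e.g. 3 fields -> at most 33 tickers per call
        chunk_size = max(1, 100 // len(fields))
    frames = []
    for i in range(0, len(isins), chunk_size):
        frames.append(ds.fetch(isins[i:i + chunk_size], fields, **kwargs))
    # concatenate and restore the originally requested ticker order
    return pd.concat(frames).reindex(isins, level=0)
```

The final `reindex(isins, level=0)` reuses the ordering trick from earlier in this thread, so the batched result comes back in the requested order.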


bjleend commented 2 years ago

I have another issue with data = DS.fetch(). When I request only one ticker (or ISIN), like @AAPL, data contains only one index (date), without the ticker. When I request multiple tickers, ['@AAPL', 'U:MMM'], it returns two index columns (ticker and date). How can I get both index columns even with one ticker, i.e. @AAPL in the first index level and date in the second?

vfilimonov commented 2 years ago

Unfortunately, this is not implemented in the module - the first index is always removed when only one ticker is requested. I might add an option to keep it in the future (defaulting to "not keep" for backward compatibility). And a PR is always welcome, of course.
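Until such an option exists, the dropped level can be re-added manually with pandas. A minimal sketch, assuming `ds` is an initialized pydatastream client; the helper name `fetch_keep_ticker` is illustrative, not part of the library:

```python
import pandas as pd

def fetch_keep_ticker(ds, ticker, fields, **kwargs):
    """Re-add the ticker level that fetch() drops for single-ticker requests."""
    data = ds.fetch(ticker, fields, **kwargs)
    if not isinstance(data.index, pd.MultiIndex):
        # prepend the ticker as the first index level, matching the
        # (ticker, date) shape of multi-ticker responses
        data = pd.concat({ticker: data}, names=['Ticker'])
    return data
```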

bjleend commented 2 years ago

It's good to know that you may add this feature in the future. It would be very useful for conforming with the multiple-ticker output. My current workaround is simply to avoid single-ticker DS.fetch() calls.
Also, regarding my earlier question about the limits of DS.fetch(): I have confirmed with Datastream that a single call is limited to 100 items, (# of tickers) x (# of fields), e.g. 10x10 or 20x5. The number of time-series data points is not limited, as long as the monthly limit of 10 million data points is not exceeded. By the way, your pydatastream API is much better for my application than Datastream's own API: it's simple and easy to integrate. I recommend it to anyone who needs to access Datastream from Python.

vfilimonov commented 2 years ago

Thank you, @bjleend, I appreciate it. Please test the version from master (I don't have access to DSWS at the moment): by specifying always_multiindex=True you can force fetch() to keep the first index. If it works, I will push it to PyPI.

bjleend commented 2 years ago

This works beautifully. Thank you. By the way, when you push it to PyPI, should I uninstall and reinstall it, or just install again?

vfilimonov commented 2 years ago

Perfect! Version 0.6.5 is on PyPI now. It should be enough to do pip install -U pydatastream; if for some reason that does not work, then indeed uninstall and install again.