pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.
https://pydata.github.io/pandas-datareader/stable/index.html

MOEX data is incomplete and outdated #550

Closed khazamov closed 6 years ago

khazamov commented 6 years ago

For example, I can't get data for Sberbank (the largest bank in the post-Soviet space) or for some ETFs. Symbol codes: RU0009029540 and IE00BD3QHZ91.

It seems that MOEX provides very limited data that is not up to date. Is there anything that can be done here?

addisonlynch commented 6 years ago

Seems to be the case. Will look into it. @Mottl, are you aware of any recent changes to the API? I can't seem to find a changelog, but the endpoints look to be the same as before.

web.DataReader("RU0009029540", "moex", datetime(2017,1,1), datetime(2018,1,1))

Returns empty:

[BOARDID, SHORTNAME, SECID, NUMTRADES, VALUE, OPEN, LOW, HIGH, LEGALCLOSEPRICE, WAPRICE, CLOSE, VOLUME, MARKETPRICE2, MARKETPRICE3, ADMITTEDQUOTE, MP2VALTRD, MARKETPRICE3TRADESVALUE, ADMITTEDVALUE, WAVAL]
Index: []

web.DataReader("SBER", "moex", datetime(2017,1,1), datetime(2018,1,1))

Quite a few NaNs and some Unicode issues here...

BOARDID   SHORTNAME SECID  NUMTRADES   VALUE   OPEN    LOW   HIGH  ...    VOLUME  MARKETPRICE2  MARKETPRICE3  ADMITTEDQUOTE  MP2VALTRD  MARKETPRICE3TRADESVALUE  ADMITTEDVALUE  WAVAL
TRADEDATE                                                                     ...
2017-01-04    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         552.0          551.0          0                   502180              0    NaN
2017-01-05    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         552.0          551.0          0                   502180              0    NaN
2017-01-06    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         552.0          551.0          0                   502180              0    NaN
2017-01-09    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         552.0          551.0          0                   502180              0    NaN
2017-01-10    SMAL  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          1    1134  567.0  567.0  567.0  ...         2           NaN           NaN            NaN          0                        0              0    NaN
2017-01-10    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          1    5670  567.0  567.0  567.0  ...        10           NaN         553.0          567.0          0                   503470              0    NaN
2017-01-11    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         553.0          567.0          0                   503470              0    NaN
2017-01-12    TQBR  \u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e  PAZA          0       0    NaN    NaN    NaN  ...         0           NaN         553.0          567.0          0                   503470              0    NaN

Mottl commented 6 years ago

It seems that RU0009029540 began trading on 1999-01-06 and stopped on 2006-08-03. Within that date range, all data is retrieved as expected ([2131 rows x 19 columns]). The page https://iss.moex.com/iss/securities.xml?q=RU0009029540 shows that it is no longer traded (is_traded="0").

For IE00BD3QHZ91, you should use the FXUS ticker (see https://iss.moex.com/iss/securities.xml?q=IE00BD3QHZ91). I will look into the encoding issues. Thanks, @addisonlynch.
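
The lookup described above (search ISS for an identifier, then read off the SECID and the is_traded flag) could be scripted roughly as follows. This is only a sketch and not part of pandas-datareader. It assumes that the ISS search endpoint also serves JSON at securities.json, and that the payload contains a "securities" block with parallel "columns" and "data" arrays including secid and is_traded. The helper names and the sample payload are hypothetical and hand-written for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed JSON counterpart of the securities.xml search URL quoted in the thread.
ISS_SEARCH_URL = "https://iss.moex.com/iss/securities.json"


def parse_search(payload):
    """Turn an assumed {"columns": [...], "data": [[...], ...]} block into dicts."""
    block = payload["securities"]
    return [dict(zip(block["columns"], row)) for row in block["data"]]


def find_secids(payload, traded_only=True):
    """Return SECIDs from a search payload, optionally keeping only traded ones."""
    rows = parse_search(payload)
    # Compare as strings so both 1 and "1" count as traded.
    return [r["secid"] for r in rows
            if not traded_only or str(r.get("is_traded")) == "1"]


def search_iss(query):
    """Fetch a search result from ISS (needs network access; not called here)."""
    with urlopen(f"{ISS_SEARCH_URL}?{urlencode({'q': query})}") as resp:
        return json.load(resp)


# Illustrative payload in the assumed layout, not real ISS output.
sample = {
    "securities": {
        "columns": ["secid", "is_traded"],
        "data": [["FXUS", 1], ["RU0009029540", 0]],
    }
}

print(find_secids(sample))
print(find_secids(sample, traded_only=False))
```

With a live connection, `find_secids(search_iss("IE00BD3QHZ91"))` would be the equivalent call, provided the assumptions about the payload hold.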

Mottl commented 6 years ago

@addisonlynch, could you post the values of your LC_CTYPE, LANG and LC_ALL?

addisonlynch commented 6 years ago

Ahh oops, POSIX. No issues with en_US.UTF-8.

LANG="POSIX"
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
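
The escaped SHORTNAME values in the earlier output are consistent with Cyrillic text being pushed through an ASCII-only encoding, which a POSIX locale commonly implies. The following standard-library sketch reproduces the effect. The variable names are illustrative, and it does not claim to show the exact code path inside pandas or pandas-datareader.

```python
# SHORTNAME as it appeared in the output (backslash escapes kept literal).
escaped = r"\u041f\u0430\u0432\u043b\u0410\u0432\u0442 \u0430\u043e"

# Decode the escapes back into the Cyrillic text.
shortname = escaped.encode("ascii").decode("unicode_escape")

# An ASCII-only target cannot represent Cyrillic, so the characters get escaped.
as_ascii = shortname.encode("ascii", errors="backslashreplace").decode("ascii")

# UTF-8 round-trips the same text unchanged.
as_utf8 = shortname.encode("utf-8").decode("utf-8")

print(as_ascii)              # the escaped form, safe to print on any terminal
print(as_utf8 == shortname)  # whether the UTF-8 round trip preserved the text
```

Switching the environment to a UTF-8 locale such as en_US.UTF-8, as reported above, avoids the ASCII fallback.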

Mottl commented 6 years ago

Ok, then we can close this issue. @bashtage, @davidastephens, @jreback

khazamov commented 6 years ago

As pointed out by @addisonlynch, the query format expects ticker symbols, not ISIN codes.

web.DataReader("SBER", "moex", datetime(2017,1,1), datetime(2018,1,1))

instead of

web.DataReader("RU0009029540", "moex", datetime(2017,1,1), datetime(2018,1,1))

The docs point to an outdated version of the call.

Mottl commented 6 years ago

@khazamov, in all cases you should use the SECID (security code), as in the example in the documentation. Please clarify the issue.

For USD000UTSTOM: https://iss.moex.com/iss/securities/USD000UTSTOM.xml?lang=en

For SBER: https://iss.moex.com/iss/securities/SBER.xml?lang=en

Both USD000UTSTOM and SBER are SECIDs.
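
For reference, a self-contained version of the call used in this thread, with the imports it needs, might look like the sketch below. The wrapper `fetch_moex` and the stand-in `fake_reader` are hypothetical names introduced here so the example can run without network access. The real call is `DataReader(secid, "moex", start, end)`, as quoted above.

```python
from datetime import datetime


def fetch_moex(secid, start, end, reader=None):
    """Request MOEX history for a SECID; `reader` is injectable for offline runs."""
    if reader is None:
        # Imported lazily so the sketch runs even without the package installed.
        from pandas_datareader import data as web
        reader = web.DataReader
    return reader(secid, "moex", start, end)


def fake_reader(name, source, start, end):
    """Stand-in for DataReader that just echoes its arguments."""
    return {"name": name, "source": source, "start": start, "end": end}


result = fetch_moex("SBER", datetime(2017, 1, 1), datetime(2018, 1, 1),
                    reader=fake_reader)
print(result["name"], result["source"])
```

Omitting the `reader` argument makes the wrapper call pandas-datareader itself, which requires the package and a network connection.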