scrtlabs / catalyst

An Algorithmic Trading Library for Crypto-Assets in Python
http://enigma.co
Apache License 2.0

Problem when ingesting binance data #405

Open sam31415 opened 6 years ago

sam31415 commented 6 years ago

Dear Catalyst Maintainers,

Before I tell you about my issue, let me describe my environment:

Environment

Name Version Build Channel

aiodns 1.1.1
aiohttp 3.0.1
alembic 0.9.7
async-timeout 2.0.1
attrdict 2.0.0
attrs 17.4.0
backcall 0.1.0
backports 1.0 py36h81696a8_1
backports.weakref 1.0rc1 py36_0
bcolz 1.2.1
bleach 2.1.3
bleach 1.5.0 py36_0
boto3 1.5.27
botocore 1.8.50
Bottleneck 1.2.1
cchardet 2.1.1
ccxt 1.12.131
certifi 2018.4.16 py36_0
certifi 2018.1.18
chardet 3.0.4
click 6.7
colorama 0.3.9
configparser 3.5.0
contextlib2 0.5.5
cycler 0.10.0
cyordereddict 1.0.0
Cython 0.27.3
cytoolz 0.9.0.1
decorator 4.3.0
docutils 0.14
empyrical 0.2.1
enigma-catalyst 0.5.15
entrypoints 0.2.3
eth-abi 1.1.1
eth-account 0.2.3
eth-hash 0.1.1
eth-keyfile 0.5.1
eth-keys 0.2.0b3
eth-rlp 0.1.2
eth-utils 1.0.1
h5py 2.8.0 py36h3bdd7fb_0
hdf5 1.10.2 hac2f561_1
hexbytes 0.1.0
html5lib 0.9999999 py36_0
html5lib 1.0.1
icc_rt 2017.0.4 h97af966_0
idna 2.6
idna-ssl 1.0.1
intervaltree 2.1.0
ipykernel 4.8.2
ipython 6.3.1
ipython-genutils 0.2.0
ipywidgets 7.2.0
jedi 0.11.1
Jinja2 2.10
jmespath 0.9.3
jsonschema 2.6.0
jupyter 1.0.0
jupyter-client 5.2.3
jupyter-console 5.2.0
jupyter-core 4.4.0
Keras 2.1.3
keras 2.2.0 0
keras-applications 1.0.2 py36_0
keras-base 2.2.0 py36_0
keras-preprocessing 1.0.1 py36_0
kiwisolver 1.0.1
libprotobuf 3.5.2 he0781b1_0
Logbook 0.12.5
lru-dict 1.1.6
lxml 4.2.1
Mako 1.0.7
markdown 2.6.11 py36_0
MarkupSafe 1.0
matplotlib 2.2.2
mistune 0.8.3
mkl 2017.0.3 0
mpld3 0.3
multidict 4.1.0
multipledispatch 0.4.9
nbconvert 5.3.1
nbformat 4.4.0
networkx 2.1
notebook 5.4.1
numexpr 2.6.4
numpy 1.14.0
numpy 1.13.1 py36_0
pandas 0.19.2
pandas 0.20.3 py36_0
pandas-datareader 0.6.0
pandocfilters 1.4.2
parsimonious 0.8.0
parso 0.1.1
patsy 0.5.0
patsy 0.4.1 py36_0
pickleshare 0.7.4
pip 10.0.1
pip 9.0.1 py36_1
prompt-toolkit 1.0.15
protobuf 3.5.2 py36h6538335_0
pycares 2.3.0
pycryptodome 3.5.1
pyfolio 0.8.0+131.gf7ec651
Pygments 2.2.0
pyparsing 2.2.0
pypiwin32 223
python 3.6.2 0
python-dateutil 2.7.3
python-dateutil 2.6.1 py36_0
python-editor 1.0.3
pytz 2017.2 py36_0
pytz 2016.4
pywin32 223
pywinpty 0.5.1
pyyaml 3.12 py36h1d1928f_1
pyzmq 17.0.0
qtconsole 4.3.1
redo 1.6
requests 2.18.4
requests-file 1.4.3
requests-ftp 0.3.1
requests-toolbelt 0.8.0
rfpimp 1.1.1
rlp 0.6.0
s3transfer 0.1.13
schedule 0.5.0
scikit-learn 0.19.1
scikit-multiflow 0.1.0
scipy 0.19.1 np113py36_0
scipy 1.0.0
seaborn 0.8.1
Send2Trash 1.5.0
setuptools 38.5.1
setuptools 36.4.0 py36_1
simplegeneric 0.8.1
six 1.10.0 py36_0
six 1.11.0
sklearn 0.0
sortedcontainers 1.5.9
SQLAlchemy 1.2.2
statsmodels 0.8.0 np113py36_0
tables 3.4.2
tabulate 0.8.2
tensorflow 1.2.1 py36_0
terminado 0.8.1
testpath 0.3.1
toolz 0.9.0
tornado 5.0.2
tqdm 4.23.4
traitlets 4.3.2
urllib3 1.22
vc 14 0
vs2015_runtime 14.0.25420 0
wcwidth 0.1.7
web3 4.2.1
webencodings 0.5.1
websockets 5.0.1
werkzeug 0.14.1 py36_0
wheel 0.29.0 py36_0
widgetsnbextension 3.2.0
wincertstore 0.2 py36_0
wrapt 1.10.11
xgboost 0.71
xgboost 0.72
xgboost 0.7
yaml 0.1.7 hc54c509_2
yarl 1.1.0
zlib 1.2.11 vc14_0 [vc14]

Now that you know a little about me, let me tell you about the issue I am having:

Description of Issue

When ingesting minute data for Binance, I repeatedly get this error for various pairs and dates:

Traceback (most recent call last):
  File "c:\users\user\anaconda2\envs\catalyst3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\Anaconda2\envs\catalyst3\Scripts\catalyst.exe\__main__.py", line 9, in <module>
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\__main__.py", line 609, in ingest_exchange
    csv=csv
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 824, in ingest
    show_report=show_report
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 643, in ingest_assets
    cleanup=True
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 420, in ingest_ctable
    shutil.rmtree(reader._rootdir)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\shutil.py", line 494, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\shutil.py", line 384, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\shutil.py", line 384, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\shutil.py", line 393, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\user\anaconda2\envs\catalyst3\lib\shutil.py", line 391, in _rmtree_unsafe
    os.rmdir(path)
OSError: [WinError 145] The directory is not empty: 'C:\\Users\\user\\.catalyst\\data\\exchanges\\binance\\temp_bundles\\binance-minute-dash_btc-2018-06\\81\\08'

and sometimes also this one:

Traceback (most recent call last):
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\util\connection.py", line 83, in create_connection
    raise err
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\util\connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connection.py", line 284, in connect
    conn = self._new_conn()
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x0000000034767D30>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\adapters.py", line 440, in send
    timeout=timeout
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\urllib3\util\retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /enigmaco/catalyst-bundles/exchange-binance/binance-minute-via_btc-2018-07.tar.gz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000034767D30>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\user\anaconda2\envs\catalyst3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\Anaconda2\envs\catalyst3\Scripts\catalyst.exe\__main__.py", line 9, in <module>
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 697, in main
    rv = self.invoke(ctx)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\click\decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\__main__.py", line 609, in ingest_exchange
    csv=csv
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 824, in ingest
    show_report=show_report
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 643, in ingest_assets
    cleanup=True
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 365, in ingest_ctable
    period=period
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\exchange\utils\bundle_utils.py", line 56, in get_bcolz_chunk
    bytes = download_without_progress(url)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\catalyst\data\bundles\core.py", line 189, in download_without_progress
    resp = requests.get(url)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "c:\users\user\anaconda2\envs\catalyst3\lib\site-packages\requests\adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='s3.amazonaws.com', port=443): Max retries exceeded with url: /enigmaco/catalyst-bundles/exchange-binance/binance-minute-via_btc-2018-07.tar.gz (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000000034767D30>: Failed to establish a new connection: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted',))

When such an error appears, I just restart the ingestion, which eventually completes. However, when I try to launch a backtest, Catalyst complains that the data for some pairs is not ingested.
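The restart-until-it-completes workaround described here could be automated along these lines. This is only a sketch: it shells out to the same `catalyst ingest-exchange` command quoted in the error message below, and it assumes the command exits with a non-zero status when ingestion fails with an uncaught exception. The helper name `ingest_with_retry` and its parameters are hypothetical, not part of Catalyst.

```python
import subprocess
import sys
import time


def ingest_with_retry(pair, exchange="binance", frequency="minute",
                      max_attempts=5, delay_seconds=10.0,
                      runner=subprocess.run):
    """Run `catalyst ingest-exchange` for one pair, retrying on failure.

    Hypothetical helper: `runner` defaults to subprocess.run and can be
    replaced by a stub for testing. Returns the attempt number that succeeded.
    """
    cmd = ["catalyst", "ingest-exchange",
           "-x", exchange, "-f", frequency, "-i", pair]
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt} for {pair} failed "
              f"(exit code {result.returncode})", file=sys.stderr)
        if attempt < max_attempts:
            # Pause before retrying so transient conditions can clear.
            time.sleep(delay_seconds)
    raise RuntimeError(f"ingestion of {pair} failed after {max_attempts} attempts")


# Example call (requires the catalyst CLI on PATH):
# ingest_with_retry("omg_btc")
```

A loop like this only papers over the crashes. It would not address the `PricingDataNotLoadedError` shown below, which appears even after an ingestion run that finished without error.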

PricingDataNotLoadedError: Missing data for binance omg_btc in date range [2017-10-28 00:00:00+00:00 - 2017-10-31 23:59:00+00:00]
Please run: `catalyst ingest-exchange -x binance -f minute -i omg_btc`. See catalyst documentation for details.

I tried running the clean-exchange command and restarting the ingestion, with exactly the same result. Note that the same problem appeared when I ingested the Poloniex bundle a while ago. After sufficiently many ingestion attempts, I ended up with all the data. Unfortunately, this doesn't seem to work for Binance.
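The first traceback above ends in `shutil.rmtree` raising `WinError 145` while removing a temporary bundle directory. On Windows this is often attributed to another process (an antivirus scanner, an indexer, or a handle that is still open) briefly touching the directory, though nothing in this thread confirms that cause. A common mitigation is to retry the removal after a short pause. The sketch below shows the idea using only the standard library; `rmtree_with_retry` is a hypothetical helper, not Catalyst code.

```python
import shutil
import time
from pathlib import Path


def rmtree_with_retry(path, attempts=5, delay_seconds=0.5):
    """Remove a directory tree, retrying if the OS reports an error.

    Hypothetical helper. Returns True once the directory is gone and
    re-raises the last OSError if every attempt fails.
    """
    path = Path(path)
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # The directory does not exist (any more); nothing left to do.
            return True
        except OSError:
            if attempt == attempts:
                raise
            # Give whichever process holds the directory time to release it.
            time.sleep(delay_seconds)
    return False
```

If the directory is held open for the whole ingestion run, retrying will not help, and the process holding the handle would need to be identified instead.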

Thanks!

Samuel

lenak25 commented 6 years ago

Thanks for reporting, @sam31415. How are you ingesting the minute data? With a single command that ingests all the pairs, or with a separate command for each pair?

sam31415 commented 6 years ago

@lenak25 I'm ingesting the whole bundle. When ingesting single pairs, the errors above are less likely to occur, just because the ingestion takes less time, but they do occur as well from time to time.

In the case of the error above, I can run the ingest on omg_btc only and it succeeds without error. But when I backtest, I still get the error saying that the data is not ingested.

lenak25 commented 6 years ago

OK, thanks. The Binance dataset is larger than those of other exchanges (there are 370 trading pairs on Binance, compared to 98 on Poloniex), which could explain why the errors occur more frequently than with Poloniex. Thanks for the information, we will investigate this!
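One possible angle on the `WinError 10048` failures: the traceback shows each chunk being fetched with a bare `requests.get(url)`, which opens a new connection per call. With many pairs and months to download, that can plausibly exhaust the local ephemeral ports on Windows, which is a typical trigger for this error on outbound connections. This explanation is an assumption and is not confirmed in this thread. The sketch below shows the usual mitigation with `requests`: a shared `Session` that reuses connections, plus retries with backoff. `make_session` and `download_bytes` are hypothetical names, not Catalyst functions.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries=5, backoff_factor=1.0):
    """Build a requests.Session that reuses connections and retries.

    Hypothetical helper; the retry counts and status codes are
    illustrative defaults, not values taken from Catalyst.
    """
    retry = Retry(
        total=total_retries,
        connect=total_retries,
        backoff_factor=backoff_factor,          # wait progressively longer between retries
        status_forcelist=(500, 502, 503, 504),  # also retry on these HTTP statuses
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session


def download_bytes(session, url, timeout=60):
    """Fetch a URL through the shared session and return the response body."""
    resp = session.get(url, timeout=timeout)
    resp.raise_for_status()
    return resp.content
```

Creating one session up front and passing it to every download keeps the number of new sockets low, since kept-alive connections to the same host are reused.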