I am working on a nightly 'refresh' process to update data, re-train models, re-run backtests, etc. But in doing this I realized that data for the last 15 days or so is not available. I checked the status page and it appears data is only available until the 15th of Jan.
Is there any information anywhere on how this will work? Has data on the catalyst servers ceased being updated? Is it a periodic refresh? If so, how often can we expect data to be updated? Is this the move towards the Enigma platform, and if so, how do we access that data source?
Any info appreciated (sorry for the million questions, it was just a bit of a surprise).
It should be a daily refresh. Thank you for reporting this; we will let you know as soon as possible.
I was about to write regarding the same problem. I was running a backtest on XRP from Bitfinex and it seems it stopped generating outcomes after 2017-12-04. But if I run the backtest only up to 2017-12-04, it works fine.
Thank you both for reporting this. We consider this a critical bug in the catalyst infrastructure (it's not related to the code repository itself), and I am actively working to troubleshoot it. I believe it's a matter of hours.
Thanks, appreciate the fast reaction.
I just updated btc_usd on Bitfinex and now have data until 2018-01-28 00:00:00. Is this correct (i.e. 3 days old)?
Just did a full clean and refreshed again, and now we have data up to 2018-01-29 00:00:00. I guess you are still updating your caches? I'll try again in an hour or so and report back.
Any progress on this? I'm still not getting any data after Jan 15, as stated on the status page posted above.
@gatapia How did you get data past 1-15-18? That's all I'm able to pull down. What are you doing to fully clean and refresh, and how do you know what is the latest data that's been pulled down?
From what I understand, Bitfinex suddenly started throttling down rate limits, which interferes with the download script. We've been adjusting parameters to work around it, but evidently it's not resolved. We'll provide a more coherent update on Monday.
@brinew27 I was able to get more data by querying `ExchangeBundle.get_reader` directly after doing a `catalyst clean-exchange; catalyst ingest-exchange`. However, when running backtests, data is only available until Jan 15.
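In case it helps with @brinew27's question about how to tell how fresh the locally ingested data is, here is a minimal, hedged sketch (the pair and dates are only illustrative) that runs a tiny backtest over the window you care about: if the ingested bundle is short, catalyst raises the PricingDataNotLoadedError seen later in this thread, naming the missing range; otherwise the printed index shows the bars actually served.

import pandas as pd
from catalyst import run_algorithm
from catalyst.api import symbol

def initialize(context):
    context.asset = symbol('btc_usd')

def handle_data(context, data):
    # Prints the most recent daily bar served from the locally ingested bundle.
    df = data.history(context.asset, 'close', bar_count=5, frequency='1d')
    print(df.index[-1])

def analyze(context=None, results=None):
    pass

if __name__ == '__main__':
    run_algorithm(
        capital_base=1000,
        data_frequency='daily',
        initialize=initialize,
        handle_data=handle_data,
        analyze=analyze,
        exchange_name='bitfinex',
        algo_namespace='data-freshness-check',
        base_currency='usd',
        live=False,
        start=pd.to_datetime('2018-01-25', utc=True),
        end=pd.to_datetime('2018-02-01', utc=True),
    )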
Thanks @gatapia.
@fredfortier, @lacabra, thank you both for looking into this. If it's just Bitfinex throttling, why would it be affecting Poloniex as well (where I'm trading)? Along with the live trading issue (which Fred has been working hard on), this has been somewhat of a show-stopper for me, as I'm using historical data to train a machine learning algorithm to do trades, and now the data I'm training it with is about 3 weeks old, so its performance is degrading pretty rapidly.
Ingesting Binance data is my next request (though not a show-stopper) for ideal functioning, since training my ML on Poloniex's data and trying to apply that to Binance (where ideally I'd be trading, due to much lower commission rates) has definitely led to some issues. =)
Apologies for the delay; we have been focusing on the release of the data marketplace (Catalyst 0.5), which was announced and released today, and I am now looking into this again. My initial tests show that the historical pricing data for all three exchanges is indeed stored correctly on the servers, and catalyst ingests and retrieves it just fine. A random sample of coins across exchanges supports this observation (I'm running the latest version of Catalyst, 0.5.1 as of this writing).
For some reason the status page at https://www.enigma.co/catalyst/status fails to update the dates, which I am investigating further, but as far as I can see the data is available for ingestion and backtesting.
# Exchange Poloniex
df = data.history('amp_btc', ['open','high','low','close','volume'], bar_count=30, frequency="1d")
print(df)
close high low open volume
2018-01-09 00:00:00+00:00 0.000072 0.000078 0.000066 0.000069 175.797907
2018-01-10 00:00:00+00:00 0.000069 0.000075 0.000063 0.000072 67.346690
2018-01-11 00:00:00+00:00 0.000063 0.000069 0.000060 0.000069 54.261279
2018-01-12 00:00:00+00:00 0.000070 0.000072 0.000062 0.000063 44.731335
2018-01-13 00:00:00+00:00 0.000069 0.000075 0.000068 0.000070 33.340186
2018-01-14 00:00:00+00:00 0.000068 0.000073 0.000065 0.000069 35.553520
2018-01-15 00:00:00+00:00 0.000063 0.000069 0.000061 0.000068 32.042545
2018-01-16 00:00:00+00:00 0.000049 0.000064 0.000047 0.000063 60.988427
2018-01-17 00:00:00+00:00 0.000049 0.000054 0.000042 0.000049 42.038478
2018-01-18 00:00:00+00:00 0.000053 0.000057 0.000048 0.000048 41.736072
2018-01-19 00:00:00+00:00 0.000052 0.000056 0.000051 0.000053 19.262759
2018-01-20 00:00:00+00:00 0.000052 0.000055 0.000050 0.000052 13.913008
2018-01-21 00:00:00+00:00 0.000047 0.000052 0.000046 0.000052 27.261551
2018-01-22 00:00:00+00:00 0.000047 0.000052 0.000046 0.000048 19.676022
2018-01-23 00:00:00+00:00 0.000048 0.000049 0.000046 0.000047 11.005895
2018-01-24 00:00:00+00:00 0.000052 0.000053 0.000046 0.000048 20.873229
2018-01-25 00:00:00+00:00 0.000049 0.000053 0.000048 0.000052 23.807841
2018-01-26 00:00:00+00:00 0.000047 0.000049 0.000046 0.000049 13.002861
2018-01-27 00:00:00+00:00 0.000052 0.000055 0.000046 0.000047 30.126841
2018-01-28 00:00:00+00:00 0.000050 0.000052 0.000048 0.000052 17.266434
2018-01-29 00:00:00+00:00 0.000048 0.000050 0.000048 0.000050 17.920394
2018-01-30 00:00:00+00:00 0.000047 0.000051 0.000047 0.000048 24.585311
2018-01-31 00:00:00+00:00 0.000045 0.000047 0.000044 0.000047 14.981092
2018-02-01 00:00:00+00:00 0.000045 0.000053 0.000044 0.000045 44.027583
2018-02-02 00:00:00+00:00 0.000041 0.000045 0.000038 0.000045 28.419236
2018-02-03 00:00:00+00:00 0.000044 0.000044 0.000039 0.000041 14.287807
2018-02-04 00:00:00+00:00 0.000039 0.000044 0.000037 0.000044 13.139417
2018-02-05 00:00:00+00:00 0.000036 0.000039 0.000033 0.000039 23.214927
2018-02-06 00:00:00+00:00 0.000037 0.000039 0.000036 0.000036 7.809977
2018-02-07 00:00:00+00:00 0.000040 0.000041 0.000037 0.000037 15.566479
# Exchange Bittrex
df = data.history('1st_btc', ['open','high','low','close','volume'], bar_count=30, frequency="1d")
print(df)
close high low open \
2018-01-09 00:00:00+00:00 0.000139 0.000160 0.000130 0.000136
2018-01-10 00:00:00+00:00 0.000116 0.000140 0.000115 0.000139
2018-01-11 00:00:00+00:00 0.000107 0.000123 0.000097 0.000116
2018-01-12 00:00:00+00:00 0.000115 0.000118 0.000099 0.000107
2018-01-13 00:00:00+00:00 0.000125 0.000135 0.000105 0.000115
2018-01-14 00:00:00+00:00 0.000112 0.000127 0.000111 0.000125
2018-01-15 00:00:00+00:00 0.000105 0.000121 0.000105 0.000112
2018-01-16 00:00:00+00:00 0.000082 0.000105 0.000075 0.000105
2018-01-17 00:00:00+00:00 0.000084 0.000088 0.000069 0.000082
2018-01-18 00:00:00+00:00 0.000093 0.000101 0.000082 0.000084
2018-01-19 00:00:00+00:00 0.000091 0.000095 0.000090 0.000091
2018-01-20 00:00:00+00:00 0.000084 0.000095 0.000083 0.000091
2018-01-21 00:00:00+00:00 0.000079 0.000086 0.000075 0.000086
2018-01-22 00:00:00+00:00 0.000076 0.000087 0.000074 0.000079
2018-01-23 00:00:00+00:00 0.000074 0.000083 0.000073 0.000076
2018-01-24 00:00:00+00:00 0.000077 0.000080 0.000074 0.000075
2018-01-25 00:00:00+00:00 0.000078 0.000080 0.000075 0.000076
2018-01-26 00:00:00+00:00 0.000086 0.000087 0.000075 0.000078
2018-01-27 00:00:00+00:00 0.000082 0.000086 0.000079 0.000086
2018-01-28 00:00:00+00:00 0.000093 0.000103 0.000078 0.000081
2018-01-29 00:00:00+00:00 0.000083 0.000093 0.000082 0.000093
2018-01-30 00:00:00+00:00 0.000070 0.000091 0.000065 0.000083
2018-01-31 00:00:00+00:00 0.000070 0.000074 0.000066 0.000070
2018-02-01 00:00:00+00:00 0.000067 0.000074 0.000063 0.000070
2018-02-02 00:00:00+00:00 0.000064 0.000068 0.000057 0.000068
2018-02-03 00:00:00+00:00 0.000065 0.000067 0.000063 0.000064
2018-02-04 00:00:00+00:00 0.000064 0.000068 0.000063 0.000065
2018-02-05 00:00:00+00:00 0.000061 0.000072 0.000057 0.000063
2018-02-06 00:00:00+00:00 0.000059 0.000061 0.000052 0.000060
2018-02-07 00:00:00+00:00 0.000058 0.000061 0.000056 0.000059
# Exchange Bitfinex
df = data.history('avt_btc', ['open','high','low','close','volume'], bar_count=30, frequency="1d")
print(df)
close high low open \
2018-01-09 00:00:00+00:00 0.000402 0.000500 0.000375 0.000382
2018-01-10 00:00:00+00:00 0.000380 0.000663 0.000346 0.000402
2018-01-11 00:00:00+00:00 0.000405 0.000450 0.000318 0.000385
2018-01-12 00:00:00+00:00 0.000399 0.000440 0.000392 0.000404
2018-01-13 00:00:00+00:00 0.000384 0.000435 0.000376 0.000400
2018-01-14 00:00:00+00:00 0.000362 0.000382 0.000339 0.000381
2018-01-15 00:00:00+00:00 0.000339 0.000368 0.000330 0.000362
2018-01-16 00:00:00+00:00 0.000251 0.000335 0.000211 0.000332
2018-01-17 00:00:00+00:00 0.000293 0.000310 0.000231 0.000251
2018-01-18 00:00:00+00:00 0.000266 0.000325 0.000266 0.000310
2018-01-19 00:00:00+00:00 0.000287 0.000292 0.000260 0.000263
2018-01-20 00:00:00+00:00 0.000325 0.000338 0.000268 0.000280
2018-01-21 00:00:00+00:00 0.000285 0.000320 0.000280 0.000315
2018-01-22 00:00:00+00:00 0.000297 0.000338 0.000280 0.000293
2018-01-23 00:00:00+00:00 0.000327 0.000363 0.000291 0.000314
2018-01-24 00:00:00+00:00 0.000329 0.000365 0.000320 0.000329
2018-01-25 00:00:00+00:00 0.000310 0.000333 0.000301 0.000327
2018-01-26 00:00:00+00:00 0.000366 0.000372 0.000300 0.000310
2018-01-27 00:00:00+00:00 0.000388 0.000430 0.000352 0.000372
2018-01-28 00:00:00+00:00 0.000348 0.000390 0.000339 0.000385
2018-01-29 00:00:00+00:00 0.000350 0.000359 0.000326 0.000350
2018-01-30 00:00:00+00:00 0.000305 0.000352 0.000290 0.000350
2018-01-31 00:00:00+00:00 0.000298 0.000316 0.000291 0.000305
2018-02-01 00:00:00+00:00 0.000295 0.000306 0.000283 0.000300
2018-02-02 00:00:00+00:00 0.000279 0.000301 0.000270 0.000297
2018-02-03 00:00:00+00:00 0.000329 0.000330 0.000275 0.000279
2018-02-04 00:00:00+00:00 0.000311 0.000340 0.000303 0.000330
2018-02-05 00:00:00+00:00 0.000298 0.000329 0.000280 0.000322
2018-02-06 00:00:00+00:00 0.000304 0.000339 0.000280 0.000298
2018-02-07 00:00:00+00:00 0.000294 0.000334 0.000287 0.000307
Found the culprit, and fixed it. As far as I can see right now, there was a small error in the scripts that pulled the end_dates from the server, but the data had been on the server all this time (or most of it). Please let me know if anyone is still experiencing issues with this; otherwise I'll close this issue.
@lacabra I can query some data past the 15th now, but depending on the date range, I still get errors such as:
NoDataAvailableOnExchange: Requested data for trading pair [u'eth_usd'] is not available on exchange ['bitfinex'] in minute frequency at this time. Check http://enigma.co/catalyst/status for market coverage.
@zackgow please give me example date ranges where you encounter such an error so that I can track it down.
@lacabra `start_date='2018-01-29', end_date='2018-01-30'` throws the error for me. I have upgraded to 0.5.1.
Thanks @zackgow, I confirm that I can replicate your error, and that there was indeed some missing data in that bundle. That particular bundle has been fixed. I'm looking into other minute bundles for that same month, Jan 2018. In order for the new data to come in, you may need to run `catalyst ingest-exchange -x bitfinex -f minute -i eth_usd` again, and if that doesn't fix it, first run `catalyst clean-exchange -x bitfinex` and then re-ingest.
I uncovered another error: something got out of sync between the catalyst client and the server that generates the bundles. It has been fixed, and I confirm that data for that particular bundle is now available. I'm regenerating the data for all other bundles that are missing the last two days of January, and will post an update when it's complete. Here's the code that validates that data is available for that particular bundle:
df = data.history('eth_btc', ['open','high','low','close','volume'], bar_count=10, frequency="1m")
print(df)
close high low open volume
2018-01-29 23:51:00+00:00 0.10448 0.10448 0.10446 0.10447 5.894544
2018-01-29 23:52:00+00:00 0.10447 0.10449 0.10447 0.10449 4.820322
2018-01-29 23:53:00+00:00 0.10439 0.10447 0.10438 0.10447 8.559114
2018-01-29 23:54:00+00:00 0.10441 0.10444 0.10431 0.10444 7.389059
2018-01-29 23:55:00+00:00 0.10434 0.10446 0.10430 0.10446 17.463309
2018-01-29 23:56:00+00:00 0.10444 0.10444 0.10431 0.10434 31.146378
2018-01-29 23:57:00+00:00 0.10444 0.10448 0.10444 0.10448 1.687714
2018-01-29 23:58:00+00:00 0.10443 0.10444 0.10443 0.10444 4.050000
2018-01-29 23:59:00+00:00 0.10460 0.10465 0.10446 0.10446 38.051553
2018-01-30 00:00:00+00:00 0.10439 0.10469 0.10439 0.10460 38.968829
@lacabra I can confirm I am getting missing-data errors for this period (beginning Jan 29th) for Poloniex minute data, eth_usdt as well. I'll wait for an update before testing again to confirm.
EDIT: To report back, I am no longer getting errors when trying to access data from Jan 16th through Feb 7th, but all the data is essentially frozen, with 0 volume and the price never changing from 1245 (eth_usdt minute data on Poloniex). Also, oddly, the data seems to stop at Feb 7th at 23:45 instead of midnight, though that's a minor issue.
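For reference, here is a small pandas-only check one could use to flag such frozen stretches in a frame returned by data.history (the function name and the 60-bar threshold are arbitrary illustrative choices of mine, not part of catalyst):

import pandas as pd

def find_frozen_stretches(df, min_bars=60):
    # A bar is "frozen" if the close did not move and no volume traded.
    frozen = (df['close'].diff().fillna(0) == 0) & (df['volume'] == 0)
    # Label consecutive runs of frozen bars so they can be grouped.
    run_id = (frozen != frozen.shift()).cumsum()
    stretches = []
    for _, grp in df[frozen].groupby(run_id[frozen]):
        if len(grp) >= min_bars:
            stretches.append((grp.index[0], grp.index[-1], len(grp)))
    return stretches

# e.g. df = data.history(context.asset, ['open','high','low','close','volume'],
#                        bar_count=1440, frequency='1m')
# print(find_frozen_stretches(df))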
@zackgow, @brinew27 All the minute bundles (bitfinex and poloniex) for the month of January have been regenerated. My random sampling of bundles indicates that the issue has been resolved. You will need to clear all locally-stored bundles with `catalyst clean-exchange -x poloniex` and then re-ingest the data that you need. If you continue experiencing problems, please detail which exchange, currency pair, and interval, so that I can dig into it again.
Random sampling yields:
# exchange Poloniex
df = data.history('eth_usdt', ['open','high','low','close','volume'], bar_count=10, frequency="1m")
print(df)
close high low open \
2018-01-29 23:51:00+00:00 1177.839733 1177.839733 1174.164000 1175.064042
2018-01-29 23:52:00+00:00 1174.164001 1174.164001 1174.164001 1174.164001
2018-01-29 23:53:00+00:00 1174.164001 1174.164001 1174.164001 1174.164001
2018-01-29 23:54:00+00:00 1175.000000 1177.014043 1175.000000 1175.000000
2018-01-29 23:55:00+00:00 1177.014043 1177.014043 1177.014043 1177.014043
2018-01-29 23:56:00+00:00 1177.000001 1177.000001 1177.000000 1177.000000
2018-01-29 23:57:00+00:00 1174.914043 1177.000000 1174.914043 1177.000000
2018-01-29 23:58:00+00:00 1177.839733 1177.839733 1177.839733 1177.839733
2018-01-29 23:59:00+00:00 1177.839733 1177.839733 1177.839733 1177.839733
2018-01-30 00:00:00+00:00 1177.948026 1177.948026 1177.839733 1177.839733
volume
2018-01-29 23:51:00+00:00 376.050737
2018-01-29 23:52:00+00:00 1016.055586
2018-01-29 23:53:00+00:00 0.000000
2018-01-29 23:54:00+00:00 1326.173051
2018-01-29 23:55:00+00:00 40.927132
2018-01-29 23:56:00+00:00 283.607507
2018-01-29 23:57:00+00:00 9341.204346
2018-01-29 23:58:00+00:00 626.995668
2018-01-29 23:59:00+00:00 0.000000
2018-01-30 00:00:00+00:00 5416.359877
# exchange Bitfinex
df = data.history('eos_btc', ['open','high','low','close','volume'], bar_count=10, frequency="1m")
print(df)
close high low open volume
2018-01-29 23:51:00+00:00 0.001204 0.001204 0.001204 0.001204 0.000000
2018-01-29 23:52:00+00:00 0.001204 0.001204 0.001204 0.001204 38.498501
2018-01-29 23:53:00+00:00 0.001204 0.001204 0.001204 0.001204 20.085249
2018-01-29 23:54:00+00:00 0.001203 0.001204 0.001203 0.001204 13.141708
2018-01-29 23:55:00+00:00 0.001204 0.001204 0.001204 0.001204 221.953474
2018-01-29 23:56:00+00:00 0.001205 0.001206 0.001204 0.001204 272.401738
2018-01-29 23:57:00+00:00 0.001206 0.001206 0.001206 0.001206 9.630859
2018-01-29 23:58:00+00:00 0.001206 0.001206 0.001206 0.001206 7.637040
2018-01-29 23:59:00+00:00 0.001206 0.001208 0.001206 0.001207 392.917572
2018-01-30 00:00:00+00:00 0.001206 0.001206 0.001206 0.001206 0.000000
@lacabra Thanks Victor! It does seem like it generated data for the missing days. However, I'm getting a weird error at a certain point in the eth_usdt minute data on Poloniex, apparently on 12-07-2017. I was able to export 1-15-18 through 1-31-18 without issue. For now I'll just skip this day, but I wanted to report it in case there's an issue with the data at that point (and ideally I'd like my training data to extend back that far).
[2018-02-09 07:53:37.884000] INFO: symbol_export: 2017-12-07 20:56:00+00:00
Traceback (most recent call last):
File "Z:/Users/Brian/Google Drive/Catalyst/symbol_export.py", line 185, in <module>
capital_base=100
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\run_algo.py", line 551, in run_algorithm
stats_output=stats_output
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\run_algo.py", line 330, in _run
overwrite_sim_params=False,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 352, in run
data, overwrite_sim_params
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 309, in run
data, overwrite_sim_params
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\algorithm.py", line 724, in run
for perf in self.get_generator():
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\gens\tradesimulation.py", line 243, in transform
self._get_minute_message(dt, algo, algo.perf_tracker)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\gens\tradesimulation.py", line 303, in _get_minute_message
dt, self.data_portal,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\finance\performance\tracker.py", line 357, in handle_minute_close
account.leverage)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\finance\risk\cumulative.py", line 219, in update
self.mean_benchmark_returns_cont[dt_loc] * 252
RuntimeWarning: overflow encountered in double_scalars
Hi @brinew27, glad to hear that the data integrity issue is resolved. What you describe above seems to be a different error, for which I will open a separate issue and assign it to someone else so that we can track it properly. I have checked the data from that day (exported it to a csv and plotted it), and it looks alright to me, other than the fact that there is zero volume for some minutes surrounded by very high or normal activity, but that may be glitches on the exchange: https://docs.google.com/spreadsheets/d/1GYWKoJHBv9W56pdKmf8eXWkOG6WaBvDsg2ZOWWvv2SY/edit?usp=sharing
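For anyone who wants to reproduce that kind of spot check, the export is just data.history plus pandas' to_csv. A minimal sketch along the lines of the other snippets in this thread (the pair, dates and filename are only illustrative):

import pandas as pd
from catalyst import run_algorithm
from catalyst.api import symbol

def initialize(context):
    context.asset = symbol('eth_usdt')

def handle_data(context, data):
    # Roughly one day of minute bars ending at the current simulation time.
    df = data.history(context.asset, ['open', 'high', 'low', 'close', 'volume'],
                      bar_count=1440, frequency='1m')
    df.to_csv('eth_usdt_2017-12-07.csv')  # inspect/plot it externally
    exit(0)

def analyze(context=None, results=None):
    pass

if __name__ == '__main__':
    run_algorithm(
        capital_base=1000,
        data_frequency='minute',
        initialize=initialize,
        handle_data=handle_data,
        analyze=analyze,
        exchange_name='poloniex',
        algo_namespace='export-check',
        base_currency='usdt',
        live=False,
        start=pd.to_datetime('2017-12-08', utc=True),
        end=pd.to_datetime('2017-12-09', utc=True),
    )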
Hello @lacabra,
I am having similar issues where some of the historical data after Jan. 16 is seemingly frozen with 0 volume. In particular, I have seen this happening with data from Poloniex on the pairs LTC_BTC, XRP_BTC, STR_BTC, SC_BTC, XMR_BTC during Jan. 28-29. I have tried running `catalyst clean-exchange -x poloniex` and re-ingesting the data. Any help would be appreciated, thanks.
@lacabra Hey Victor, I seem to still be having some missing data issues on some pairs on Poloniex. I've cleaned all bundles and re-ingested, and am getting missing data errors from 1-29 to 1-30 for xrp_usdt on Poloniex:
[2018-02-10 07:08:11.313000] INFO: symbol_export: 2018-01-29 23:45:00+00:00
[2018-02-10 07:08:11.339000] INFO: symbol_export: 2018-01-29 23:46:00+00:00
Traceback (most recent call last):
File "Z:/Users/Brian/Google Drive/Catalyst/symbol_export.py", line 185, in <module>
capital_base=100
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\run_algo.py", line 551, in run_algorithm
stats_output=stats_output
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\run_algo.py", line 330, in _run
overwrite_sim_params=False,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 352, in run
data, overwrite_sim_params
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 309, in run
data, overwrite_sim_params
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\algorithm.py", line 724, in run
for perf in self.get_generator():
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\gens\tradesimulation.py", line 224, in transform
for capital_change_packet in every_bar(dt):
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\gens\tradesimulation.py", line 137, in every_bar
handle_data(algo, current_data, dt_to_use)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\events.py", line 216, in handle_data
dt,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\utils\events.py", line 235, in handle_data
self.callback(context, data)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_algorithm.py", line 330, in handle_data
super(ExchangeTradingAlgorithmBacktest, self).handle_data(data)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\algorithm.py", line 473, in handle_data
self._handle_data(self, data)
File "Z:/Users/Brian/Google Drive/Catalyst/symbol_export.py", line 84, in handle_data
frequency=context.CANDLE_SIZE
File "catalyst\_protocol.pyx", line 120, in catalyst._protocol.check_parameters.__call__.assert_keywords_and_call
File "catalyst\_protocol.pyx", line 679, in catalyst._protocol.BarData.history
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 95, in get_history_window
ffill))
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\redo\__init__.py", line 162, in retry
return action(*args, **kwargs)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 69, in _get_history_window
ffill)
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_data_portal.py", line 313, in get_exchange_history_window
trailing_bar_count=trailing_bar_count,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 901, in get_history_window_series_and_load
trailing_bar_count=trailing_bar_count,
File "C:\Users\brian\Anaconda2\envs\catalyst\lib\site-packages\catalyst\exchange\exchange_bundle.py", line 1014, in get_history_window_series
end_dt=end_dt
catalyst.exchange.exchange_errors.PricingDataNotLoadedError: Missing data for poloniex xrp_usdt in date range [2018-01-29 11:31:00+00:00 - 2018-01-30 00:00:00+00:00]
Please run: `catalyst ingest-exchange -x poloniex -f minute -i xrp_usdt`. See catalyst documentation for details.
@goolulusaurs, @brinew27 yes I confirm that what you are experiencing is all related. I will look into it later today or tomorrow. I have re-opened this issue.
moving from #190:
I have done a clean-exchange and fresh ingest (with no errors) and I have missing data for the ltc_usdt pair on Poloniex.
I checked the data ingested (dumped to csv) and there is data missing from the 19th of Jan to the 1st of Feb for ltc_usdt. I have not checked other coins.
There still seems to be missing data on Bitfinex. I tried the `btc_usd` pair from 2/04/18 onwards.
I am working on this issue today, and I confirm that, for example, the pair @brinew27 mentioned is not only missing data on 1/29-1/30, but is flat after 1/19, as @gatapia and @goolulusaurs mention:
The pricing data is indeed on the server, and once the bundle is re-generated, it has the correct data:
I'm digging into it to understand why this happened and redoing these bundles. Will update soon.
@zackgow's issue seems different; I will look into it next.
@brinew27, @gatapia, @goolulusaurs, The historical pricing data for Poloniex over the month of January has been fixed. There were many markets that were flat after Jan 16 or Jan 19, and now hold the correct pricing data. Here's a snapshot of the closing prices for all 99 markets on Poloniex over the month of January (previously you could see many flat lines, not anymore):
@lacabra thanks for the work on this, but there is much more missing data in the Poloniex data than just January. I created a data validation function:
import pandas as pd
from datetime import timedelta

def validate_date_index_integrity(date_index, start='2017-01-01', min_missing_hours=6):
    # Build the full hourly grid from `start` to the last timestamp in the index.
    hourlies = pd.date_range(start, date_index[-1].floor('1H'), freq='1H')
    # Hourly timestamps that never appear in the data's index.
    missing = hourlies[~hourlies.isin(date_index)]
    ranges_missing = []
    current_start, current_dt = missing[0], missing[0]
    for d in missing:
        exp = current_dt + timedelta(hours=1)
        if d > exp:
            # A jump in the "missing" sequence closes out the previous run.
            if (current_dt - current_start).total_seconds() // 3600 >= min_missing_hours:
                ranges_missing.append((current_start, current_dt))
            current_start = d
        current_dt = d
    if len(ranges_missing) > 0:
        raise Exception('found %d missing date ranges greater than %d hours'
                        % (len(ranges_missing), min_missing_hours))
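As an illustration of how a date index might be fed to this function (not necessarily how it was produced for the numbers below; the csv filename is hypothetical, based on the csv dump mentioned in the previous comment):

# Index taken from a csv dump of the ingested minute data...
dumped = pd.read_csv('ltc_usdt_minute.csv', index_col=0, parse_dates=True)
validate_date_index_integrity(dumped.dropna(how='any').index, start='2017-01-01')

# ...or from a data.history(...) frame inside handle_data, e.g.:
# df = data.history(context.asset, ['open','high','low','close','volume'],
#                   bar_count=1440, frequency='1m')
# validate_date_index_integrity(df.dropna(how='any').index, start=str(df.index[0].date()))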
If I run this on the date index of the ltc_usdt minute data from poloniex (starting from 2017-01-01) I get:
found 38 missing date ranges greater than 6 hours
(where 1/19 - 2/1 was just one of the missing ranges).
After your fix - running a clean-exchange / ingest I get (for the same pair, from 2017-01-01):
found 37 missing date ranges greater than 6 hours
So only one missing date range was fixed. The other remaining missing date ranges from 2017-01-01 are:
(Timestamp('2017-01-01 00:00:00'), Timestamp('2017-01-01 10:00:00'))
(Timestamp('2017-01-02 00:00:00'), Timestamp('2017-01-02 12:00:00'))
(Timestamp('2017-01-03 03:00:00'), Timestamp('2017-01-03 10:00:00'))
(Timestamp('2017-01-03 19:00:00'), Timestamp('2017-01-04 03:00:00'))
(Timestamp('2017-01-06 17:00:00'), Timestamp('2017-01-06 23:00:00'))
(Timestamp('2017-01-07 01:00:00'), Timestamp('2017-01-07 08:00:00'))
(Timestamp('2017-01-07 14:00:00'), Timestamp('2017-01-07 21:00:00'))
(Timestamp('2017-01-08 13:00:00'), Timestamp('2017-01-08 21:00:00'))
(Timestamp('2017-01-08 23:00:00'), Timestamp('2017-01-09 16:00:00'))
(Timestamp('2017-01-10 02:00:00'), Timestamp('2017-01-10 17:00:00'))
(Timestamp('2017-01-10 22:00:00'), Timestamp('2017-01-11 07:00:00'))
(Timestamp('2017-01-14 21:00:00'), Timestamp('2017-01-15 03:00:00'))
(Timestamp('2017-01-19 00:00:00'), Timestamp('2017-01-19 07:00:00'))
(Timestamp('2017-01-21 14:00:00'), Timestamp('2017-01-21 21:00:00'))
(Timestamp('2017-01-22 13:00:00'), Timestamp('2017-01-22 19:00:00'))
(Timestamp('2017-01-24 03:00:00'), Timestamp('2017-01-24 10:00:00'))
(Timestamp('2017-01-25 12:00:00'), Timestamp('2017-01-26 04:00:00'))
(Timestamp('2017-01-27 15:00:00'), Timestamp('2017-01-28 06:00:00'))
(Timestamp('2017-02-03 17:00:00'), Timestamp('2017-02-04 03:00:00'))
(Timestamp('2017-02-04 05:00:00'), Timestamp('2017-02-04 16:00:00'))
(Timestamp('2017-02-04 19:00:00'), Timestamp('2017-02-05 02:00:00'))
(Timestamp('2017-02-05 23:00:00'), Timestamp('2017-02-06 05:00:00'))
(Timestamp('2017-02-06 07:00:00'), Timestamp('2017-02-06 16:00:00'))
(Timestamp('2017-03-01 20:00:00'), Timestamp('2017-03-02 02:00:00'))
(Timestamp('2017-03-06 14:00:00'), Timestamp('2017-03-06 23:00:00'))
(Timestamp('2017-03-08 17:00:00'), Timestamp('2017-03-09 08:00:00'))
(Timestamp('2017-03-10 00:00:00'), Timestamp('2017-03-10 09:00:00'))
(Timestamp('2017-03-13 04:00:00'), Timestamp('2017-03-13 10:00:00'))
(Timestamp('2017-03-15 03:00:00'), Timestamp('2017-03-15 10:00:00'))
(Timestamp('2017-03-19 03:00:00'), Timestamp('2017-03-19 09:00:00'))
(Timestamp('2017-03-20 11:00:00'), Timestamp('2017-03-20 18:00:00'))
(Timestamp('2017-03-20 21:00:00'), Timestamp('2017-03-21 03:00:00'))
(Timestamp('2017-03-22 06:00:00'), Timestamp('2017-03-22 13:00:00'))
(Timestamp('2017-03-23 01:00:00'), Timestamp('2017-03-23 08:00:00'))
(Timestamp('2017-03-27 21:00:00'), Timestamp('2017-03-28 03:00:00'))
(Timestamp('2017-05-02 19:00:00'), Timestamp('2017-05-03 01:00:00'))
(Timestamp('2017-10-22 03:00:00'), Timestamp('2017-10-22 09:00:00'))
Thanks @gatapia for uncovering this.
The issue of missing data in January was all due to the same problem, which has since been fixed. The cases of missing data that you report in your last comment are likely due to other factors. I need to clearly identify the cause of each problem and ensure that each is addressed properly, to avoid the same thing happening in the future. I reported the fix of the January data for those interested in testing against the most recent timeframe.
I'm digging into the ones that you have uncovered next.
@gatapia I have run your script, and I don't get any missing dates for the year 2017 for `ltc_usdt`.
I have manually checked the last two ranges mentioned in your last post (e.g. `(Timestamp('2017-10-22 03:00:00'), Timestamp('2017-10-22 09:00:00'))`) and they contain data continuously. This is a sample from the first one (just to confirm, this is from Poloniex for `ltc_usdt`, displaying the `close` and `volume` columns):
2017-10-22 03:00:00+00:00 57.711907 0.000000
2017-10-22 03:01:00+00:00 57.769721 958.546083
2017-10-22 03:02:00+00:00 57.769721 0.000000
2017-10-22 03:03:00+00:00 57.769721 0.000000
2017-10-22 03:04:00+00:00 57.769721 0.000000
2017-10-22 03:05:00+00:00 57.771000 26.070034
2017-10-22 03:06:00+00:00 57.771000 0.000000
2017-10-22 03:07:00+00:00 57.771000 0.000000
2017-10-22 03:08:00+00:00 57.771000 0.000000
2017-10-22 03:09:00+00:00 57.771000 0.000000
2017-10-22 03:10:00+00:00 57.800000 8356.435000
2017-10-22 03:11:00+00:00 57.800000 20.233901
2017-10-22 03:12:00+00:00 57.800000 10.404000
2017-10-22 03:13:00+00:00 57.800000 6.936000
2017-10-22 03:14:00+00:00 57.800000 5507.568357
2017-10-22 03:15:00+00:00 57.770000 60.786807
2017-10-22 03:16:00+00:00 57.712000 17.325046
2017-10-22 03:17:00+00:00 57.766907 1762.941079
2017-10-22 03:18:00+00:00 57.977455 86.250915
2017-10-22 03:19:00+00:00 57.822604 49.226639
2017-10-22 03:20:00+00:00 57.977455 0.067254
2017-10-22 03:21:00+00:00 57.883119 279.506006
2017-10-22 03:22:00+00:00 57.881030 289.405150
2017-10-22 03:23:00+00:00 57.881030 242.504358
2017-10-22 03:24:00+00:00 57.900000 5790.000000
2017-10-22 03:25:00+00:00 57.862517 602.112521
2017-10-22 03:26:00+00:00 57.862517 0.000000
2017-10-22 03:27:00+00:00 57.747736 2922.489776
2017-10-22 03:28:00+00:00 57.747736 0.000000
2017-10-22 03:29:00+00:00 57.752155 7744.853209
2017-10-22 03:30:00+00:00 57.752155 0.000000
If we resample the data in the bundle above into 5-minute intervals and compare it with what we can fetch from Poloniex directly (https://poloniex.com/public?command=returnChartData&currencyPair=USDT_LTC&start=1508641200&end=1508662800&period=300), they both match (a resampling sketch follows the table below):
close volume
date
2017-10-22 03:00:00 57.769721 958.546083
2017-10-22 03:05:00 57.771000 26.070034
2017-10-22 03:10:00 57.800000 13901.577258
2017-10-22 03:15:00 57.822604 1976.530486
2017-10-22 03:20:00 57.900000 6601.482769
2017-10-22 03:25:00 57.747736 11269.455506
2017-10-22 03:30:00 57.768003 183.734865
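The resampling itself is plain pandas; a sketch of how the 1-minute frame above can be aggregated to 5-minute bars for such a comparison (the aggregation rules are the usual OHLCV ones, chosen by me rather than taken from catalyst):

# df is a minute-frequency OHLCV DataFrame, e.g. from data.history(...)
ohlcv_5min = df.resample('5T').agg({
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum',
})
print(ohlcv_5min[['close', 'volume']])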
I would be curious to know what you pass as the `date_index` parameter to your function, to see if we can find the difference. The one thing I notice is that the `volume` for those exact times is 0, but that is not necessarily an error; it only means that no trades happened in that precise minute.
Interesting, I am doing a:
df.dropna(axis=0, how='any')
to remove any rows with NaNs.
I can remove this and it will fix my problem, but should these rows really return NaN for open/close/high/low? I mean, even with 0 volume, shouldn't the open/close/high/low all be the same number for that period (the closing price of the previous period)?
Anyway, this can be closed, as I can fix this myself in my code. I'll leave the decision to you.
Thanks heaps for helping me track this down.
@zackgow thanks for reporting! You helped me uncover another edge case in which some of the latest bundled pricing was missing yesterday's data. It's now been fixed moving forward, making the historical pricing data more robust 👍 And `btc_usd` on Bitfinex is up to date, as seen in the top-left corner below (Feb 2018 data until yesterday). The rest is a random sample of markets available on Bitfinex, which all have valid data. If you continue to experience missing data, please run `catalyst clean-exchange -x bitfinex` one more time and ingest again. It should be fixed!
Happy backtesting / trading!
@gatapia quick follow-up: the `df.dropna(axis=0, how='any')` should have no effect on that data.
Answering your question ("should these rows really return NaN for open/close/high/low? I mean, even with 0 volume, shouldn't the open/close/high/low all be the same number for that period, the closing price of the previous period?"):
The answer is yes: rows with zero volume will carry forward the last close for open, high, low, and close, so there should not be any NaNs there.
See the sample minimal code below:
import pandas as pd
from catalyst import run_algorithm
from catalyst.api import symbol

def initialize(context):
    context.asset = symbol('ltc_usdt')

def handle_data(context, data):
    df = data.history(context.asset, ['open','high','low','close','volume'], bar_count=1440, frequency="1m")
    dx = df.dropna(axis=0, how='any')
    print(dx.equals(df))
    exit(0)

def analyze(context=None, results=None):
    pass

if __name__ == '__main__':
    run_algorithm(
        capital_base=1000,
        data_frequency='minute',
        initialize=initialize,
        handle_data=handle_data,
        analyze=analyze,
        exchange_name='poloniex',
        algo_namespace='testing-datasets',
        base_currency='usdt',
        live=False,
        start=pd.to_datetime('2017-10-23', utc=True),
        end=pd.to_datetime('2018-10-23', utc=True),
    )
It runs on 2017-10-23 00:00:00.
It looks back 1440 minutes (that's 24h over the entire 2017-10-22 day), and fetches all available columns.
It then applies your `dropna` call, storing the result in a separate DataFrame.
Finally it compares both dataframes, and `equals` returns `True`, meaning both dataframes are exactly the same, i.e. `dropna` did not drop any rows.
I wonder where you get your dataframe from, or whether you do other manipulations beforehand?
I feel I have carefully addressed each and every missing-data issue reported on this thread, and I am therefore closing this issue. I acknowledge that there are a few instances (mostly dating from 2015 and early 2016) in which the exchange has no data, and thus there may still be flat lines (but if that's what's on the exchange, then Catalyst's data is as valid as the exchange's). I don't recall observing any of that for 2017 and 2018. Please prove me wrong, and I will gladly dig deeper.
And re-open the issue, or open a new one, if you experience any inconsistencies with historical data in backtesting.
Cheers
I am encountering this error when running a backtest with a fresh ingestion of daily data from Bitfinex. Specifically, ingesting `btc_eur` does not solve the problem:
catalyst.exchange.exchange_errors.PricingDataNotLoadedError: Missing data for bitfinex btc_eur in date range [2017-05-19 00:00:00+00:00 - 2017-07-01 00:00:00+00:00]
Please run: `catalyst ingest-exchange -x bitfinex -f daily -i btc_eur`. See catalyst documentation for details.