Closed avn3r closed 6 years ago
I was able to reproduce this issue. Basically, using Jan 1st as the start date looks for data in the previous period. Using Jan 2nd does not. Working on a fix.
Here is at least part of the issue. Even though your algo starts on 2017-01-01, you are calling data.history(), which retrieves a number of bars counting back from the current date. Your bar_count is 10080. This explains why the algo would attempt to retrieve 2016 data. Note that when you don't specify a data_frequency parameter, it uses the data frequency of your algo, minute in this case.
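The window arithmetic above can be sketched in a few lines of Python. This is only an illustration, not Catalyst code: the helper name oldest_bar is hypothetical, and it assumes the history window includes the current bar.

```python
from datetime import datetime, timedelta, timezone


def oldest_bar(current_bar, bar_count, bar_minutes=1):
    """Timestamp of the oldest bar in a window of bar_count bars.

    Assumption: the window ends at (and includes) current_bar.
    """
    return current_bar - timedelta(minutes=bar_minutes * (bar_count - 1))


lookback = 7 * 24 * 60  # 7 days of minute bars = 10080

# Starting on 2017-01-01, the window reaches back into December 2016.
start = datetime(2017, 1, 1, tzinfo=timezone.utc)
print(oldest_bar(start, lookback))

# Starting on 2017-01-08, the whole window stays inside 2017.
later_start = datetime(2017, 1, 8, tzinfo=timezone.utc)
print(oldest_bar(later_start, lookback))
```

With a 2017-01-01 start the oldest bar falls on 2016-12-25, which is why 2016 data is requested at all.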
Now, that's an explanation of why it fetches data in 2016, not why it fails to obtain it. I'm investigating this further.
Not understanding what you mention of bar_count parameter not being specified. I clearly specified bar_count=lookback, frequency='1m', and data_frequency='minute', where lookback is 7 days of history. So yes, I look at the last 7 days of 2016 to make predictions for 2017-01-01, but it should still try to get 2016 data, because I ensure that only pairs that existed at least 7 days before the current date are part of the universe.
However, as you mention, the error has to do with retrieving 2016 data. I confirmed that all poloniex markets work if I specify 2017-01-08 as my starting date, so that all my history data falls in 2017.
"Not understanding what you mention of bar_count parameter not being specified. " - I apologize, I modified my comment after looking more into your code.
Given your feedback, I saw that I was specifying the wrong range for catalyst ingest.
If I manually ingest:
catalyst ingest-exchange -x poloniex -f minute -s 2016-12-01
It now works. So the issue can be narrowed down to when data is automatically ingested.
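As a rough sketch of how the manual ingest range could be derived, the Python helper below shifts the -s date back by the lookback and assembles the command line. The function name ingest_command is hypothetical; only the flags already shown in this thread (-x, -f, -s, -e) are used.

```python
from datetime import date, timedelta


def ingest_command(exchange, algo_start, lookback_days, end=None):
    """Build an ingest-exchange command whose start covers the lookback.

    Hypothetical helper: it only formats a command string and does not
    call Catalyst itself.
    """
    ingest_start = algo_start - timedelta(days=lookback_days)
    parts = [
        "catalyst", "ingest-exchange",
        "-x", exchange,
        "-f", "minute",
        "-s", ingest_start.isoformat(),
    ]
    if end is not None:
        parts += ["-e", end.isoformat()]
    return " ".join(parts)


# A 7-day lookback with an algo start of 2017-01-01.
print(ingest_command("poloniex", date(2017, 1, 1), 7))
```

Any -s date at or before the computed one (such as the 2016-12-01 used above) covers the lookback window.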
I'm not sure that I've fixed this issue yet, but I'm making the following change: when data.history() is requested, I will modify the range to use the end date of the algo. This will ensure that the algo retrieves historical data only once per market.
This should be fixed. Here is what it looks like on my side: https://www.dropbox.com/s/vh6h2digwbxy4h7/issue_47.mp4?dl=0
Feel free to re-open if you are still experiencing issues with release 0.3.4.
Re-opening to give enough time for validation.
OK, I will confirm once 0.3.4 is out.
It did not pass my validation.
I PMed you with details, but basically, if I specify -s 2017-01-01 during manual ingest of data, it works, since my start_date=2017-01-08. However, a new bug occurs if I select start_date=2016-11-01, where it forgets to fetch older data that I have not manually ingested yet. Since it couldn't find the older data, it prints all nan values.
I'm not sure if it's the same issue, but I did find something. I came across a use case where it was periodically trying to ingest data for some bars even after running the same algo multiple times.
Here is the error message:
handling bar: 2017-06-01 23:59:00+00:00
got price 0.091554
[2017-11-03 23:49:41.897220] INFO: exchange_bundle: pricing data for [u'eth_btc'] not found in range 2017-05-31 22:40:00+00:00 to 2017-06-01 00:00:00+00:00, updating the bundles.
[====================================] Ingesting minute price data for eth_btc on bitfinex: 100%
Pricing data close for trading pairs eth_btc trading on exchange bitfinex since 2016-03-09 00:00:00+00:00 is unavailable. The bundle data is either out-of-date or has not been loaded yet. Please ingest data using the command `catalyst ingest-exchange -x bitfinex -f minute -i eth_btc`. See catalyst documentation for details.
I dumped the bundle into a csv file and found this:
2017-05-31 23:56:00+00:00 | 0.050334 | 0.050334 | 0.050334 | 0.050334 | 0
-- | -- | -- | -- | -- | --
2017-05-31 23:57:00+00:00 | 0.050334 | 0.050334 | 0.050334 | 0.050334 | 0
2017-05-31 23:58:00+00:00 | 0.050334 | 0.050334 | 0.050334 | 0.050334 | 0
2017-05-31 23:59:00+00:00 | 0.050334 | 0.050334 | 0.050334 | 0.050334 | 0
2017-06-01 00:00:00+00:00 | | | | | 0
2017-06-01 00:01:00+00:00 | 0.09981799 | 0.09981799 | 0.09981799 | 0.09981799 | 2.25752429
2017-06-01 00:02:00+00:00 | 0.09981799 | 0.09981799 | 0.09981799 | 0.09981799 | 0
There is an empty row on 2017-06-01, which seems to explain the error. I'm investigating this now to determine the root cause.
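A check like this one can be scripted against the csv dump. The sketch below is a minimal stdlib example, assuming each row holds a timestamp, four price columns and a volume column, as in the dump above; the function name find_empty_bars is hypothetical.

```python
import csv
import io


def find_empty_bars(csv_text):
    """Return the timestamps of rows whose four price columns are all blank.

    Assumed layout per row: timestamp, four prices, volume.
    """
    empty = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        timestamp, prices = row[0], row[1:5]
        if all(price.strip() == "" for price in prices):
            empty.append(timestamp)
    return empty


sample = (
    "2017-05-31 23:59:00+00:00,0.050334,0.050334,0.050334,0.050334,0\n"
    "2017-06-01 00:00:00+00:00,,,,,0\n"
    "2017-06-01 00:01:00+00:00,0.09981799,0.09981799,0.09981799,0.09981799,2.25752429\n"
)
print(find_empty_bars(sample))
```

On the sample rows this flags only the 2017-06-01 00:00:00+00:00 bar.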
I was able to simulate a behavior similar to what's described here by starting an ingest-exchange job, killing it halfway and then attempting auto-ingestion.
The first one you mentioned is with respect to bitfinex. I got the same error, but it's not the one I reported. I have only gotten data to load properly on poloniex.
I did kill it the first time, but I made sure to clean it and redownload. I tested it on the Poloniex BTC market. I ingested data from 2017-01-01 to 2017-10-16, and with a start date of 2017-01-08 in the script it works as expected. But changing the start date to 2016-11-01 produces all nan values, and it doesn't try to ingest that data as expected.
I believe that this issue is still reproducible under some auto-ingest conditions. We've prioritized this and are working towards a resolution now.
Yes, I am still able to reproduce both errors discussed, on v0.3.6:
1) Bitfinex minute data eth_btc producing ingestion error. This error occurs even when manually ingesting the data. Therefore Bitfinex has still not been working as expected since 0.3.X.
2) Manually ingesting data but requesting even earlier data:
Example:
catalyst clean-exchange -x poloniex
catalyst ingest-exchange -x poloniex -f minute -s 2017-01-01 -e 2017-10-30
Run simple_universe.py (my script in PR).
Range: start_date=2017-01-08, end_date=2017-10-15 <-- This works
Range: start_date=2016-12-30, end_date=2017-10-15 <-- Gives Error
Error: All nan values printed. This data didn't exist and it should have tried to ingest it, but never did.
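To make the expected behavior concrete, here is a small Python sketch of the gap computation that auto-ingestion would need for the two ranges above. It is not Catalyst's actual logic; the helper name missing_ranges is hypothetical, and for simplicity it works on whole days and ignores the extra lookback before start_date.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def missing_ranges(req_start, req_end, have_start, have_end):
    """Return the (start, end) date ranges requested but not yet ingested."""
    gaps = []
    if req_start < have_start:
        # Requested data begins before the ingested data does.
        gaps.append((req_start, min(req_end, have_start - ONE_DAY)))
    if req_end > have_end:
        # Requested data extends past the end of the ingested data.
        gaps.append((max(req_start, have_end + ONE_DAY), req_end))
    return gaps


ingested = (date(2017, 1, 1), date(2017, 10, 30))

# Working case: the requested range lies inside the ingested range.
print(missing_ranges(date(2017, 1, 8), date(2017, 10, 15), *ingested))

# Failing case: two days at the end of 2016 were never ingested.
print(missing_ranges(date(2016, 12, 30), date(2017, 10, 15), *ingested))
```

In the failing case the gap is 2016-12-30 through 2016-12-31, which is the data that should have been fetched instead of returning nan.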
I just noticed an important detail in your earlier comment:
Bitfinex minute data eth_btc producing ingestion error.
How do you reproduce this particular condition? This command seems to work well for me:
catalyst ingest-exchange -x bitfinex -i eth_btc -f minute
It's possible that the issue was resolved by changes to the bundle related to issue #54.
I'm investigating the other conditions.
I made two adjustments which seem to address NaN issues when auto-ingesting on top of partially available data:
temp_bundles folder. We now anticipate this condition and replace the bundle when needed.
I'm now investigating this condition more closely: "Manually ingesting data but requesting even earlier data".
With respect to error one (eth_btc), I just meant the error you reported.
Waiting for #54 and #53 to complete validation.
After even more testing, I still observed instances of NaN entries during auto-ingestion. I believe that it's a caching issue with the writer and reader, but it's hard to pinpoint the exact root cause. We may consider disabling auto-ingestion temporarily, as these issues don't seem to occur when populating the bundles separately.
No worries.
Just make sure to document manual ingestion in both the Installation guide and the beginner tutorial, so people know that they first have to ingest data and how they should ingest it. The current documentation doesn't say much about manual ingestion besides the parameters available.
...
It seems all errors have been fixed with 0.3.8+. Feel free to go ahead and close this issue.
@abnera, closing this issue in light of your last comment.
Dear Catalyst Maintainers,
Before I tell you about my issue, let me describe my environment:
Environment
Description of Issue
2017-01-01. If I select 2017-09-01, all markets work as expected. I have manually filtered pairs that are not in the current change of the backtest to ensure that data for that day existed in catalyst.
BTC market works for both starting dates, but XMR, ETH, and USDT markets are not working for 2017-01-01, similar to the previous issues in v0.3.2.
I have only tested on Poloniex, but my script can be used with any exchange by changing context.exchange, so feel free to use this script as a unit test.
Code
Error
avn3r