lagerfeuer / cryptocompare

Python3 Wrapper for the CryptoCompare API
MIT License
180 stars 75 forks source link

Premature termination when fetching price history #65

Open patrick300100 opened 11 months ago

patrick300100 commented 11 months ago

I think I found a bug that occurs in get_historical_price_hour_from (and I think in get_historical_price_day_from as well, but I only used get_historical_price_hour_from so I will be talking about this function).

So basically if I understand your code correctly, you keep fetching the price history until we are at fromTs, but basically if price history is not available (because we go beyond the time where price history is stored) it returns only zeroes: {'time': xxxxxxx, 'high': 0, 'low': 0, 'open': 0, 'volumefrom': 0, 'volumeto': 0, 'close': 0, 'conversionType': 'direct', 'conversionSymbol': ''} and you remove those from the list in validHist = [elem for elem in p if elem["time"] >= fromTs_i and elem["open"] != 0 and elem["close"] != 0]. So for example I would get this response if I try to get the price history of Ethereum on January 1st 2000. I would get only zeroes because Ethereum didn't exist back then.

And I think you also assumed that. Because a few lines later at if len(validHist) < len(p): break, you check if we removed some zero entries (and elem["time"] >= fromTs_i but that is not important for this story) and if we did, you assume that we reached the end of valid price histories and we break out of the while loop. However, when fetching the price history of Solana (but it applies to other coins as well), I got this:

bug

For whatever reason, the price history sometimes has zero entries in it, even though we haven't reached the end yet! In my example, I was fetching from January 1st 2010 (just a lower bound to indicate that I want all the possible price history available) until July 1st 2023. But it prematurely terminated in this loop, even though there is more price history to fetch!

A possible fix could be to not terminate when we removed at least 1 entry, but to keep going until all 2000 (or to how much limit is set) entries are zero entries. In that case you are absolutely sure that you reached the very beginning of a coin's price history.

lagerfeuer commented 11 months ago

Hi @patrick300100,

First, thank you for digging into this issue and creating this very detailed write-up. Could you please respond with the exact function call (i.e., exact arguments) so I can reproduce it?

I'm guessing that the zero entries may be due to trading halts that have happened with Solana in the past, see this news article, for example. We'd have to go back and look if the dates align. During that period, there may be no price data because no trades were fulfilled.

To be honest with you, I'm not doing much work on this library these days anymore. This was one of the first Python libraries I wrote and there's a lot of room for improvement. If you feel like trying to fix it yourself and opening a PR, I'll take a look. Other than that, I probably can't give you an estimate on when this will be fixed.

patrick300100 commented 11 months ago

Yeah sure! The function call that I used is get_historical_price_hour_from(coin='SOL', currency='EUR', fromTs=datetime(2010, 1, 1), toTs=datetime(2023, 7, 1)).

I looked at the dates whether the Solana network outages described in your news article align with the zero entries. That's not the case. The Unix timestamp 1587780000 (one of the zero entries) corresponds to April 25th 2020. The news article you sent talks about outages at January 2022, December 2021, and September 2021.

It's also not only a thing that happens at Solana, but for multiple coins. Take for example ADA (Cardano) and look at entry 242: 7350216cb5df5da209f5f58613d3b628

Or Axie Infinity (AXS) at entries 415 and 416: fb9ef5990d897812fd8bcec42c2795ee

And for Render (RNDR) it's even messier: ddf9e4aaa1efe45af20fe115431b6cd5

It's not for all coins though! But definitely for some as you can see above. It's not only a Solana-thing.

It could be very much the case that something happened with each coin's network and hence having no price information at times, but actually that doesn't matter. The fact is that sometimes there is no price information and that causes the code to terminate the while-loop too early.

If you're now convinced that this is a bug, I'll open a PR soon enough!

lagerfeuer commented 11 months ago

I do wonder though, have you tried other exchanges and if yes, do the timestamps of zero entries align between them? Maybe the exchange halted trading even though the coin was still online.

patrick300100 commented 11 months ago

I just tried it out and it indeed seems that it is exchange-related. I took AXS and at the CCCAGG exchange it just gives a price 3b108e977abc52a959dcc8474a8cf3a1

But at the Coinbase exchange it doesn't: 3ce9d98a850a39bd5f99d78f81f9221f

lagerfeuer commented 11 months ago

Hm, interesting, it may not be a bug after all. Now I'm thinking it coincides with exchange downtime.

patrick300100 commented 11 months ago

That could definitely be a thing, but even if there was a temporary exchange downtime, do you mean that if I try to get price history from January 1st to January 31st and there is exchange downtime on January 15, that its correct behaviour is to only get price history from January 15th to January 31st even though there is more price history from January 1st to January 14th?

*Just for the ease of explaining I took the downtime of a whole day

lagerfeuer commented 11 months ago

No, you're right that it should still return the whole time frame. I initially assumed that the fact that it includes a zero value by itself is already proof that there's a bug.

To sum up, I think the returned zero values are legit (for better or worse), but the library should definitely return the entire time frame like you described.

I do remember there was some weirdness with limit though, so maybe that's another factor. It's been a while since I touched this library.

patrick300100 commented 11 months ago

Yeah limit also impacts this behaviour. Take for example again the example I described earlier of getting price history between January 1st and January 31st with downtime on January 15th.

If the limit for example is 6 days, it would retrieve data from January 26th until January 31st (26, 27, 28, 29, 30, 31), then it would again get 6 days, in which it would get data from January 20th until January 25th (20, 21, 22, 23, 24, 25). And then it would again get data for 6 days from January 14th until January 19th (14, 15, 16, 17, 18, 19). Here we would break out of the while loop because January 15th would have zero entries because of the downtime, so it would filter those out, your code detects that at if len(validHist) < len(p): break and it would return an array with data from January 14th 'till 31st excluding the 15th.

If we put the limit to 10 days, we would first get data from January 22nd until January 31st. Then we would again get data for 10 days from January 12th until January 21st. Now we would break out of the while loop in this iteration, removing the zero entries that occurred at January 15th, and then return the full array with price history from January 12th until January 31st excluding the 15th.

I don't know whether that is what you meant, but limit does also impact this behaviour. That's why I proposed a fix to not stop terminating when we remove at least 1 zero entry, but keep fetching until all the entries returned by the cryptocompare API are zero entries. I fixed this behaviour locally with just a single line-change from if len(validHist) < len(p): break to if len(validHist) == 0: break. Then we stop if there is no valid history anymore instead of stopping when we removed at least 1 zero entry.

If you don't want to have the library changed because you're not convinced of my issue that is of course also fine :). I understand it can be a bit challenging / hard to dive into your code / topics / concepts from years ago especially if you haven't touched the whole code base in a long time

lagerfeuer commented 11 months ago

No, I think you're absolutely correct on this. Actual unit tests would also help to catch issues like this, but like I said, this was my first library and hasn't been touched in a while. The code in question was also contributed by someone else, which makes it even more difficult to follow.

You're more than welcome to suggest the changes in a PR. I'd love to fix this but I'm not sure when I'll have time.