status-im / status-go

The Status module that consumes go-ethereum
https://status.im
Mozilla Public License 2.0
727 stars 245 forks source link

Wallet Merge balance history and transfer process #3835

Closed alaibe closed 1 year ago

alaibe commented 1 year ago

Problem

Check if possible to merge balance history and transfer processes

@IvanBelyakoff @stefandunca tagging you in order to help

IvanBelyakoff commented 1 year ago

AFAIK we store balance history for balance chart and for that purpose we care about transfers that change ETH or token balances, store the timestamp, block and balance. @stefandunca implemented this and used balance_history table for that. For transfers we currently use two strategies - OnDemanFetchStrategy (mobile v1, that is obsolete now) - the one that was there since the beginning, and a new one - SequentialFetchStrategy (used on Desktop) - where we load all transfers sequentially from latest block till history start. OnDemandFetchStrategy stores pretty much the same info as balance history fetcher - block ranges and balance in blocks_ranges table (block_dao.go: ProcessBlock). So we might want to use a single table for that. SequentialFetchStrategy does not store block ranges in a similar way, so nothing to unify here in terms of db columns imo, but while looking for ETH transfers, a downloader checks ranges and finds blocks with ETH transfers (transfers/concurrent.go:findBlocksWithEthTransfers) and balances and those are the data, that is the same imo that we store in balance_history, so I believe that would be nice to reuse this data for balance history. As for tokens, downloader also uses getLogs which I believe balance history fetcher uses as well to get the same data, so we should reuse the requests as well.

stefandunca commented 1 year ago

AFAIK we store balance history for balance chart and for that purpose we care about transfers that change ETH or token balances, store the timestamp, block and balance.

You summarized well. In addition, it tries to estimate and fetch regular intervals to have a uniform sampling point. Also samples are snapped between intervals to reuse requests

OnDemandFetchStrategy stores pretty much the same info as balance history fetcher - block ranges and balance in blocks_ranges table

If we do cover the needed granularity by the balance history, for the supported intervals, it sounds like this strategy could be used for balance history also.

As for tokens, downloader also uses getLogs which I believe balance history fetcher uses as well to get the same data, so we should reuse the requests as well.

Balance history only uses HeaderByNumber to find the graph entry block timestamp and BalanceAt for token values. So no logs.

rasom commented 1 year ago

@stefandunca @IvanBelyakoff @alaibe @John-44 To summarise this note once again https://notes.status.im/TSPntaw0SCeV44Z4ENM1sQ?view#Merging-of-balancetransfers-history-algos

  1. We definitely should merge checking for updates of both processes, unless a new transaction is detected by transfers history process there is no need to make any RPC by balance history process. I suppose we can start working on this right away.
  2. After fetching historical transfers we can provide the data in a format of map (per network) of format map[Token][]BalanceChange where BalanceChange{Balance, Timestamp}, which should be sufficient for charts. Again, I think we can start implementing this right away, the question is how useful it will be in case if we want to guarantee ALL. Three cases here
    • balance history being as "deep" as transfers history
      • Pros: no extra RPC for balance history
      • Cons: no guarantee that ALL tab is available right away
    • do not rely on transfers history when it is incomplete and use existing algorithm
      • Pros: the process is independent thus ALL tab is not a problem for "history-heavy" accounts
      • Cons:
        • extra RPC requests
        • might be not as precise as based on transfers (depends on sampling, but anyway)
    • have both, if transfers history is not complete do sampling for the rest of history
      • Pros:
        • some RPC requests are used for both processes
        • ALL tab can be guarantied
      • Might be more complicated
      • Still some extra RPC requests if transfers history is incomplete
alaibe commented 1 year ago

Regarding your first 2 point I agree we can and should start those.

I believe the fact that not everything is available right away should be acknowledge. I would personally vote for not trying to have multiple strategy because both algorithm can take time so at the end we are never sure which one will finish first right?

As you suggested in another thread a better solution would be for me to introduce rate limiting for heavy account. But that is something we can probably do after all first batch of improvement is done