Open perama-v opened 1 year ago
Is this desirable?
Also, without a complete history of appearances, it's impossible to build a reconcilable accounting (in both the financial sense and the 'state change' sense).
This is exactly why all of us are struggling to do automated accounting "off-chain" against an "on-chain" system that reconciles 240,000,000 addresses every 12 seconds.
definitely supportive, defer to @rakita @joshieDo if this is currently technically possible or if it would require additional indexes (I think it would)
I'm not speaking for the original poster here, but, yes, this would require an additional index and associated hard drive space. That's why it should be optional.
It's also why I proposed (and @perama-v wrote) this feature as well: https://github.com/ethereum/execution-apis/pull/452.
eth_getAddressesInBlock
does not require any additional indexes or hard drive space (it can be done on-demand), but it does go a long way towards enabling address_getAppearances
.
address_getAppearances
, in order to build the index you're asking about, would need to repeatedly call eth_getAddressesInBlock
(or it could be called during syncing).
If eth_getAddressesInBlock
existed even if there was no additional index in the node, at least external indexers could build their own index off-chain.
I'm not suggesting that address_getAppearances
doesn't get added. I think it should because it's so necessary, but a good starting point might be eth_getAddressesInBlock
.
Here are some notes after reviewing existing structures
AccountHistory
comparisonThere is already the AccountHistory
table that stores all transaction identifiers where an address (account) state was changed.
This AccountHistory
is a subset of the data address_getAppearances
method would return. Additional data
includes items such as:
So, a new table (such as AppearancesHistory
) is required to capture this additional data. Either:
AccountHistory
table).
This is duplication, but with compression (see below) it may be less significant.AppearancesHistory
table.
For address_getAppearances
, one then looks in both tables and combines result. One downside is
that the AccountHistory
table stores block numbers, which may be less suitable (slower, more complex) than storing transaction numbers.TxNumber
Fortunately, transactions are represented by a global number TxNumber
rather than index in block.
Relevant tables:
TransactionBlock
to map TxNumber
-> block number. See method.BlockBodyIndices
for block number -> tx index in blockThat means we can have a simple mapping of Address
-> List(TxNumber
)
So, similar compression can be used for an AppearancesHistory
table
AppearanceHistory
tableA new table AppearanceHistory
of a similar nature to AccountHistory
could be made
/// AppearanceHistory (new) table
/// List of TxNumber (global transaction number, not index in block)
pub type TransactionNumberList = IntegerList;
table!(
/// Stores pointers to transaction appearance-set with appearances for each address key.
/// ( AppearanceHistory ) Address | TransactionNumberList
)
If reth stores global tx numbers they can be converted to (block_number, transaction_index)
for the JSON-RPC method address_getAppearances
response.
// Pseudocode for address_getAppearances method response construction.
fn address_get_appearances(address: Address) -> Vec<(BlockNumber, TxIndex)> {
// AppearanceHistory table (new)
let tx_list: TransactionNumberList = appearance_history_table.get(address);
let mut appearances = vec![];
for global_tx_num in tx_list.items() {
// TransactionBlock table
let block_num = transaction_block_table.get(global_tx_num);
// BlockBodyIndices table
let tx_index_in_block = block_body_indices_table.get(global_tx_num);
appearances.push((block_num, tx_index_in_block));
}
appearances
}
IndexAppearanceHistoryState
stageA stage called IndexAppearanceHistoryState
could exist, structurally similar to
IndexAccountHistoryStage
TODO - when is a good time to do this stage?
i'm conservatively supportive. Between this, and otterscan index needs, we probably need better abstractions to accommodate feature/flag based indexes without being too messy
I'm skeptical about the estimate... 80GB seems low, but if it's optional it's whatever
A contract is called as a read-only method.
If i'm not mistaken, this implies having to do "something" during the execution stage as well
I'm skeptical about the estimate... 80GB seems low, but if it's optional it's whatever
The estimate is based on the UnchainedIndex, which has the same data (address-->appearances), but is represented in a custom format suited for distribution by 12 hour block ranges (total size ~80gb, no compression)
If i'm not mistaken, this implies having to do "something" during the execution stage as well
Yes, for each tx certain locations are checked for addresses. So this is additional work that must be performed. The specifics are in the spec in the linked execution apis PR.
The work roughly amounts to a trace with calltracer, and checking fields within that response. Plus a check of addresses in withdrawals and block rewards.
Added a task list for the foreseeable steps.
Regarding the proposed implement IndexAppearanceHistoryState
stage:
eth_getAppearances
for each block and stores result in a tableOne consideration is that it this re-executes the transactions, separately from the Execution stage. One benefit is that
it keeps the Execution stage clean. Another is possible reuse of the CallTracer
for a simple implementation. Instead to avoid re-execution of transactions, it could be brought into Execution stage.
So, a new table (such as
AppearancesHistory
) is required to capture this additional data. Either:
- New table contains all appearances (including state changes covered by
AccountHistory
table). This is duplication, but with compression (see below) it may be less significant.- New table only contains appearances that are not covered by
AppearancesHistory
table. Foraddress_getAppearances
, one then looks in both tables and combines result. One downside is that theAccountHistory
table stores block numbers, which may be less suitable (slower, more complex) than storing transaction numbers.
There's a third option here. Enhance the AccountHistory
table to include all appearances (both state change and non-state changing). That would serve three purposes:
1) It would work identically the same for the current AccountHistory
use case, but need to filter out non-state-changing appearances,
2) It would serve all the OtterScan use cases (the Appearance database is a super-set of the Etherscan-like use case as discussed here: https://tjayrush.medium.com/how-accurate-is-etherscan-83dab12eeedd and here: https://medium.com/coinmonks/trueblocks-covalent-comparison-7b42f3d1e6f7.) The needs of a blockchain explorer are lesser than the needs of an complete appearance indexer,
3) It would cut down on nearly all additional disc space needed to support the feature. Only non-state-changing
appearances would have to be added which are significantly less than the total.
i'm conservatively supportive.
I would encourage you to think very carefully about this. Adding what I call a "complete" index of appearances allows one to produce a "complete" transactional history for any account. This allows for perfect 18-decimal place accurate re-creation of an accounts activity off-chain. This allows perfect, 18-decimal place accurate automated accounting every 12 seconds.
If you ever wondered why no-one can produce automated tax information from a system that comes to balance for 200,000,000 million accounts every 12 seconds, it's because there is no way to get a complete history.
I'm skeptical about the estimate... 80GB seems low, but if it's optional it's whatever
jrush@linux:/m/m/t/v/u/mainnet➤ du /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/
424K /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/unripe
3.9G /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/blooms
95G /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/finalized
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/ripe
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/maps
73M /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/staging
99G /mnt/md0/trueblocks/v0.40.0/unchained/mainnet/
jrush@linux:/m/m/t/v/u/mainnet➤ du /mnt/md0/trueblocks/v0.40.0/unchained/sepolia
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/unripe
5.5M /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/blooms
55M /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/finalized
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/ripe
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/maps
4.0K /mnt/md0/trueblocks/v0.40.0/unchained/sepolia/staging
74M /mnt/md0/trueblocks/v0.40.0/unchained/sepolia
Also, if it was combined into the AccountHistory
table, it would be that much smaller.
I'd love this
thinking aloud here, does putting this in an address_
namespace make sense, as opposed to some other name?
down the road I would love to see additional advanced querying rpc methods (like getBlockAccessList
/ getTransactionAccessList
). not all of these additional methods will be address-related, but they have a similar enough flavor that they might go in the same namespace
so maybe something like query_blockAddresses
/ query_AddressAppearances
would be more future proof?
I'd love this
:beer:
thinking aloud here, does putting this in an
address_
namespace make sense, as opposed to some other name?
Great and important question. I am open to this, let's discuss over where namespace is defined
yes, am supportive. This sort of capability in the EVM node software seems to have been an oversight in the original EVM spec.
Unclear what storage is required, but as long as it is optional, can be a node-specific index. I would gladly trade ~100 GB for such an index for my nodes.
This
AccountHistory
is a subset of the dataaddress_getAppearances
method would return. Additional data includes items such as:
- A contract is called as a read-only method.
- An address is mentioned in a log without a state change.
A contract read is a superset of a log.
So, a new table (such as
AppearancesHistory
) is required to capture this additional data. Either:
- New table contains all appearances (including state changes covered by
AccountHistory
table). This is duplication, but with compression (see below) it may be less significant.- New table only contains appearances that are not covered by
AppearancesHistory
table. Foraddress_getAppearances
, one then looks in both tables and combines result. One downside is that theAccountHistory
table stores block numbers, which may be less suitable (slower, more complex) than storing transaction numbers.
Would definitely go with the first option, as the second path introduces dependency that increases complexity.
The estimate is based on the UnchainedIndex, which has the same data (address-->appearances), but is represented in a custom format suited for distribution by 12 hour block ranges (total size ~80gb, no compression)
Would double that estimation as we need to save this inside db, but db size is not that relevant for discussion.
There's a third option here. Enhance the
AccountHistory
table to include all appearances (both state change and non-state changing). That would serve three purposes:
AccountHistory has index to the block but it seem for this rpc it is more preferred to go with tx index.
If you ever wondered why no-one can produce automated tax information from a system that comes to balance for 200,000,000 million accounts every 12 seconds, it's because there is no way to get a complete history.
Although this endpoint would include just an Ethereum balance change, it would not have the address/storages of the ERC20 contract storage slots that are, I assume, needed for accounting.
imo appearance is defined very broadly here. And we are talking here just for account appearance (storage is not mentioned).
For example to be on the same page, EXTCODESIZE
opcode would add this address to the appearance list, access_list
defined inside tx would add it, any CALL
opcode would add it and any post transaction state change would add it, etc.
The request is to have a list of transaction numbers (not blocks number that we use in AccountHistory) this is not possible with current tx numbers as we have appearances at the end of the block that would fall in the middle of two transactions (last of the block and first of the next block). When we have tx level changesets we had an additional index transition index to accommodate this.
On the naming (not relevant), I would see this name as reth_getAccountApperances
as clients with not standardized rpc follow this format. etc https://github.com/paradigmxyz/reth/pull/3768
Although this endpoint would include just an Ethereum balance change, it would not have the address/storages of the ERC20 contract storage slots that are, I assume, needed for accounting.
imo appearance is defined very broadly here. And we are talking here just for account appearance (storage is not mentioned). For example to be on the same page,
EXTCODESIZE
opcode would add this address to the appearance list,access_list
defined inside tx would add it, anyCALL
opcode would add it and any post transaction state change would add it, etc.The request is to have a list of transaction numbers (not blocks number that we use in AccountHistory) this is not possible with current tx numbers as we have appearances at the end of the block that would fall in the middle of two transactions (last of the block and first of the next block). When we have tx level changesets we had an additional index transition index to accommodate this.
I totally understand everything you're saying here, and yes, the index as original envisioned would index into <block.txid>
and I understand why that would be hard. Two things:
1) I think the next version of Erigon is going to try to index per transaction in some places so they can report on per-transaction changes in balance...
2) I haven't thought about this much, but a block-level index may be interesting. It would greatly reduce the size of the index, and would slow down the query, but we use local caching, so that only matters the first time a user retrieves a history. I do understand why this would be much easier for existing nodes.
Concerning the issue of state change, you're exactly right. All the existing indexes that I'm aware of in the node ignore the changes to storage where the thing being stored is an address. This is exactly the problem. This is why it's impossible to re-create state changes (including accurate balance histories) off-chain. I would almost go so far as to say that implementing an index that does not capture these types of appearances (where an address is "used" by a smart contract) wouldn't be worth it. You can get all the other "appearances" already.
This "adding all appearances" may be more complicated that you might thing though because a HUGE number of such appearances happen as left-side zero-padded 32-bytes. You have to use some heuristics to identify things that "appear" to be addresses. I've written extensively about this issue on pages 32-41 of this document: https://trueblocks.io/papers/2023/specification-for-the-unchained-index-v0.51.0-beta.pdf
Concerning naming, I think I probably fall down on the side of reth_
as that is the way nodes add "non-standard" names spaces. It's not perfect, but it avoids a lot of complicated negotiation with others and that problem (what namespace) can be solved later.
Would definitely go with the first option, as the second path introduces dependency that increases complexity.
Thanks
imo appearance is defined very broadly here. And we are talking here just for account appearance (storage is not mentioned). For example to be on the same page,
EXTCODESIZE
opcode would add this address to the appearance list,access_list
defined inside tx would add it, anyCALL
opcode would add it and any post transaction state change would add it, etc.
Yes, it is a broad definition to answer the question: "which transactions should I look further into for my address?". While broad in coverage, the Appearances spec (see link at top of issue) aims to have a precise inclusion definition. See also the implementation in #4162. The EXTCODESIZE
opcode and storage activity are not part of the definition.
Although this endpoint would include just an Ethereum balance change, it would not have the address/storages of the ERC20 contract storage slots that are, I assume, needed for accounting.
The main idea is to be agnostic to the end use case. The method returns a list of relevant transaction ids that could be used by the user to start ERC20 accounting, ERC721 accounting, ether accounting, or any other purpose. They would fetch and inspect/re-execute transactions according to their needs, which may or may not involve looking at storage changes.
The request is to have a list of transaction numbers (not blocks number that we use in AccountHistory) this is not possible with current tx numbers as we have appearances at the end of the block that would fall in the middle of two transactions (last of the block and first of the next block). When we have tx level changesets we had an additional index transition index to accommodate this.
I have implemented a possible mechanism to achieve this, using a wrapper that adds bitflags to a TxNum
. E.g., A "miner" recipient in block n
is represented as the TxNum
of the first tx in that block, with a specific flag in the upper bits. So all data can be kept in one table:
The method returns a list of relevant transaction ids that could be used by the user to start ERC20 accounting, ERC721 accounting, ether accounting, or any other purpose.
I wanted to highlight this for two reasons: (1) the definition of appearance should be as broad as possible so that it includes any possible use case-let the end user decide which particulars they are interested in, and (2) almost every time I explain this to someone, I use the idea of accounting, but the "any other purpose" of this may be the most important. With a complete list of appearances, once could re-create any state change whatsoever. Not just changes in balances. Changes in any state. This also argues for as inclusive an index as possible. That's why I focus on what I call "Every appearance of any address anywhere on the chain."
But, note that there has to be the notion of a badress
(that is otherwise valid addresses that are excluded because if one does not exclude them the index ballons in size). Every single 20-byte string of bytes is a valid address, but, as an example, 0x0000000000000000000000000000ffffffffffff
probably isn't one. All pre-compiles are also baddresses
. Part of the very clear definition of an appearance has to exclude baddresses
.
Love the progress on all this - just to say it explicitly, we're happy to iterate on this in the branch, but ideally we can figure out ways to introduce APIs like this w/o necessarily introducing large main codebase maintenance overheads..
We already have utilities for extending the CLI, see here: https://github.com/paradigmxyz/reth/blob/main/examples/additional-rpc-namespace-in-cli/src/main.rs#L52. So it might be a matter of showing how to also run additional stages.
This issue is stale because it has been open for 14 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.
Happy to see this reopened, just 'cause I'm really looking forward to seeing Reth implement a solution to, in my view, the early failure to spec this into the EVM in 2015.
Describe the feature
This issue is to introduce the concept of "address appearances" and assess appetite for inclusion in reth.
Specifications
See also the following specifications:
Motivation
A user (or local application) running reth can start with an address (e.g.,
0x30a4...1382
) important to them. They query reth and get the set of transactions that are relevant to that address.This kicks off the journey, which can now proceed in many ways:
As addresses are the foundation of user and protocol activity, the feature is general purpose.
Why "in reth" and not separate
The presence of the
address_getAppearances
natively within reth would mean first class support for user-focused applications. This could lead to:address_getAppearances
to locate relevant transactions. The frontend does not require any node restart / trace filter / external application to function.As the method is general purpose it always supports new wallets/protocols that a user is suddenly interested in.
Prior work
The UnchainedIndex, accessible via trueblocks-core (https://github.com/TrueBlocks/trueblocks-core) is an example of the utility of the information that
address_getAppearances
would provide.The method is equivalent to the following command in
trueblocks-core
:Disk impact
The support for
address_getAppearances
creates additional disk burden. An upper ceiling estimate is 80GB.The estimate is based on the UnchainedIndex (80GB), which is specially designed for distribution/sharding on IPFS. Such requirements mean that the index adds immutable pieces of data every 12 hours. An address that appears every 12 hours is therefore duplicated.
There are likely means to significantly reduce the disk footprint of a native reth version of this data.
Additional context
What is an "appearance"
Appearances are defined in a way to capture instances when an address arises in a meaningful way in the chain.
A simple example is the subject of a
CALL
opcode to an EOA. The recipient "appears" in that transaction (they receive ether). Upon tracing, the meaning of the transaction can be determined.An "appearance" is defined in the following PR:
address_getAppearances
methodThe method is defined in the following PR:
This is a chicken an egg situation: The method is stronger if there is an indication of existence/support in a client and vice versa.
Temperature check
Seeking feedback on this concept. Do any of the following resonate with you? Please feel free to add thoughts of any kind
address_
namespace foraddress_getAppearances
be considered?some_application_idea
"<other>
Thanks for your time