frol commented 3 years ago

I sum up non-staking and staking-balances based on the latest state changes per account:

SELECT
    SUM(affected_account_nonstaked_balance),
    SUM(affected_account_staked_balance)
FROM (
    SELECT DISTINCT ON (affected_account_id) affected_account_nonstaked_balance, affected_account_staked_balance
    FROM account_changes
    ORDER BY affected_account_id, changed_in_block_timestamp DESC
) as t;

And here are the values:

                sum                |                sum
-----------------------------------+-----------------------------------
 617515305435595972472724261340813 | 402992430818950819988234479163231
(1 row)

The problem here is that this exceeds the total supply. Note, there are three accounts created in genesis (bo.near, registrar, wallet.pulse.near), which have never been touched, so they are not in the account_changes table (they hold just 60 NEAR)

>>> total_supply
1019905212545170492542259392171734

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231
1020507736254546792460958740504044

>>> 617515305435595972472724261340813 + 402992430818950819988234479163231 + (50000000000000000000000000 + 10000000000000000000000000 + 999899913398562500000000)
1020507797254446705859521240504044

The difference is 602.5k NEAR

Any ideas? @telezhnaya @bowenwang1996

bowenwang1996 commented 3 years ago

@frol some questions:

Is this computation done on the same block for which you queried total supply?
Is it possible that account change implementation is incorrect? For example, are burnt balances accounted for?

frol commented 3 years ago

@bowenwang1996

Is this computation done on the same block for which you queried total supply?

Total supply did not change while the query was computed, I kept querying the block JSON RPC endpoint.

Is it possible that account change implementation is incorrect? For example, are burnt balances accounted for?

It gets the snapshot of the AccountView from the state changes and just records it into the database as is (maps amount -> affected_account_nonstaked_balance and locked -> affected_account_staked_balance)

The query essentially pulls the latest available state change about each individual account and sums up those latest balances.

bowenwang1996 commented 3 years ago

It gets the snapshot of the AccountView from the state changes and just records it into the database as is (maps amount -> affected_account_nonstaked_balance and locked -> affected_account_staked_balance)

That doesn't really work. For a given block, it is possible that there is a nonzero amount of transfer receipts. This means that the sender account has been debited but the receiver account hasn't been credited yet. If you sum up all account changes, it will be smaller than the total supply. Also how do you handle accounts that have been deleted?

frol commented 3 years ago

If you sum up all account changes, it will be smaller than the total supply.

Ok, I expected that, but as you can see, the sum is greater than the total supply. I guess, we need to [query] every single account state on a fixed block through JSON RPC and see the result, and then next step would be to investigate if something is wrong with the state changes (whether the latest in the account_changes does not match the queried account balances), and if that is fine, we will investigate the indexer implementation.

Also how do you handle accounts that have been deleted?

Their latest account_change balances are zero: https://github.com/near/near-indexer-for-explorer/blob/de02eed3ef54a6c8e9b6a6c9ebde41ad085ceb56/src/models/account_changes.rs#L62-L78

frol commented 3 years ago

What could get into the play, though, is when an account gets touched in the same block, and currently, my query will select a who-knows-which state. This could probably be the case. We need to craft the query that accurately selects the latest among the receipts in the same transaction based on the execution_outcome.index_in_chunk within the same block :thinking:

telezhnaya commented 3 years ago

The answer to the original question: no, we don't miss anything.

I took a snapshot of the system on a block 33039470 (Thu Mar 25 2021 13:52:45 GMT+0000). That block was selected because there is no changes in the blockchain 10 blocks before and after that.

By snapshot, I mean that I asked RPC for the balance of each and every account in the system and I summed up all these numbers.

Amount      613595482305167496766530766586266
Locked      407216955208179845432306793925403
Total:     1020812437513347342198837560511669
Expected:  1020812437513347342198837560511669 (total_supply)

The numbers are equal, even with broken account that was found during collecting of data. Read about it here.

I will continue the research in this ticket, you could find more details there.

telezhnaya commented 3 years ago

I understood that I checked the whole system (by RPC), but not our DB. Want to check additionally that sum is correct by collecting data only from account_changes table.

telezhnaya commented 3 years ago

Right select should look this way

SELECT
    SUM(affected_account_nonstaked_balance),
    SUM(affected_account_staked_balance)
FROM (
    SELECT DISTINCT ON (affected_account_id) affected_account_nonstaked_balance, affected_account_staked_balance
    FROM account_changes join receipts on account_changes.changed_in_block_hash = receipts.included_in_block_hash
    WHERE account_changes.changed_in_block_timestamp < 1616680365158678037
    ORDER BY affected_account_id, changed_in_block_timestamp DESC, receipts.index_in_chunk desc
) as t;

All the data is for the block 33039470 (1616680365158678037 from the query is the corresponding timestamp) We still have problems, unfortunately. Results of subselect (just not to loose the data): all_accs_from_db.txt

Liquid    613938171425008223003668951859983
Staked    407309468735858780665987257942824
Total    1021247640160867003669656209802807
Expected 1020812437513347342198837560511669

My difference is smaller, but it still exists: 435k tokens extra. I also re-checked list of accounts that were never touched, I have same results with @frol

Fortunately, it's easier to understand where do we have difference. broken.txt 2587 accounts gives different value from DB and from RPC. 2286 of 2587 problematic accounts are lockups.

@bowenwang1996 do you have ideas what could we do wrong?

bowenwang1996 commented 3 years ago

@frol how is execution_outcome.index_in_chunk computed? I remember that we have changed the order of execution outcomes in the past and I wonder whether it is possible that we made some mistakes during migration.

frol commented 3 years ago

@bowenwang1996 we collect local receipts, and included chunk receipts and enumerate that list to assign the index_in_chunk.

@telezhnaya we need to take a closer look into alex.near (the first account on the list), and see if we even have the account_changes with the correct balance, it might be that our query does not account for some other changing events.

telezhnaya commented 3 years ago

@frol while investigating this, found another problem. Case with alex.near could be fixed, we have enough data.

opgran06.near has several changes in one receipt. So index in chunk could not help us here, it's always 0. I can't see any possibility to restore the order of these actions

select * from account_changes
where account_changes.affected_account_id = 'opgran06.near' AND account_changes.changed_in_block_timestamp = 1602627776390409274
order by account_changes.changed_in_block_timestamp desc;

"affected_account_id"	"caused_by_receipt_id"	"affected_account_nonstaked_balance"	"update_reason"
"opgran06.near"	"6a3b8m4Q2mDm2fejd7EYtB2KaSP6GhCV5sBeL1KH8AWk"	40000000000000000000000000	"RECEIPT_PROCESSING"
"opgran06.near"	"6a3b8m4Q2mDm2fejd7EYtB2KaSP6GhCV5sBeL1KH8AWk"	40000018492321728300000000	"ACTION_RECEIPT_GAS_REWARD"

Case with alex.near could be fixed if we change the subselect to the following:

SELECT DISTINCT ON (affected_account_id) affected_account_id, affected_account_nonstaked_balance, affected_account_staked_balance, changed_in_block_timestamp
    FROM account_changes left join receipts on account_changes.caused_by_receipt_id = receipts.receipt_id
    WHERE account_changes.changed_in_block_timestamp < 1616680365158678037
    ORDER BY affected_account_id, changed_in_block_timestamp DESC, receipts.index_in_chunk desc;

(join -> left join, join clause is changed also) but it produces new issues like the one discussed above

frol commented 3 years ago

@telezhnaya The order of the update_reasons within a block is deterministic, so you can ORDER BY update reason converted to a number (off-topic: we need to make it clear that this is the case to our JSON RPC and Indexer users):

from Bowen:

Within a block, the order of triggers is as follows:

Rewards are credited if there is any

Transactions are processed and get converted to receipts

Receipts are processed:

First we process delayed receipts if there is any

Then we process local receipts (sender_id == receiver_id)

Then we process the rest of the receipts

To the best of my knowledge, we don't have documentation on the order anywhere, but we can recover the order from the code if we search for the state_update.commit calls like this: https://github.com/near/nearcore/blob/70e0cc147b58e7207079959c4ad74744ecbbb388/runtime/runtime/src/lib.rs#L476-L478

Let's document it to the nomicon. @bowenwang1996 @evgenykuzyakov What would be the best place to document the execution order?

telezhnaya commented 3 years ago

@frol thank you! So, actually, we can just order by account_changes.id instead of receipts.index_in_chunk and that will give us latest change in a block:

SELECT DISTINCT ON (affected_account_id) affected_account_id, affected_account_nonstaked_balance, affected_account_staked_balance, changed_in_block_timestamp
    FROM account_changes
    where account_changes.changed_in_block_timestamp < 1616680365158678037
    ORDER BY affected_account_id, changed_in_block_timestamp DESC, id desc;

We are much much closer to the truth now, but we still have some issues.

Liquid    613595421305318693067968266586266
Staked    407216955208179845432306793925403
No change accs   60999899913398562500000000 (bo.near, registrar, wallet.pulse.near)
Total    1020812437513398451898837560511669
Expected 1020812437513347342198837560511669
Diff                   51109700000000000000

We have 51109700000000000000 more than we should have.

(1) We have issues with reserve.near: https://explorer.near.org/accounts/reserve.near

RPC 2197535823693459600000000
DB  1821233534830959600000000

Let's imagine we have same numbers there. In such case, we have

Total    1020812437889700740761337560511669
Expected 1020812437513347342198837560511669
Diff               376353398562500000000000

Oops, diff becomes bigger.

(2) We also have ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088 read more

RPC error (does not exist): taking 0
DB  376353398562500000000000

Let's also assume that we fix that issue

Total    1020812437513347342198837560511669
Expected 1020812437513347342198837560511669
Diff                                      0

🥳

(2) is known issue. (1) looks weird, and the account name looks like a special one. @frol @bowenwang1996 do you know something about reserve.near?

telezhnaya commented 3 years ago

I researched it a little bit more (1) and (2) are the same story, actually

reserve.near was created with 0.1 token.

Then ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088 was deleted, remaining funds was given to reserve.near. reserve.near had 1.09 tokens after that.

Then there was a transfer, ae384ca1c2c93a4029e34fb52a5f72adebc5ff10b8b35fdd628ab2e7c874d088 sent some tokens to reserve.near. Deleted account sent money. OK. reserve.near had 1.8 tokens after that.

If we omit creating the keys (we do not want to count fees right now), that's all movements with reserve.near. Now it has 2.2 tokens. Where did it get additional 0.4?

bowenwang1996 commented 3 years ago

@telezhnaya The order of the update_reasons within a block is deterministic, so you can ORDER BY update reason converted to a number (off-topic: we need to make it clear that this is the case to our JSON RPC and Indexer users):

from Bowen:

Within a block, the order of triggers is as follows:

Rewards are credited if there is any

Transactions are processed and get converted to receipts

Receipts are processed:

First we process delayed receipts if there is any

Then we process local receipts (sender_id == receiver_id)

Then we process the rest of the receipts

To the best of my knowledge, we don't have documentation on the order anywhere, but we can recover the order from the code if we search for the state_update.commit calls like this: https://github.com/near/nearcore/blob/70e0cc147b58e7207079959c4ad74744ecbbb388/runtime/runtime/src/lib.rs#L476-L478

Let's document it to the nomicon. @bowenwang1996 @evgenykuzyakov What would be the best place to document the execution order?

Inside runtime spec. I created https://github.com/near/NEPs/issues/191

frol commented 3 years ago

@telezhnaya

So, actually, we can just order by account_changes.id instead of receipts.index_in_chunk and that will give us latest change in a block

Well, not really :confounded:

The id just happens to represent a correct order there for now, but it is not guaranteed, so we need to get back to the problem in a more systematic way.

We introduced id to accounts and account_changes columns to be able to paginate efficiently (using it as a cursor). To properly address the issue, we still need to define the order based on the update reason order and then on the order in chunk.

telezhnaya commented 3 years ago

@frol Returning to the context: we want to have a possibility to calculate the sum of all accounts at the provided moment of time, and not to use account_changes.id as the sorting column. In other words, we want to have a natural way of sorting the events, to get the correct snapshot of the system.

Previous solution (it works fine, but we use id)

SELECT DISTINCT ON (affected_account_id) affected_account_id, affected_account_nonstaked_balance, affected_account_staked_balance, changed_in_block_timestamp
    FROM account_changes
    where account_changes.changed_in_block_timestamp < 1616680365158678037
    ORDER BY affected_account_id, changed_in_block_timestamp DESC, id desc;

We can use update_reason somehow, but we have an issue: it can't give us 100% correct order. Read the details here https://github.com/near/NEPs/issues/191 (it's not that easy) My understanding: theoretically, there could be a situation when we are processing postponed receipts. Inside them, we have the same update_reason steps. The problem is that it's not possible to distinguish between regular receipt and postponed one. As the result, we process the regular receipt. Imagine it has 2 steps: RECEIPT_PROCESSING (1), ACTION_RECEIPT_GAS_REWARD (2). Then, in the same block, we process postponed receipt, with the only step RECEIPT_PROCESSING (3). We have 3 lines in update_reason, and we want to have only the final one, (3). We sort by update_reason and get (2). We failed. [Actually, I think it's a good idea to additionally discuss it with someone from nearcore]

Let's try to dig deeper into the real data. We have 8 change reasons:

1 'TRANSACTION_PROCESSING'
2 'ACTION_RECEIPT_PROCESSING_STARTED'
3 'ACTION_RECEIPT_GAS_REWARD'
4 'RECEIPT_PROCESSING'
5 'POSTPONED_RECEIPT'
6 'UPDATED_DELAYED_RECEIPTS'
7 'VALIDATOR_ACCOUNTS_UPDATE'
8 'MIGRATION'

Let's check which of them are really used

select distinct update_reason from account_changes;

TRANSACTION_PROCESSING
RECEIPT_PROCESSING
ACTION_RECEIPT_GAS_REWARD
VALIDATOR_ACCOUNTS_UPDATE

I changed the order of lines as I want. (*)

Oh, that simplifies the task a little. Let's suppose that life is easy, and we don't have complicated cases with postponed stuff. In this scenario, we just need to use the order that I provided (*). The problem is that in our DB, we have another order, so we have to hack the SQL a little. After shuffling it, I came into this solution:

New solution (also working, without using id)

SELECT
    SUM(affected_account_nonstaked_balance),
    SUM(affected_account_staked_balance)
FROM (
    SELECT
     DISTINCT ON (affected_account_id) affected_account_id, affected_account_nonstaked_balance, affected_account_staked_balance, changed_in_block_timestamp,
     CASE WHEN update_reason = 'TRANSACTION_PROCESSING' THEN 1
WHEN update_reason = 'RECEIPT_PROCESSING' THEN 2
WHEN update_reason = 'ACTION_RECEIPT_GAS_REWARD' THEN 3
WHEN update_reason = 'VALIDATOR_ACCOUNTS_UPDATE' THEN 4
 ELSE 0 END as aaa
    FROM account_changes left join receipts on account_changes.caused_by_receipt_id = receipts.receipt_id
    WHERE account_changes.changed_in_block_timestamp < 1616680365158678037
    ORDER BY affected_account_id, changed_in_block_timestamp DESC, aaa desc, receipts.index_in_chunk desc
) as t;

It gives 613595421305318693067968266586266 + 407216955208179845432306793925403 Perfect, exactly as in the query above https://github.com/near/near-indexer-for-explorer/issues/77#issuecomment-813368865

I don't think it's a good idea to look at the resulting query as to the working solution. The blockchain is growing, at any moment it can stop working.

I think we need to add the column with the order, or maybe to give the guarantee that id gives the correct order.

frol commented 3 years ago

@telezhnaya Reading your analysis I realized that it seems that we may just need index_in_block column (we already have index_in_chunk, index_in_transaction, etc in other tables). This will communicate the intent clearly, and we won't need to encode the order into the update reasons and rely on fact that the reasons order is fixed.

We should also consider dropping id column and use changed_in_block_hash with index_in_block as the primary key.

cc @khorolets

telezhnaya commented 3 years ago

@frol are you sure we want to have composite primary key? And natural primary key

Using natural PKs means that we depend on the data that comes to the DB, and it could give us the problems as we have now in the transactions table with duplicated tx_hash.

Using composite PKs makes it harder to create FKs in other tables, we should put there both columns.

account_changes does not sound like the most popular table, and I guess we will never meet such cases, but anyway, maybe it's better just not to use natural PKs.

The ideal solution for me is to make the process of giving the id stricter, and rename it to index_in_blockchain. But, as I understand, it's not possible. So, for now, I am thinking just about adding new column index_in_block.

frol commented 3 years ago

The ideal solution for me is to make the process of giving the id stricter, and rename it to index_in_blockchain. But, as I understand, it's not possible.

Yeah, that absolute indexing is impossible.

So, for now, I am thinking just about adding new column index_in_block.

@telezhnaya Let's add this new column and keep the id for now, so we don't need to do anything with the PK just yet.

frol commented 3 years ago

142 introduced an extra `index_in_block` column to `account_changes`, so it is easier to sort the events in a proper order.

This issue is not 100% solved by #142, but the remaining bit the balances discrepancy is discussed in #84, so I close this issue.

near / near-indexer-for-explorer

Do we miss some balance changes? #77

Previous solution (it works fine, but we use id)

New solution (also working, without using id)

142 introduced an extra `index_in_block` column to `account_changes`, so it is easier to sort the events in a proper order.

near / near-indexer-for-explorer

Do we miss some balance changes? #77

Previous solution (it works fine, but we use id)

New solution (also working, without using id)

142 introduced an extra index_in_block column to account_changes, so it is easier to sort the events in a proper order.

142 introduced an extra `index_in_block` column to `account_changes`, so it is easier to sort the events in a proper order.