spacemeshos / go-spacemesh

Go Implementation of the Spacemesh protocol full node. 💾⏰💪
https://spacemesh.io
MIT License

Report of Critical Bug in Spacemesh v1.2.8 - Inability of Nodes Syncing via Local Network to Receive Rewards #5364

Open bugxzhu opened 11 months ago

bugxzhu commented 11 months ago

Dear Spacemesh Team,

I am writing to report a critical bug I encountered while using Spacemesh v1.2.8. It affects nodes syncing via a local network: the node logs show rewards being earned, but those rewards are not reflected on the chain or in the account balance.

Bug Description: While running a node that syncs via a local network, the logs indicate that rewards have been obtained. However, these rewards never appear on the chain, nor are they reflected in the account balance; the only trace of them is the log entry.

Bug Manifestation:

- The node syncing via the local network correctly logs reward acquisition.
- No corresponding reward information is displayed on the chain.
- My account balance fails to reflect any rewards received.

Steps to Reproduce the Bug:

1. Run Spacemesh v1.2.8.
2. Opt for node synchronization via a local network.
3. After a period of operation, the logs display rewards acquired.
4. Despite the log entries, there is no visible record of the rewards on the chain or in the account balance.

Additional Information:

I attempted multiple syncs and node restarts, but the issue persists. I have ensured that my node settings are accurate and that no irregular actions were taken.

Expected Resolution: I kindly request that your team investigate and resolve this bug. Accurate recording of rewards is crucial for users syncing via a local network. I am willing to provide further information or assist in any tests necessary to resolve this issue.

Please let me know if there is any additional information required or if further testing is necessary to pinpoint and resolve this bug. Your team's assistance and support are greatly appreciated.

Thank you for taking the time to review this report, and I look forward to your response.

Sincerely,

node logs:

2023-12-15T08:12:18.793Z INFO c6d00.proposalBuilder proposal eligibilities for an epoch {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "proposalBuilder", "epoch": 11, "beacon": "0xa12acad4", "atx": "4ce5ad8713", "weight": 298464, "prev": 0, "slots": 2, "eligible": 2, "eligible by layer": [{"layer": 44633, "slots": 1}, {"layer": 46845, "slots": 1}], "name": "proposalBuilder"}
2023-12-16T07:25:04.139Z INFO c6d00.conState received mempool txs {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "conState", "layer_id": 44633, "num_accounts": 0, "name": "conState"}
2023-12-16T07:25:05.781Z INFO c6d00.proposalBuilder proposal created {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "proposalBuilder", "proposal_id": "9030c9dbb2", "transactions": 0, "mesh_hash": "1b1fe7723e", "ballot_id": "2d9bc5eeed", "layer_id": 44633, "epoch_id": 11, "smesher": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "opinion hash": "1b1fe7723e", "base_ballot": "ab28a7b2b6", "support": 1, "against": 0, "abstain": 0, "atx_id": "4ce5ad8713", "ref_ballot": "0000000000", "active set hash": "0848703629", "beacon": "a12acad4", "votes": {"base": "ab28a7b2b6", "support": [{"id": "e4092bb19c", "layer": 44632, "height": 84399}], "against": [], "abstain": []}, "latency": {"data": "3.245017909s", "tortoise": "41.694309ms", "txs": "5.18565ms", "hash": "30.361797ms", "publish": "1.611940677s", "total": "4.934200342s"}, "name": "proposalBuilder"}
2023-12-16T07:26:50.717Z INFO c6d00.executor optimistically executed block {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "executor", "lid": 44633, "block": "2e49b371c7", "state_hash": "0xa7419a1f23c5dd0f75543499aec63f5a19ffa05cfefc2222406490fccdb6eb87", "duration": "1.008054682s", "count": 2, "skipped": 0, "rewards": 44, "name": "executor"}
2023-12-16T07:26:50.847Z INFO c6d00.blockGenerator generated block (optimistic) {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "blockGenerator", "layer_id": 44633, "block_id": "2e49b371c7", "name": "blockGenerator"}
2023-12-16T07:26:57.004Z INFO c6d00.blockCert generating certificate {"node_id": "c6d0075ddb273968baa24f686668ef5d47136ae1a653bdedad2b37b5db9049c4", "module": "blockCert", "requestId": "7cdf3c74-e1e8-44fa-a767-5c047d6f7e1d", "layer_id": 44633, "block_id": "2e49b371c7", "eligibility_count": 101, "num_msg": 101, "name": "blockCert"}

spacemesh log: { "timestamp": "2023-12-16T07:25:05.782055424Z", "help": "Node published proposal. Rewards will be received once proposal is included in the block.", "proposal": { "layer": 44633, "proposal": "kDDJ27KMkvFJ+QxNtZpr781SdbE=" } }

edisonsmith commented 11 months ago

me too

pigmej commented 11 months ago

Could you please check with a more recent version of go-spacemesh? There have been multiple fixes for the network, especially to accommodate the growth.

In the meantime, what is your peer count on the public nodes?

We have had situations in the past where the public/private split was misconfigured. Could you please share anonymised configs? When you replace node IDs, please replace the same ID with the same string, and use a different string for each distinct node ID.

What is most likely happening is that the network is not receiving your proposals in time.

bugxzhu commented 11 months ago

@pigmej

Dear Spacemesh Team,

Thank you for your prompt response. I've gathered feedback from other users experiencing the same issue, so this is not an isolated incident; the problem existed in previous versions and continues in version 1.2.10. My public node connection count is 24, and the configuration for one of my three public nodes (each on a different IP in the local network) is as follows:

Public Node Configuration:

"p2p": {
  "disable-reuseport": false,
  "p2p-disable-legacy-discovery": true,
  "autoscale-peers": true,
  "bootnodes": [
    // bootnode list here
  ],
  "direct": [
    // direct connections list here
    "/ip4/0.0.0.0/tcp/6002/p2p/12D3KooWGRy3zzi3L78FXrPDGuMfhYJE4WCiiRz7LHdQLNdJaHHN",
    "/ip4/0.0.0.0/tcp/6001/p2p/12D3KooWGebdETd67qCV82iT1Cop5ZVG3EfyPa8m9mFbnY3a6qbv"
    ...............
  ]
}

Private Node Configuration:

"p2p": {
  "listen": "/ip4/0.0.0.0/tcp/6004",
  "disable-reuseport": false,
  "p2p-disable-legacy-discovery": true,
  "autoscale-peers": true,
  "disable-dht": true,
  "direct": [
    "/ip4/192.168.1.23/tcp/7510/p2p/12D3KooWSqyS5UMQJTdNzsMn6iFBi86vgmuZsyF7u3V4C2AhcV66",
    "/ip4/192.168.1.23/tcp/7511/p2p/12D3KooWJdTW12gcyL3jbpH1gd65rQJSADpDELKYsK4AKM8AFH9p",
    "/ip4/127.0.0.1/tcp/7510/p2p/12D3KooWNBBcJNzMFjtkXKBmekvqy9izBZduy7E8tqnXnc1om9hZ"
  ],
  "bootnodes": [],
  "min-peers": 3,
  "low-peers": 10,
  "high-peers": 20,
  "inbound-fraction": 1,
  "outbound-fraction": 0.5
}

I can confirm that each node ID in the configurations provided is unique to avoid any conflicts within the network.
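For illustration only (this sketch is not part of the original report, and the IPs, ports, and peer IDs are placeholders): the key constraint is that every multiaddr in a node's "direct" list must use the exact IP, port, and peer ID on which the target node actually listens, and the public and private nodes should list each other consistently. A private-node entry pointing at a public node that listens on 192.168.1.23:7513 might look roughly like this:

"p2p": {
  "listen": "/ip4/0.0.0.0/tcp/7514",
  "direct": [
    // must match the public node's actual listen IP, port, and peer ID (placeholder shown)
    "/ip4/192.168.1.23/tcp/7513/p2p/<public-node-peer-id>"
  ],
  "bootnodes": [],
  "disable-dht": true
}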

I believe these details regarding node configurations might be helpful in the investigation. If there's any additional information or specific tests required from my end to assist in resolving this issue, please let me know. I remain enthusiastic about contributing to the resolution process and eagerly await further guidance from your team.

Thank you for your continued attention and support.

bugxzhu commented 11 months ago

To further assist in troubleshooting this issue, I would like to inquire whether it might be necessary to map certain ports to the public network. If configuring port mappings would aid in resolving network connectivity issues or assist in rectifying this bug, please advise accordingly. Your guidance on this matter would be greatly appreciated.

hieixu commented 11 months ago

me too. version: go-spacemesh 1.2.11

2023-12-17T21:10:03.187+0800 INFO 61fa9.conState received mempool txs {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "conState", "layer_id": 44990, "num_accounts": 0, "name": "conState"}
2023-12-17T21:10:03.358+0800 INFO 61fa9.proposalBuilder proposal created {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "proposalBuilder", "proposal_id": "6becc5b241", "transactions": 0, "mesh_hash": "bb36cf4892", "ballot_id": "e3e818b1e8", "layer_id": 44990, "epoch_id": 11, "smesher": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "opinion hash": "bb36cf4892", "base_ballot": "ef07cf5c0e", "support": 1, "against": 0, "abstain": 0, "atx_id": "fb3fc2cafe", "ref_ballot": "0000000000", "active set hash": "a8ca8d3a1c", "beacon": "a12acad4", "votes": {"base": "ef07cf5c0e", "support": [{"id": "26fec26137", "layer": 44989, "height": 84399}], "against": [], "abstain": []}, "latency": {"data": "3.039737309s", "tortoise": "21.158127ms", "txs": "251.167µs", "hash": "666.997µs", "publish": "170.306102ms", "total": "3.232119702s"}, "name": "proposalBuilder"}
2023-12-17T21:11:49.023+0800 INFO 61fa9.executor optimistically executed block {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "executor", "lid": 44990, "block": "5cf5eb5e03", "state_hash": "0xc3ffa88bc7811bad8bee35dd3d3ba70a7c734a8e1ee41dba34f41a4f67990fd0", "duration": "11.73444ms", "count": 2, "skipped": 0, "rewards": 47, "name": "executor"}
2023-12-17T21:11:49.033+0800 INFO 61fa9.blockGenerator generated block (optimistic) {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "blockGenerator", "layer_id": 44990, "block_id": "5cf5eb5e03", "name": "blockGenerator"}
2023-12-17T21:11:57.750+0800 INFO 61fa9.blockCert generating certificate {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "blockCert", "requestId": "1f6dc18b-a0d0-41fd-995b-b2c1f1c5a189", "layer_id": 44990, "block_id": "5cf5eb5e03", "eligibility_count": 101, "num_msg": 101, "name": "blockCert"}
2023-12-17T21:16:49.125+0800 INFO 61fa9.executor optimistically executed block {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "executor", "lid": 44991, "block": "1778aa64b9", "state_hash": "0xd2206da39c478690f906fd515e246602d0fea834f72a6bb9474535185bba3a64", "duration": "116.362772ms", "count": 3, "skipped": 0, "rewards": 53, "name": "executor"}
2023-12-17T21:16:49.158+0800 INFO 61fa9.blockGenerator generated block (optimistic) {"node_id": "61fa9220c8a00af8627f2ef04843e8f0bdda6766db35b696c58ffb869e1fa572", "module": "blockGenerator", "layer_id": 44991, "block_id": "1778aa64b9", "name": "blockGenerator"}

bugxzhu commented 11 months ago

Dear Spacemesh Team,

I hope this message finds you well. I am writing to provide feedback regarding a discrepancy observed in the port connections of my node, which seems to differ from the information outlined in the official documentation.

As per the official documentation, I expected certain port connections not to be established. However, upon checking, I noticed a consistent communication link on port 7513 of the node, with traffic flowing in both directions. Surprisingly, this communication appears to work, contrary to what the documentation suggested. Here is the output showing the unexpected port connections:

ss -npO4 | rg spacemesh | rg 6000

tcp ESTAB 0 0 127.0.0.1:7513 127.0.0.1:6000 users:(("go-spacemesh",pid=39165,fd=11))
tcp ESTAB 0 0 127.0.0.1:6000 127.0.0.1:7513 users:(("go-spacemesh",pid=39202,fd=47))

This interaction on port 7513 seems to be facilitating successful synchronization between nodes without any apparent issues.

I am reaching out to bring this discrepancy to your attention. Should this observed connectivity on port 7513 be expected behavior, or does it signify a deviation from the standard setup outlined in the documentation? I would greatly appreciate any clarification or guidance on this matter.

If any additional information, tests, or adjustments to configurations are required from my end, please let me know. Your prompt assistance and insights into this matter would be immensely helpful.

Thank you for your attention and support in addressing this discrepancy.

bugxzhu commented 11 months ago

@pigmej
I hope this message finds you well. I am reaching out regarding a warning message received by another user in the community. The warning message, which appears to be related to the miner not being active in a specific epoch, has raised concerns about potential block certification issues:

"errmsg": "oracle: miner is not active in epoch: 37b6a4bac25cfebef877e7fc06f9e251a25b3c81af084a6601700602dd617498", "name": "blockCert"}

I am curious to understand if this warning message is indicative of a broader issue within the Spacemesh network that might affect block certification. Additionally, could this potentially be related to the warning messages some users, including myself, have experienced regarding rewards not reflecting accurately?

I am keen to assist in providing any necessary information or conducting tests to help resolve these issues. Your insights and guidance on how to address this warning and its potential impact on block certification would be greatly appreciated.

bugxzhu commented 11 months ago

After my initial inquiry, I've discovered that the warning message indicating "miner is not active in epoch" is not exclusive to a particular user. Interestingly, the warning is present on my public node as well, despite the node successfully performing block mining operations.

pigmej commented 11 months ago

@bugxzhu from the config you shared, I assume your public node listens on 7513 for p2p traffic, since no listen address is specified in that config. Or did you omit that setting?
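For reference, a minimal sketch of making the p2p listen address explicit in the public node's config instead of relying on the default (the port shown is simply the default 7513 mentioned above; treat it as illustrative):

"p2p": {
  // explicit p2p listen address; if omitted, the node falls back to its default port
  "listen": "/ip4/0.0.0.0/tcp/7513"
}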

That log is about a different node. The node_id mentioned there is the node doing the logging, not the node whose situation is being reported.

bugxzhu commented 11 months ago

@pigmej

Here is the reward log from another user; he has the same problem:

2023-12-18T00:30:08.124Z INFO 998c4.conState received mempool txs {"node_id": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "module": "conState", "layer_id": 45126, "num_accounts": 0, "name": "conState"}
2023-12-18T00:30:08.365Z INFO 998c4.proposalBuilder proposal created {"node_id": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "module": "proposalBuilder", "proposal_id": "0cce61b82b", "transactions": 0, "mesh_hash": "9cc12b4c9f", "ballot_id": "354c14d6b9", "layer_id": 45126, "epoch_id": 11, "smesher": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "opinion hash": "9cc12b4c9f", "base_ballot": "079a56e1e1", "support": 1, "against": 0, "abstain": 0, "atx_id": "d7d39243f7", "ref_ballot": "0000000000", "active set hash": "5b653449a6", "beacon": "a12acad4", "votes": {"base": "079a56e1e1", "support": [{"id": "6400c6d302", "layer": 45125, "height": 84445}], "against": [], "abstain": []}, "latency": {"data": "7.121110339s", "tortoise": "36.709681ms", "txs": "187.457µs", "hash": "1.049451ms", "publish": "239.208542ms", "total": "7.39826547s"}, "name": "proposalBuilder"}
2023-12-18T00:31:49.015Z INFO 998c4.blockGenerator no consensus on mesh hash. NOT doing optimistic filtering {"node_id": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "module": "blockGenerator", "layer_id": 45126, "name": "blockGenerator"}
2023-12-18T00:31:49.016Z INFO 998c4.blockGenerator generated block {"node_id": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "module": "blockGenerator", "layer_id": 45126, "block_id": "43154f95d4", "name": "blockGenerator"}
2023-12-18T00:31:49.076Z INFO 998c4.executor executed block {"node_id": "998c49e8a4a5d41d0054debfbe613234fa6250c74b9ced2824876ec4bf7dc8f3", "module": "executor", "lid": 45126, "block": "43154f95d4", "state_hash": "0x2756544a0e3f47ce7c26f32754ba8efff1da6faf2c25f844db92816fc83e2087", "duration": "23.990588ms", "count

{ "timestamp": "2023-12-18T00:30:08.365081358Z", "help": "Node published proposal. Rewards will be received once proposal is included in the block.", "proposal": { "layer": 45126, "proposal": "DM5huCuF5jFVBGHDrPfVx/qsNhU="

config "p2p": { "disable-reuseport": false, "p2p-disable-legacy-discovery": true, "autoscale-peers": true, "bootnodes": [], "direct": [ "/ip4/192.168.15.11/tcp/7513/p2p/12D3KooWATQ82He4jvCfJkBZM1", "/ip4/192.168.15.12/tcp/7513/p2p/12D3KooWB9MK7c3RG3b73q1Gm", "/ip4/192.168.15.15/tcp/7513/p2p/12D3KooWPQxLueoARSqvw4Th7V", "/ip4/192.168.15.22/tcp/7513/p2p/12D3KooWBqkUMBUfEEX5YNKcVL" ], "disable-dht": true, "min-peers": 5, "low-peers": 10, "high-peers": 20, "inbound-fraction": 1, "outbound-fraction": 0.5 }, "logging": { "p2p": "error" }, "cache": { "atx-size": 500000

pigmej commented 11 months ago

Two things.

First, increase the number of peers; 10 as low-peers is far too low for reliable propagation.

Second, as you noticed, we're experiencing huge growth. We have created https://github.com/spacemeshos/go-spacemesh/issues/5366 to track possible improvements.
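As an illustration of the first point only, the peer limits in the private node's config could be raised along these lines; the exact numbers below are placeholders, not values recommended by the team:

"p2p": {
  // illustrative values only; tune to your hardware and bandwidth
  "min-peers": 20,
  "low-peers": 40,
  "high-peers": 60
}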

pigmej commented 11 months ago

Third, @bugxzhu, please make sure that the configs are correct, because the ones you shared definitely can't work reliably (wrong ports, at best).

bugxzhu commented 11 months ago

@pigmej Thank you for your prompt response and the valuable information provided previously.

Regarding the discussion on block certification warnings and their potential impact on reward distribution, I am curious to explore whether the internal nodes require additional connections to public nodes to prevent reward loss. Does the warning message about miner inactivity in specific epochs imply that a mere 10 connections from internal nodes to public nodes may not be sufficient to ensure the accurate distribution of rewards?

Would increasing the number of connections from internal nodes to public nodes beyond the existing count of 10 potentially address or mitigate the risk of reward loss? Understanding the correlation between the number of connections and the accurate reflection of rewards in the network is crucial for optimizing the internal node configuration.

pigmej commented 11 months ago

In short, the more connections you have, the less likely you are to miss rewards, because the network will most likely receive your proposals in time.

However, the network is currently a bit overwhelmed, so we're in a less than ideal situation.

HVNC1 commented 11 months ago

I have a minimum of 30 connections and am still losing rewards; it doesn't solve the problem. go-spacemesh 1.2.11.

"timestamp": "2023-12-18T16:15:02.376220900Z", "help": "Node published proposal. Rewards will be received once proposal is included in the block.", "proposal": { "layer": 45315, "proposal": "aF2AmaHIE/WnPgfToUyRFa+NGMY="

Attachments: output.txt, info.txt

"min-peers": 30,
"low-peers": 35,
"high-peers": 38,