lrettig commented 9 months ago

Description

If the node is unable to record the fact that it published an activation for a given epoch, e.g., by losing write access to the nipost challenge directory or database, and is then restarted, it will publish a second activation for the same epoch, thus equivocating.

Affected code

https://github.com/spacemeshos/go-spacemesh/blob/eb263c8e831d7d141e939e409d19b940be09972b/activation/activation.go#L502-L519

Proposed mitigation

The act of broadcasting the atx and recording this in local state should be atomic: it shouldn't be possible to do one without doing the other.

5207 helps somewhat, in that the state location for nipost data is the same as the rest of the state, so it's unlikely that a node would lose write access to this location and still be able to run (as is possible in v1.2 branch today where nipost data is written to the post init location), but this operation should still be made atomic.

Also, the node should already have its own previous ATX in the database and should refuse to broadcast a second ATX targeting the same epoch!

Additional information

This happened yesterday to smesher 0x3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b (https://explorer.spacemesh.io/smeshers/0x3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b). Here are partial logs:

{                                                                                 
  "timestamp": "2023-12-14T22:22:19.793228Z",                                     
  "help": "Node finished PoST execution using PoET challenge.",                                                                                                      
  "postComplete": {                                                               
    "challenge": "pkb5POjzjuDlisOAg2jY3sktB6yjpKMB5s3GwT7c34M="                   
  }                                                                               
}                                                                                 
{                                                                                                                                                                                                                                                                                                                                          
  "timestamp": "2023-12-14T22:22:20.954380Z",                                     
  "help": "Node committed malicious behavior. Identity will be canceled.",        
  "malfeasance": {                                                                
    "proof": {                                                                                                                                                                                                                                                                                                                             
      "smesherId": {                                                              
        "id": "PITKduVnymwg1Qkx3gJ9HIw8W9ydvyt2eumme75/u3s="                      
      },                                                                          
      "layer": {                                                                                                                                                                                                                                                                                                                           
        "number": 40320                                                           
      },                                                                          
      "kind": "MALFEASANCE_ATX",                                                  
      "debugInfo": "generate layer: 40320\nsmesher id: 3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b\ncause: smesher published multiple ATXs in epoch 10\n1st message hash: 62332b4f1a7e2312600128db4bbfe9f0850ca37f62b31ee8d86b352f9bc19f28\n1st message signature: ded96ccf4dbb8041c22891eab3ae6c80bbe4e9309c4240d12b5
69dc792d3613177f682658f9629ad99149ca430ac4f0a6e0c54f33271eb909eb3014b27850c08\n2nd message hash: debf494f7b4b849fd2d1e017f291b96c833fdfa6899c6159203b164b7da3fd79\n2nd message signature: c7b6b404e2192e1f4b07c5b7946238ce8c6ee9a9fdac92dc56ba8f4bbf3557722036cb86ad36b1d74ebded351468f2c1be3e22f020f676a308f6de0449cd1a0b\n"              
    }                                                                             
  }                                                                               
}                                                                                                                                                                                                                                                                                                                                          
{                                                                                 
  "timestamp": "2023-12-14T22:25:00.331785Z",                                     
  "help": "Node published activation for the current epoch. It now needs to wait until the target epoch when it will be eligible for rewards.",
  "atxPublished": {                                                               
    "current": 10,                                                                
    "target": 11,                                                                 
    "id": "3r9JT3tLhJ/S0eAX8pG5bIM/36aJnGFZIDsWS32j/Xk=",                         
    "wait": "34499.668217s"                                                       
  }                                                                                                                                                                  
}

2023-12-04T14:40:24.373-0500    INFO    3c84c.post      post setup completed    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "post", "node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "commitment_atx": "813a3a65d6", "data_dir": "/Volumes/SMESHER04/post/7c8cef2b/", "num_units": "58", "labels_per_unit": "4294967296", "provider": "{0}", "name": "post"}
2023-12-04T14:40:24.385-0500    INFO    3c84c.atxBuilder        atx challenge is ready  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "current_epoch": "10", "publish_epoch": "10", "target_epoch": "11", "name": "atxBuilder"}
2023-12-04T14:40:24.385-0500    INFO    3c84c.nipostBuilder     building nipost {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet round start": "2023-11-27T03:00:00.000-0500", "poet round end": "2023-12-10T15:00:00.000-0500", "publish epoch": "10", "publish epoch end": "2023-12-15T03:00:00.000-0500", "target epoch": "11", "name": "nipostBuilder"}
2023-12-04T14:40:24.423-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://poet-111.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:24.927-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://poet-112.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:25.062-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://backup-poet.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:25.096-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://mainnet-poet-0.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:25.230-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://mainnet-poet-1.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:26.554-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://mainnet-poet-2.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:26.554-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "f115c42343303b7b895083451653a6fee4e32429de57d16274ca579a7e791bc6", "round": "9", "wait time": "144h19m47.277659151s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://poet-110.spacemesh.network/v1/info", "status": 200}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "e69fe63ef9aa87076f7fc534195acdabff4837f1e6663e610a414b00dd388351", "round": "9", "wait time": "144h19m47.324342562s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "c90d96d3d2eb3a2e23a263774e3e91af9210856316138f31606f6460a739facd", "round": "9", "wait time": "144h19m43.70253096s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "7059ea092e68111ec6397e7bea83ed6a197107fd72f7968389b74aed11d01558", "round": "9", "wait time": "144h19m47.672109722s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "f10aa089dfb7edec98ee293e100e3b35de53ad08d7a256efd8c7608abdb8dcda", "round": "9", "wait time": "144h19m48.8264983s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "421d5e7f1638621a180445ca3d388727f6bb9f51ac965452cb079028ede3ee6d", "round": "9", "wait time": "144h19m48.016070619s", "name": "nipostBuilder"}
2023-12-04T14:40:26.601-0500    INFO    3c84c.nipostBuilder     waiting until poet round end    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "71a215d189b9f4bdd1a9b5402fa52b08fc2bfb3fa5c1ed93063e7b4169c082d3", "round": "9", "wait time": "144h19m12.661856577s", "name": "nipostBuilder"}
...
2023-12-10T15:13:55.536-0500    INFO    3c84c.nipostBuilder.poet        response received       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "url": "https://mainnet-poet-0.spacemesh.network/v1/proofs/9", "status": 429}
2023-12-10T15:13:55.537-0500    WARN    3c84c.nipostBuilder     failed to get proof from poet   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "poet_id": "7059ea092e68111ec6397e7bea83ed6a197107fd72f7968389b74aed11d01558", "round": "9", "errmsg": "getting proof: doing request: GET https://mainnet-poet-0.spacemesh.network/v1/proofs/9 giving up after 11 attempt(s)", "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 86831894949, "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 66025608413, "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 45889614332, "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 87640578262, "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 87035821858, "name": "nipostBuilder"}
2023-12-10T15:13:55.537-0500    INFO    3c84c.nipostBuilder     got poet proof  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leaf count": 62206408564, "name": "nipostBuilder"}
2023-12-10T15:13:55.539-0500    INFO    3c84c.nipostBuilder     selected the best proof {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "leafCount": 87640578262, "ref": "pkb5POjzjuDlisOAg2jY3sktB6yjpKMB5s3GwT7c34M=", "name": "nipostBuilder"}
2023-12-10T15:13:55.588-0500    INFO    3c84c.nipostBuilder     starting post execution {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "challenge": "pkb5POjzjuDlisOAg2jY3sktB6yjpKMB5s3GwT7c34M=", "name": "nipostBuilder"}
2023-12-10T15:13:55.764-0500    INFO    3c84c.nipostValidator   generating proof with PoW flags: RandomXFlag(FLAG_HARD_AES | FLAG_FULL_MEM | FLAG_JIT | FLAG_SECURE) and params: ProvingParams { difficulty: 1925330167, pow_difficulty: [0, 0, 61, 181, 148, 179, 42, 102, 207, 114, 194, 52, 247, 44, 35, 79, 114, 194, 52, 247, 44, 35, 79, 114, 194, 52, 247, 44, 35, 79, 114, 194] }  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostValidator", "module": "post::prove", "file": "src/prove.rs", "line": 276}
2023-12-10T15:14:37.062-0500    INFO    3c84c.nipostValidator   calculating proof of work for nonces 0..288     {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostValidator", "module": "post::prove", "file": "src/prove.rs", "line": 131}
...
2023-12-10T23:12:37.270-0500    INFO    3c84c.nipostValidator   Found proof for nonce: 28, pow: 63050394783202178 with [3584835803, 20665756696, 27248657076, 27534409880, 40079073984, 45214275356, 56396380240, 69285284743, 72878715397, 73217702874, 87863725151, 88172691647, 90950504439, 94073845005, 96601965708, 105964435975, 115103168779, 117782254176, 124791836128, 140205222161, 143147206777, 148599607721, 152556126989, 154892602105, 161500719603, 161833625359, 164659255626, 174585375852, 174714982086, 183899883493, 189445364277, 191179204998, 196415042421, 197826042778, 201839204732, 207151305408, 214363112060] indices      {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostValidator", "module": "post::prove", "file": "src/prove.rs", "line": 323}
2023-12-10T23:12:37.367-0500    INFO    3c84c.post      proving: generated proof        {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "post"}
2023-12-10T23:12:37.451-0500    INFO    3c84c.nipostBuilder     finished post execution {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "duration": "7h58m41.95018025s", "name": "nipostBuilder"}
2023-12-10T23:12:37.453-0500    WARN    3c84c.nipostBuilder     cannot store nipost state       {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "errmsg": "saving builder state: writing file: create temporary file /Volumes/SMESHER04/post/7c8cef2b/nipost_builder_state.bin.tmp: open /Volumes/SMESHER04/post/7c8cef2b/nipost_builder_state.bin.tmp: operation not permitted", "name": "nipostBuilder"}
2023-12-10T23:12:37.453-0500    INFO    3c84c.nipostBuilder     finished nipost construction    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder"}
2023-12-10T23:12:37.453-0500    INFO    3c84c.atxBuilder        awaiting atx publication epoch  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "pub_epoch": "10", "pub_epoch_first_layer": "40320", "current_layer": "43154", "name": "atxBuilder"}
2023-12-10T23:12:37.540-0500    INFO    3c84c.atxBuilder        atx published   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "event": true, "atx_id": "62332b4f1a", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9sxf8afx", "epoch": 10, "num_units": 58, "sequence_number": 4, "size": 1001, "name": "atxBuilder"}
2023-12-10T23:12:37.540-0500    WARN    3c84c.atxBuilder        failed to publish atx   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "layer_id": 43154, "epoch_id": 10, "errmsg": "discarding challenge after published ATX: discarding nipost challenge: remove /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin: operation not permitted", "name": "atxBuilder"}
2023-12-10T23:12:37.540-0500    WARN    3c84c.atxBuilder        unknown error   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "b817220f-7cce-40ee-8e30-6a5368782344", "errmsg": "discarding challenge after published ATX: discarding nipost challenge: remove /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin: operation not permitted", "name": "atxBuilder"}
2023-12-10T23:15:00.411-0500    WARN    3c84c.atxBuilder        failed to load atx challenge    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "ea68a4e0-af3e-4360-b712-c965159e5e77", "errmsg": "loading nipost challenge: reading file: open file /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin: open /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin: operation not permitted", "name": "atxBuilder"}
2023-12-10T23:15:00.411-0500    INFO    3c84c.atxBuilder        building new atx challenge      {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "ea68a4e0-af3e-4360-b712-c965159e5e77", "current_epoch": "10", "name": "atxBuilder"}
2023-12-11T02:01:06.975-0500    WARN    3c84c.atxBuilder        failed to publish atx   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "ea68a4e0-af3e-4360-b712-c965159e5e77", "layer_id": 43188, "epoch_id": 10, "errmsg": "saving nipost challenge: writing file: create temporary file /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin.tmp: open /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin.tmp: operation not permitted", "name": "atxBuilder"}
2023-12-11T02:01:06.975-0500    WARN    3c84c.atxBuilder        unknown error   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "ea68a4e0-af3e-4360-b712-c965159e5e77", "errmsg": "saving nipost challenge: writing file: create temporary file /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin.tmp: open /Volumes/SMESHER04/post/7c8cef2b/nipost_challenge.bin.tmp: operation not permitted", "name": "atxBuilder"}

...

2023-12-14T17:22:19.643-0500    INFO    3c84c.post      proving: generated proof        {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "post"}
2023-12-14T17:22:19.793-0500    INFO    3c84c.nipostBuilder     finished post execution {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "duration": "6h55m22.843336959s", "name": "nipostBuilder"}
2023-12-14T17:22:20.842-0500    INFO    3c84c.nipostBuilder     finished nipost construction    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder"}
2023-12-14T17:22:20.842-0500    INFO    3c84c.atxBuilder        awaiting atx publication epoch  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "pub_epoch": "10", "pub_epoch_first_layer": "40320", "current_layer": "44236", "name": "atxBuilder"}
2023-12-14T17:22:20.951-0500    WARN    3c84c.atxHandler        atx failed contextual validation        {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "atx_id": "debf494f7b4b849fd2d1e017f291b96c833fdfa6899c6159203b164b7da
3fd79", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "errmsg": "last atx is not the one referenced", "name": "atxHandler"}
2023-12-14T17:22:20.952-0500    WARN    3c84c.atxHandler        smesher produced more than one atx in the same epoch    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9d
bf2b767ae9a67bbe7fbb7b", "prev": {"atx_id": "62332b4f1a", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9
sxf8afx", "epoch": 10, "num_units": 58, "effective_num_units": 58, "sequence_number": 4, "base_tick_height": 84399, "tick_count": 9392, "weight": 544736}, "curr": {"atx_id": "debf494f7b", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b76
7ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9sxf8afx", "epoch": 10, "num_units": 58, "effective_num_units": 58, "sequence_number": 4, "base_tick_height": 84399, "tick_count": 9392, "weight": 544736}, "name": "atxHandler"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxHandler        failed to process atx gossip    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "sender": "12D3KooWRByEPQFCypeTd45WN5UESWnxjKbVcF3LE6epzz4EgqkK", "errmsg": "c
annot process atx debf494f7b: cannot store atx debf494f7b: malicious atx", "name": "atxHandler"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxBuilder        failed to publish atx   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "0f8cdd3c-181c-487f-83ba-e38ed5e4bbda", "layer_id": 44236, "epoch_id": 10, "errmsg": "broadcast: failed to broadcast ATX: fail
ed to publish to topic ax1: validation ignored", "name": "atxBuilder"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxBuilder        unknown error   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "0f8cdd3c-181c-487f-83ba-e38ed5e4bbda", "errmsg": "broadcast: failed to broadcast ATX: failed to publish to topic ax1: validation igno
red", "name": "atxBuilder"}

...

2023-12-14T17:25:00.331-0500    INFO    3c84c.atxBuilder        atx challenge is ready  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "dae41e3b-c4ce-476e-924d-d32688d6cc75", "current_epoch": "10", "publish_epoch": "10", "target_epoch": "11", "name": "atxBuilde
r"}
2023-12-14T17:25:00.331-0500    INFO    3c84c.atxBuilder        atx published   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "dae41e3b-c4ce-476e-924d-d32688d6cc75", "event": true, "atx_id": "debf494f7b", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9sxf8afx", "epoch": 10, "num_units": 58, "sequence_number": 4, "size": 1001, "name": "atxBuilder"}
2023-12-14T17:25:00.333-0500    INFO    3c84c.atxBuilder        building new atx challenge      {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "7ed832dd-e5b9-4270-83c8-3383157ad048", "current_epoch": "10", "name": "atxBuilder"}
2023-12-14T17:25:00.333-0500    WARN    3c84c.atxBuilder        failed to publish atx   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "7ed832dd-e5b9-4270-83c8-3383157ad048", "layer_id": 44237, "epoch_id": 10, "errmsg": "builder: atx expired: builder cannot to submit in epoch 10. poet round already started 86h25m0.333803s ago", "name": "atxBuilder"}

This issue appears in commit hash: eb263c8e831d7d141e939e409d19b940be09972b

dshulyak commented 9 months ago

i didn't understand the problem fully. but node always executes validation handler, that saves atx to database on successful validation. and only after that it publishes atx to the network.

lrettig commented 9 months ago

My reading of the code above (Builder.Run/Builder.PublishActivationTx) is that it builds the NIPost challenge, then creates the ATX, then publishes it, then removes the NIPost challenge. If the last step fails, it tries to publish again one layer later, regardless of whether publishing succeeded or failed. I don't see a validation handler in this loop, LMK what I'm missing.

lrettig commented 9 months ago

For some reason this node succeeded in publishing its ATX the first time (despite the misleading log messages), but didn't manage to record its own ATX in its own database the first time. Is there any way the publish -> record in database loop could've broken? I can't find any other info in the logs.

In any case, given the severity of this sort of error, I still think that the operation of publishing one's own ATX should be atomic with recording it in the database (and, additionally, that a node should always check whether an ATX could be considered malicious before publishing it).

fasmat commented 9 months ago

The logs are a bit misleading. The node will ALWAYS validate an ATX it publishes before it does so. broadcast has to go through the same validators as an incoming ATX has to go through, so only if those pass (and that includes a check that no other ATX for the same identity in the same epoch exists already) the ATX will actually be broadcasted.

The problem here is that the device the node tries to write its state to is read only. This also means the node you have posted the logs from is restarted it will have to skip a full epoch of rewards because it doesn't know that it already submitted a challenge to a PoET.

We are right now in the process of refactoring the relevant code in the activation package. Part of that refactoring has already happened in https://github.com/spacemeshos/go-spacemesh/pull/5207. More is currently under review in https://github.com/spacemeshos/go-spacemesh/pull/5219 - planned to be merged with a group of other changes after 1.3.0 is released and part of the release after 1.3.0.

Additionally I'm planning on rewriting the relevant code in the activation package into a finite state machine which will ensure that a) state transitions are happening atomically at clearly defined events and b) current state of the ATX building process can be more clearly communicated to the user via the state the machine is in and events that can be emitted during state transitions.

The quick fix for this issue is to allow writes to /Volumes/SMESHER04/post/7c8cef2b

lrettig commented 9 months ago

The quick fix for this issue is to allow writes to /Volumes/SMESHER04/post/7c8cef2b

Doing this (plus a restart) caused the node to publish a second ATX and equivocate ;)

lrettig commented 9 months ago

@fasmat I still don't understand how the node could have managed to publish its own ATX without recording that fact in the database. The state DB remained on a device that was writable, and everything else worked fine.

fasmat commented 9 months ago

Here is what I think might have happened:

2023-12-14T17:22:19.643-0500    INFO    3c84c.post      proving: generated proof        {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "post"}
2023-12-14T17:22:19.793-0500    INFO    3c84c.nipostBuilder     finished post execution {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder", "duration": "6h55m22.843336959s", "name": "nipostBuilder"}
2023-12-14T17:22:20.842-0500    INFO    3c84c.nipostBuilder     finished nipost construction    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "nipostBuilder"}
2023-12-14T17:22:20.842-0500    INFO    3c84c.atxBuilder        awaiting atx publication epoch  {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "pub_epoch": "10", "pub_epoch_first_layer": "40320", "current_layer": "44236", "name": "atxBuilder"}
2023-12-14T17:22:20.951-0500    WARN    3c84c.atxHandler        atx failed contextual validation        {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "atx_id": "debf494f7b4b849fd2d1e017f291b96c833fdfa6899c6159203b164b7da3fd79", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "errmsg": "last atx is not the one referenced", "name": "atxHandler"}
2023-12-14T17:22:20.952-0500    WARN    3c84c.atxHandler        smesher produced more than one atx in the same epoch    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev": {"atx_id": "62332b4f1a", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9sxf8afx", "epoch": 10, "num_units": 58, "effective_num_units": 58, "sequence_number": 4, "base_tick_height": 84399, "tick_count": 9392, "weight": 544736}, "curr": {"atx_id": "debf494f7b", "challenge": "0xc757ce06980f388a2ec8d1cee8ba6f1dbb2740ce23f069ab783f0b02d4f1c49b", "smesher": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "prev_atx_id": "ea7e2ceb82", "pos_atx_id": "ea7e2ceb82", "coinbase": "sm1qqqqqqz3yyrkk4zr8wesmu04f77jxyel6pxvv9sxf8afx", "epoch": 10, "num_units": 58, "effective_num_units": 58, "sequence_number": 4, "base_tick_height": 84399, "tick_count": 9392, "weight": 544736}, "name": "atxHandler"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxHandler        failed to process atx gossip    {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxHandler", "requestId": "c9c46686-4860-4c3a-badc-8de4ff7e577c", "sender": "12D3KooWRByEPQFCypeTd45WN5UESWnxjKbVcF3LE6epzz4EgqkK", "errmsg": "cannot process atx debf494f7b: cannot store atx debf494f7b: malicious atx", "name": "atxHandler"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxBuilder        failed to publish atx   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "0f8cdd3c-181c-487f-83ba-e38ed5e4bbda", "layer_id": 44236, "epoch_id": 10, "errmsg": "broadcast: failed to broadcast ATX: failed to publish to topic ax1: validation ignored", "name": "atxBuilder"}
2023-12-14T17:22:20.954-0500    WARN    3c84c.atxBuilder        unknown error   {"node_id": "3c84ca76e567ca6c20d50931de027d1c8c3c5bdc9dbf2b767ae9a67bbe7fbb7b", "module": "atxBuilder", "sessionId": "0f8cdd3c-181c-487f-83ba-e38ed5e4bbda", "errmsg": "broadcast: failed to broadcast ATX: failed to publish to topic ax1: validation ignored", "name": "atxBuilder"}

Node detected correctly that it already had published an ATX and did not publish it again.

EDIT: the next try to publish according to the logs failed again (probably due to a restart) because the node as predicted forgot that it already submitted to an PoET and is now waiting for the next cycle gap.

fasmat commented 9 months ago

@fasmat I still don't understand how the node could have managed to publish its own ATX without recording that fact in the database. The state DB remained on a device that was writable, and everything else worked fine.

It did. It even complained that it won't publish another ATX because there is already one in the DB.

lrettig commented 9 months ago

Had a chat with @fasmat. We agreed:

We should make sure a node never generates and publishes a malfeasance proof against itself (which happened here)
@fasmat is working on rewriting all of this logic as a finite state machine, as mentioned above. We'll make sure that the state transitions of this machine are atomic to prevent this sort of issue in future.

spacemeshos / go-spacemesh

Equivocation without violating any rules #5363

Description

Affected code

Proposed mitigation

Additional information