input-output-hk / mithril

Stake-based threshold multi-signatures protocol
https://mithril.network
Apache License 2.0
129 stars 41 forks source link

V2445.0 w/ 10.1.2 - Reading from EraReader adapter raised an error: 'general error'. #2104

Closed mrabdibdi-anvil closed 14 hours ago

mrabdibdi-anvil commented 2 days ago

Context & versions

Using Docker with prebuilt binaries

mithril-signer_0.2.209+67dc6e4-1_amd64.deb

Steps to reproduce

I restarted using the newest Binary version, and it shows the message (see below)

I tried to remove the /opt/mithril/stores, but without success.

Actual behavior

Upgraded to latest version (2445.0) And the cardano node to 10.1.2

This message was after I deleted the stores directory.

I get this message:

{"msg":"New cycle: Init","v":0,"name":"mithril-signer","level":30,"time":"2024-11-11T21:16:24.286595672Z","hostname":"de2392c11384","pid":7,"src":"StateMachine"}
{"msg":"Incrementing 'mithril_signer_runtime_cycle_total_since_startup' counter","v":0,"name":"mithril-signer","level":20,"time":"2024-11-11T21:16:24.286602777Z","hostname":"de2392c11384","pid":7,"src":"MetricsService"}
{"msg":">> get_current_time_point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-11T21:16:24.286612745Z","hostname":"de2392c11384","pid":7,"src":"SignerRunner"}
{"msg":"Starting HTTP server for metrics on port 9292","v":0,"name":"mithril-signer","level":30,"time":"2024-11-11T21:16:24.286624233Z","hostname":"de2392c11384","pid":7,"src":"MetricsServer"}
{"msg":"Retrieve aggregator features message","v":0,"name":"mithril-signer","level":20,"time":"2024-11-11T21:16:24.287746643Z","hostname":"de2392c11384","pid":7,"src":"AggregatorHTTPClient"}
{"msg":">> update_era_checker(epoch:521)","v":0,"name":"mithril-signer","level":20,"time":"2024-11-11T21:16:24.31401545Z","hostname":"de2392c11384","pid":7,"src":"SignerRunner"}
{"msg":"A critical error occurred, aborting runtime. message = 'Could not update Era checker with context 'init → unregistered' for epoch Epoch(521)'","v":0,"name":"mithril-signer","level":60,"time":"2024-11-11T21:16:24.402925379Z","hostname":"de2392c11384","pid":7,"src":"StateMachine","nested_error":"Adapter Error message: «Reading from EraReader adapter raised an error: 'general error'.»\n\nCaused by:\n    0: general error\n    1: PallasChainObserver failed to get utxo\n    2: error while sending or receiving data through the channel"}
{"msg":"shutting down HTTP server after receiving signal","v":0,"name":"mithril-signer","level":40,"time":"2024-11-11T21:16:24.402954506Z","hostname":"de2392c11384","pid":7,"src":"MetricsServer"}
Error: A critical error occurred, aborting runtime. message = 'Could not update Era checker with context 'init → unregistered' for epoch Epoch(521)'

Caused by:
    0: Adapter Error message: «Reading from EraReader adapter raised an error: 'general error'.»
    1: general error
    2: PallasChainObserver failed to get utxo
    3: error while sending or receiving data through the channel

Expected behavior

Trying to understand the provenance of this error, is it a me issue ? I suspect file permissions with the volumes (sharing cardano node.socket, db and keys)

Thank you !

jpraynaud commented 1 day ago

Thanks @mrabdibdi-anvil for creating this issue.

The error message states that the connection to the Cardano node done by the Mithril signer did not work. Usually when this sort of problem happens restarting the Cardano node is enough to fix.

Can you tell us what happens when:

mrabdibdi-anvil commented 1 day ago

Morning,

you restart the signer node. Does the problem persist?

Yes.

you connect to the Cardano node with the Cardano CLI (e.g. to retrieve the tip of the chain)?

Yes it works. From the mithril-signer container:

/usr/local/bin/cardano-cli query tip --socket-path /cardano/ipc/node.socket --mainnet

{
    "block": 11084811,
    "epoch": 521,
    "era": "Conway",
    "hash": "28302bd95637cd4d792711a5a8f35ad679b00975f97d531fe23ac0886945bd7c",
    "slot": 139858372,
    "slotInEpoch": 149572,
    "slotsToEpochEnd": 282428,
    "syncProgress": "100.00"
}

Cardano related env vars:

CARDANO_NODE_SOCKET_PATH=/cardano/ipc/node.socket
CARDANO_CLI_PATH=/usr/local/bin/cardano-cli

I tried with a clean stores/ directory, and I see this message upon restart:

{"msg":"🔥 Cardano transactions preloader failed","v":0,"name":"mithril-signer","level":60,"time":"2024-11-12T15:21:50.55431616Z","hostname":"bb37734874c8","pid":7,"error":"error while sending or receiving data through the channel"}

The rest of the logs look ok.

Then after few minutes it prints:

{"msg":">> get_current_time_point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T15:23:50.498065409Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":"An error occurred, runtime state kept. message = 'Could not retrieve current time point in context 'checking if epoch has changed'.'","v":0,"name":"mithril-signer","level":50,"time":"2024-11-12T15:23:50.498093389Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","nested_error":"Runner can not get current time point\n\nCaused by:\n    0: TimePoint Provider can not get current epoch\n    1: general error\n    2: PallasChainObserver failed to create new client\n    3: error connecting bearer\n    4: No such file or directory (os error 2)"}
{"msg":"… Cycle finished, Sleeping for 60000 ms","v":0,"name":"mithril-signer","level":30,"time":"2024-11-12T15:23:50.498101915Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine"}

EDITED: After few cycles, the no such file or directory message seems to resolve itself.

jpraynaud commented 1 day ago

Thanks @mrabdibdi-anvil for providing these information.

It looks like the configuration of the Cardano node socket is not correctly setup for the signer node:

Also, I recommend that you rollback the previous store (and restart the signer node) as this is not the source of the problem and starting with a fresh new store means you will have to wait 2 epochs before being able to sign.

mrabdibdi-anvil commented 1 day ago

Perfect ! @jpraynaud

I restarted using the original files I had backup.

Now I get the original error message:

{"msg":"New cycle: ReadyToSign - 521","v":0,"name":"mithril-signer","level":30,"time":"2024-11-12T16:06:59.218865546Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine"}
{"msg":"Incrementing 'mithril_signer_runtime_cycle_total_since_startup' counter","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.218878598Z","hostname":"bb37734874c8","pid":7,"src":"MetricsService"}
{"msg":">> get_current_time_point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.218891904Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":">> get_beacon_to_sign(time_point: TimePoint (epoch: 521, immutable_file_number: 6471, chain_point: ChainPoint (slot_number: 139861287, block_number: 11084953, block_hash: 3a8aab436f02dba0ff75064889fd86eeec719341145d1f8130239bf8c0835564)))","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246423971Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":"→ Epoch has NOT changed we can sign this beacon, transiting to ReadyToSign","v":0,"name":"mithril-signer","level":30,"time":"2024-11-12T16:06:59.246550089Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","beacon_to_sign":"BeaconToSign { epoch: Epoch(521), signed_entity_type: CardanoTransactions(Epoch(521), BlockNumber(11084849)), initiated_at: 2024-11-12T16:06:59.246513350Z }"}
{"msg":">> transition_from_ready_to_sign_to_ready_to_sign","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246567487Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","next_retrieval_epoch":"Epoch(521)","retrieval_epoch":"Epoch(520)","current_epoch":"Epoch(521)"}
{"msg":"Incrementing 'mithril_signer_signature_registration_total_since_startup' counter","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246575538Z","hostname":"bb37734874c8","pid":7,"src":"MetricsService"}
{"msg":">> compute_message(CardanoTransactions(Epoch(521), BlockNumber(11084849)))","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246583238Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":"Compute protocol message for signed entity type: 'CardanoTransactions(Epoch(521), BlockNumber(11084849))'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246591406Z","hostname":"bb37734874c8","pid":7,"src":"MithrilSignableBuilderService"}
{"msg":"Running Transactions import between block '11029829' and '11031329'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246598662Z","hostname":"bb37734874c8","pid":7,"src":"TransactionsImporterByChunk"}
{"msg":"No last polled point available, falling back to the highest stored chain point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246720062Z","hostname":"bb37734874c8","pid":7,"src":"CardanoTransactionsImporter"}
{"msg":"Retrieving Cardano transactions until block numbered '11031329'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246727891Z","hostname":"bb37734874c8","pid":7,"src":"CardanoTransactionsImporter","highest_stored_block_number":"Some(BlockNumber(11029829))","starting_slot_number":"Some(SlotNumber(138727583))"}
{"msg":"Doesn't have agency, no need to find intersect point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.246735786Z","hostname":"bb37734874c8","pid":7,"src":"PallasChainReader"}
{"msg":">> poll_next","v":0,"name":"mithril-signer","level":20,"time":"2024-11-12T16:06:59.24674298Z","hostname":"bb37734874c8","pid":7,"src":"ChainReaderBlockStreamer"}
{"msg":"An error occurred, runtime state kept. message = 'Could not compute message during 'ready to sign → ready to sign' phase (current epoch Epoch(521))'","v":0,"name":"mithril-signer","level":50,"time":"2024-11-12T16:06:59.246750616Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","nested_error":"Runner can not compute protocol message for signed entity type: 'CardanoTransactions'\n\nCaused by:\n    0: Signable builder service can not compute protocol message with block_number: '11084849'\n    1: error while sending or receiving data through the channel"}
{"msg":"… Cycle finished, Sleeping for 60000 ms","v":0,"name":"mithril-signer","level":30,"time":"2024-11-12T16:06:59.246758785Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine"}

These two commands work as well:

$CARDANO_CLI_PATH query tip --socket-path $CARDANO_NODE_SOCKET_PATH --mainnet
$CARDANO_CLI_PATH query tip --socket-path /cardano/ipc/node.socket --mainnet 
jpraynaud commented 1 day ago

OK, I can see that the signer has now access to the Cardano node through the Unix socket as the state is ReadyToSign at epoch `521👍

The error message that you have is now different from ythe first one. The Cardano transactions are being imported which explains that the signer is not able yet to sign this type of message. This is probably due to the previous version of the node you were running (can you share the version?). The process will likely take few hours (as explained in this doc: https://mithril.network/doc/manual/operate/run-signer-node#cardano-transactions-certification-footprint).

Let me know if you still have error messages in the next 24 hours 🙂

mrabdibdi-anvil commented 1 day ago

Thanks for your time and responses

I upgraded the cardano node from 9.2.1 to 10.1.2 last week, and also updated the operational certificate (node.cert) (still in the 2 epochs waiting time)

The linked documentation is the one I followed to setup the Dockerfile, I have been running like this for a while without issue.

Do I have actions to do or just wait 24h to see if it is resolved ?

jpraynaud commented 1 day ago

OK, can you also share the previous version of the signer you were running?

If there is no more error logs in 24 hours you can close the issue. Otherwise, feel free to post them 👍

mrabdibdi-anvil commented 1 day ago

oh !

mithril-signer version 2437.1

Perfect will do !

jpraynaud commented 14 hours ago

@mrabdibdi-anvil I close this issue. Feel free to reopen if necessary 🙂

mrabdibdi-anvil commented 11 hours ago

Hello ! @jpraynaud

I let it runs for about 24h

The issue is persisting

{"msg":"New cycle: ReadyToSign - 521","v":0,"name":"mithril-signer","level":30,"time":"2024-11-13T19:38:42.583883523Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine"}
{"msg":"Incrementing 'mithril_signer_runtime_cycle_total_since_startup' counter","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.58389059Z","hostname":"bb37734874c8","pid":7,"src":"MetricsService"}
{"msg":">> get_current_time_point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.58390027Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":">> get_beacon_to_sign(time_point: TimePoint (epoch: 521, immutable_file_number: 6476, chain_point: ChainPoint (slot_number: 139960427, block_number: 11089810, block_hash: 628394bac8bdce9fa5dc982d6aada67e0ea1e59de22162ce46450df52d006263)))","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.611903377Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":"→ Epoch has NOT changed we can sign this beacon, transiting to ReadyToSign","v":0,"name":"mithril-signer","level":30,"time":"2024-11-13T19:38:42.612039669Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","beacon_to_sign":"BeaconToSign { epoch: Epoch(521), signed_entity_type: CardanoTransactions(Epoch(521), BlockNumber(11089709)), initiated_at: 2024-11-13T19:38:42.612017241Z }"}
{"msg":">> transition_from_ready_to_sign_to_ready_to_sign","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.6120487Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","next_retrieval_epoch":"Epoch(521)","retrieval_epoch":"Epoch(520)","current_epoch":"Epoch(521)"}
{"msg":"Incrementing 'mithril_signer_signature_registration_total_since_startup' counter","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612065406Z","hostname":"bb37734874c8","pid":7,"src":"MetricsService"}
{"msg":">> compute_message(CardanoTransactions(Epoch(521), BlockNumber(11089709)))","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612072231Z","hostname":"bb37734874c8","pid":7,"src":"SignerRunner"}
{"msg":"Compute protocol message for signed entity type: 'CardanoTransactions(Epoch(521), BlockNumber(11089709))'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612079117Z","hostname":"bb37734874c8","pid":7,"src":"MithrilSignableBuilderService"}
{"msg":"Running Transactions import between block '11029829' and '11031329'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612100643Z","hostname":"bb37734874c8","pid":7,"src":"TransactionsImporterByChunk"}
{"msg":"No last polled point available, falling back to the highest stored chain point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612247915Z","hostname":"bb37734874c8","pid":7,"src":"CardanoTransactionsImporter"}
{"msg":"Retrieving Cardano transactions until block numbered '11031329'","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612260282Z","hostname":"bb37734874c8","pid":7,"src":"CardanoTransactionsImporter","highest_stored_block_number":"Some(BlockNumber(11029829))","starting_slot_number":"Some(SlotNumber(138727583))"}
{"msg":"Doesn't have agency, no need to find intersect point","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.612270024Z","hostname":"bb37734874c8","pid":7,"src":"PallasChainReader"}
{"msg":">> poll_next","v":0,"name":"mithril-signer","level":20,"time":"2024-11-13T19:38:42.61227732Z","hostname":"bb37734874c8","pid":7,"src":"ChainReaderBlockStreamer"}
{"msg":"An error occurred, runtime state kept. message = 'Could not compute message during 'ready to sign → ready to sign' phase (current epoch Epoch(521))'","v":0,"name":"mithril-signer","level":50,"time":"2024-11-13T19:38:42.612284807Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine","nested_error":"Runner can not compute protocol message for signed entity type: 'CardanoTransactions'\n\nCaused by:\n    0: Signable builder service can not compute protocol message with block_number: '11089709'\n    1: error while sending or receiving data through the channel"}
{"msg":"… Cycle finished, Sleeping for 60000 ms","v":0,"name":"mithril-signer","level":30,"time":"2024-11-13T19:38:42.612292969Z","hostname":"bb37734874c8","pid":7,"src":"StateMachine"}