Mainnet stalled at 34793621

stellar / stellar-core

Reference implementation for the peer-to-peer agent that manages the Stellar network.

https://www.stellar.org


Mainnet stalled at 34793621 #2997

Closed bert2002 closed 3 years ago

bert2002 commented 3 years ago

Hi, it seems that mainnet is stalling on 34793621. Our node stopped syncing blocks, and all explorers I have checked so far stopped at the same block.

Apr 06 09:14:49 stellar-1 stellar-core[7836]: 2021-04-06T09:14:49.058 GD2MP [Overlay INFO] Connected to 144.76.113.106:11625
Apr 06 09:14:49 stellar-1 stellar-core[7836]: 2021-04-06T09:14:49.230 GD2MP [Overlay INFO] Peer 144.76.113.106:11625 dropped us, reason ERR_LOAD (peer rejected)
Apr 06 09:14:49 stellar-1 stellar-core[7836]: 2021-04-06T09:14:49.258 GD2MP [Herder INFO] Asking peers for SCP messages more recent than 34793619
Apr 06 09:14:49 stellar-1 stellar-core[7836]: 2021-04-06T09:14:49.772 GD2MP [Overlay INFO] Connected to 51.77.119.45:11625
Apr 06 09:14:57 stellar-1 stellar-core[7836]: 2021-04-06T09:14:57.729 GD2MP [Overlay INFO] Connected to 213.239.204.70:11625
Apr 06 09:14:57 stellar-1 stellar-core[7836]: 2021-04-06T09:14:57.907 GD2MP [Overlay INFO] Peer 213.239.204.70:11625 dropped us, reason ERR_LOAD (peer rejected)
Apr 06 09:14:59 stellar-1 stellar-core[7836]: 2021-04-06T09:14:59.258 GD2MP [Herder INFO] Asking peers for SCP messages more recent than 34793619
Apr 06 09:14:59 stellar-1 stellar-core[7836]: 2021-04-06T09:14:59.837 GD2MP [Overlay INFO] Connected to 34.244.44.232:11625
Apr 06 09:15:00 stellar-1 stellar-core[7836]: 2021-04-06T09:15:00.753 GD2MP [Overlay INFO] Connected to 213.239.204.70:11625
Apr 06 09:15:00 stellar-1 stellar-core[7836]: 2021-04-06T09:15:00.926 GD2MP [Overlay INFO] Peer 213.239.204.70:11625 dropped us, reason ERR_LOAD (peer rejected)
Apr 06 09:15:03 stellar-1 stellar-core[7836]: 2021-04-06T09:15:03.785 GD2MP [Overlay INFO] Dropping peer 34.244.44.232:11625, reason error reading message header: End of file
Apr 06 09:15:04 stellar-1 stellar-core[7836]: 2021-04-06T09:15:04.143 GD2MP [Overlay INFO] Connected to 144.76.113.106:11625
Apr 06 09:15:04 stellar-1 stellar-core[7836]: 2021-04-06T09:15:04.315 GD2MP [Overlay INFO] Peer 144.76.113.106:11625 dropped us, reason ERR_LOAD (peer rejected)
Apr 06 09:15:09 stellar-1 stellar-core[7836]: 2021-04-06T09:15:09.258 GD2MP [Herder INFO] Asking peers for SCP messages more recent than 34793619
Apr 06 09:15:15 stellar-1 stellar-core[7836]: 2021-04-06T09:15:15.893 GD2MP [Overlay INFO] Connected to 213.239.204.70:11625
Apr 06 09:15:16 stellar-1 stellar-core[7836]: 2021-04-06T09:15:16.053 GD2MP [Overlay INFO] Peer 213.239.204.70:11625 dropped us, reason ERR_LOAD (peer rejected)
Apr 06 09:15:19 stellar-1 stellar-core[7836]: 2021-04-06T09:15:19.258 GD2MP [Herder INFO] Asking peers for SCP messages more recent than 34793619
Apr 06 09:15:21 stellar-1 stellar-core[7836]: 2021-04-06T09:15:21.610 GD2MP [Overlay INFO] Dropping peer 35.172.165.110:11625, reason error reading message header: Connection reset by peer
Apr 06 09:15:22 stellar-1 stellar-core[7836]: 2021-04-06T09:15:22.232 GD2MP [Overlay INFO] Connected to 144.76.113.106:11625
Apr 06 09:15:22 stellar-1 stellar-core[7836]: 2021-04-06T09:15:22.390 GD2MP [Overlay INFO] Peer 144.76.113.106:11625 dropped us, reason ERR_LOAD (peer rejected)
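To see at a glance which peers keep rejecting the node, the "dropped us" lines in a log like the one above can be tallied per peer. This is only a minimal sketch: the regular expression is written against the line format shown in this log, and `count_drops` and `SAMPLE` are illustrative names, not part of stellar-core.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... Peer 144.76.113.106:11625 dropped us, reason ERR_LOAD (peer rejected)"
DROP_RE = re.compile(r"Peer (\S+) dropped us, reason (\w+)")


def count_drops(log_text: str) -> Counter:
    """Count (peer, reason) pairs for peers that dropped the connection."""
    counts = Counter()
    for line in log_text.splitlines():
        match = DROP_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Three lines copied from the log above (syslog prefix removed).
SAMPLE = """\
2021-04-06T09:14:49.230 GD2MP [Overlay INFO] Peer 144.76.113.106:11625 dropped us, reason ERR_LOAD (peer rejected)
2021-04-06T09:14:57.907 GD2MP [Overlay INFO] Peer 213.239.204.70:11625 dropped us, reason ERR_LOAD (peer rejected)
2021-04-06T09:15:00.926 GD2MP [Overlay INFO] Peer 213.239.204.70:11625 dropped us, reason ERR_LOAD (peer rejected)
"""

for (peer, reason), n in count_drops(SAMPLE).most_common():
    print(f"{peer} {reason} x{n}")
```

In the excerpt above, every rejection carries the reason ERR_LOAD, which suggests the remote peers were overloaded rather than that the local node was misconfigured.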

0x4a5e1e4baab commented 3 years ago

Seeing the same issue on several nodes

riptl commented 3 years ago

Blockdaemon mainnet validators are at block number 34794585 atm, which is 964 blocks ahead.
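The gap quoted above can be checked with a little arithmetic, and turned into a rough time estimate. This is a sketch under one assumption that is not stated in the comment: that ledgers close about every 5 seconds, which is consistent with the timestamps in the logs later in this thread.

```python
STALLED_LEDGER = 34793621    # ledger at which the affected nodes stopped
REPORTED_LEDGER = 34794585   # ledger reported for the validators still closing
ASSUMED_CLOSE_SECONDS = 5    # assumption: roughly one ledger every 5 seconds


def ledger_gap(ahead: int, behind: int) -> int:
    """Number of ledgers between two sequence numbers."""
    return ahead - behind


def gap_minutes(gap: int, close_seconds: float = ASSUMED_CLOSE_SECONDS) -> float:
    """Approximate wall-clock time covered by `gap` ledgers, in minutes."""
    return gap * close_seconds / 60


gap = ledger_gap(REPORTED_LEDGER, STALLED_LEDGER)
print(gap, round(gap_minutes(gap), 1))  # prints "964 80.3"
```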

gituser commented 3 years ago

It's also the case here: https://horizon.stellar.org/ And here as well: https://stellarchain.io/

Ping @MonsieurNicolas

juno-yu commented 3 years ago

FYI, major explorers all stuck at 34793621 , including https://stellarchain.io/ https://steexp.com/ https://stellar.expert/explorer/public https://dashboard.stellar.org/ https://stellarscan.io/

blockchair is moving at the moment https://blockchair.com/stellar (34796357 | 2021-04-06 13:10)

We are still running an old image pulled under the latest tag, with HORIZON 1.11.1-91 and CORE 15.0.0-40, and it is stuck at the same block (34793621).

I guess there might be consensus-affecting changes somewhere between 15.X.0 and 15.4.0 that were either not intended or not given enough attention to urge all node hosts to upgrade.

Maybe there are other people like us who host stellar nodes with the quickstart image. There is another blocker for upgrading to the newest image on 15.4.0: because the postgres 9.5 -> 12 change was introduced together with it, the upgrade requires pg_upgrade over data that could be >2TB. That can be a non-trivial process to figure out, so maybe not everyone has upgraded yet, including some validators.

For now I cannot check out old latest images to examine at which combination / commit it starts getting stuck, because they all reused the latest tag and I don't know the old hashes.

gituser commented 3 years ago

@juno-yu

https://horizon.stellar.org/ is stuck as well, but it's running stellar-core v15.4.0 and almost latest horizon 2.0.0-f23d046f5c509065cc6bafe6794a89ddac332186

These are the versions of stellar-core that are stuck currently:

v15.1.0
v15.2.0
v15.3.0
v15.4.0

Would be interesting to know what version blockchair is on and are they validating new blocks?

mvaneijk commented 3 years ago

stellar consensus protocol should favor safety over liveness, so a split sounds highly unlikely to me

tryexceptend commented 3 years ago

Is there any solution to the problem?

jun0tpyrc commented 3 years ago

I see the major explorers have already recovered, but our nodes have been stuck for a few hours with errors like:

2021-04-07T02:02:40.587 GASYY [History ERROR] Bad ledger-header history entry: claimed ledger [seq=34793621, hash=d05842] does not agree with LCL [seq=34793621, hash=599be9] [VerifyLedgerChainWork.cpp:185]
2021-04-07T02:02:40.587 GASYY [History ERROR] Catchup material failed verification - hash mismatch, propagating failure [VerifyLedgerChainWork.cpp:443]
2021-04-07T02:02:40.587 GASYY [History ERROR] One or more of history archives may be corrupted. Update HISTORY configuration entry to only contain valid ones [VerifyLedgerChainWork.cpp:445]

Does anyone know the steps to proceed / recover without starting from scratch?
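The first error line above says that the archive's header for ledger 34793621 (hash d05842) disagrees with what the node itself recorded as its last closed ledger (LCL) at the same sequence (hash 599be9). A small sketch for pulling those values out of such a line, for example to compare several nodes, is below. The regular expression is written against the message format shown in this thread; `parse_mismatch` and the dictionary keys are illustrative names.

```python
import re
from typing import Optional

# Matches the "Bad ledger-header history entry" message shown above.
MISMATCH_RE = re.compile(
    r"claimed ledger \[seq=(\d+), hash=([0-9a-f]+)\] "
    r"does not agree with LCL \[seq=(\d+), hash=([0-9a-f]+)\]"
)


def parse_mismatch(line: str) -> Optional[dict]:
    """Return archive vs. local ledger details, or None if the line does not match."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return {
        "archive_seq": int(match.group(1)),
        "archive_hash": match.group(2),
        "local_seq": int(match.group(3)),
        "local_hash": match.group(4),
    }


# Line copied from the log above.
LINE = (
    "2021-04-07T02:02:40.587 GASYY [History ERROR] Bad ledger-header history entry: "
    "claimed ledger [seq=34793621, hash=d05842] does not agree with LCL "
    "[seq=34793621, hash=599be9] [VerifyLedgerChainWork.cpp:185]"
)

print(parse_mismatch(LINE))
```

When both sequence numbers are equal but the hashes differ, as here, the node and the archive hold two different versions of the same ledger, so catchup cannot continue from the node's current state.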

bert2002 commented 3 years ago

I updated our nodes to 15.4.0 and horizon to 2.1.0, but horizon is still stuck. We have captive mode disabled. Could this cause any problems?

astudnev commented 3 years ago

We are running 15.4 and have the following log:

Apr 07 10:48:05 node1-1 stellar-core[6279]: 2021-04-07T10:48:05.269 GDGMH [Ledger INFO] Got consensus: [seq=34810183, prev=0552fc, txs=159, ops=272, sv: [ SIGNED@GBBQQ txH: 64ed88, ct: 1617792484, upgrades: [ ] ]]
Apr 07 10:48:05 node1-1 stellar-core[6279]: 2021-04-07T10:48:05.269 GDGMH [Ledger INFO] Close of ledger 34810183 buffered
Apr 07 10:48:10 node1-1 stellar-core[6279]: 2021-04-07T10:48:10.672 GDGMH [Herder INFO] Quorum information for 34810182 : {"agree":5,"cost":29233932,"delayed":0,"disagree":0,"fail_at":1,"hash":"f34a6d","lag_ms":481,"ledger":34810182,"missing":2,"phase":"EXTERNALIZE"}
Apr 07 10:48:10 node1-1 stellar-core[6279]: 2021-04-07T10:48:10.674 GDGMH [Ledger INFO] Got consensus: [seq=34810184, prev=403620, txs=156, ops=267, sv: [ SIGNED@sdf_watcher3 txH: d22f93, ct: 1617792489, upgrades: [ ] ]]
Apr 07 10:48:10 node1-1 stellar-core[6279]: 2021-04-07T10:48:10.674 GDGMH [Ledger INFO] Close of ledger 34810184 buffered
Apr 07 10:48:15 node1-1 stellar-core[6279]: 2021-04-07T10:48:15.878 GDGMH [Herder INFO] Quorum information for 34810183 : {"agree":5,"cost":32404296,"delayed":0,"disagree":0,"fail_at":1,"hash":"f34a6d","lag_ms":478,"ledger":34810183,"missing":2,"phase":"EXTERNALIZE"}
Apr 07 10:48:15 node1-1 stellar-core[6279]: 2021-04-07T10:48:15.881 GDGMH [Ledger INFO] Got consensus: [seq=34810185, prev=d20db2, txs=164, ops=258, sv: [ SIGNED@sdf_watcher1 txH: 7c2a2c, ct: 1617792495, upgrades: [ ] ]]
Apr 07 10:48:15 node1-1 stellar-core[6279]: 2021-04-07T10:48:15.881 GDGMH [Ledger INFO] Close of ledger 34810185 buffered
Apr 07 10:48:16 node1-1 stellar-core[6279]: 2021-04-07T10:48:16.640 GDGMH [Process WARNING] process 7711 exited 22: curl -sf http://history.stellar.org/prd/core-live/core_live_002/ledger/02/12/e8/ledger-0212e8bf.xdr.gz -o /var/lib/stellar/buckets/tmp/catchup-0f5755afa66f2c5e/ledger/02/1
Apr 07 10:48:16 node1-1 stellar-core[6279]: 2021-04-07T10:48:16.640 GDGMH [History ERROR] Could not download file: archive sdf2 maybe missing file ledger/02/12/e8/ledger-0212e8bf.xdr.gz

horizon is not syncing

Removing database for core and horizon did not help. Core still shows the status:

stellar-core http-command info --conf /etc/stellar/stellar-core.cfg
2021-04-07T11:09:43.518 [default INFO] Config from /etc/stellar/stellar-core.cfg
2021-04-07T11:09:43.518 [default INFO] Using QUORUM_SET: {
   "t" : 3,
   "v" : [
      "sdf_watcher1",
      "sdf_watcher2",
      "sdf_watcher3",
      {
         "t" : 3,
         "v" : [ "stronghold1", "eno", "tempo.eu.com", "satoshipay" ]
      }
   ]
}

Content-Length: 1463
Content-Type: application/json

2021-04-07T11:09:43.518 GA7CF [default INFO] {
   "info" : {
      "build" : "stellar-core 15.4.0 (ff56e7f207b918aae53710abcc001511ead7ece9)",
      "history_failure_rate" : "0.0032",
      "ledger" : {
         "age" : 1617793783,
         "baseFee" : 100,
         "baseReserve" : 100000000,
         "closeTime" : 0,
         "hash" : "39c2a3cd4141b2853e70d84601faa44744660334b48f3228e0309342e3f4eb48",
         "maxTxSetSize" : 100,
         "num" : 1,
         "version" : 0
      },
      "network" : "Public Global Stellar Network ; September 2015",
      "peers" : {
         "authenticated_count" : 8,
         "pending_count" : 0
      },
      "protocol_version" : 15,
      "quorum" : {
         "node" : "GCL67",
         "qset" : {
            "agree" : 5,
            "cost" : 33200880,
            "delayed" : 0,
            "disagree" : 0,
            "fail_at" : 1,
            "hash" : "f34a6d",
            "lag_ms" : 458,
            "ledger" : 34810420,
            "missing" : 2,
            "phase" : "EXTERNALIZE"
         },
         "transitive" : {
            "critical" : [
               [ "sdf_watcher3", "sdf_watcher1", "sdf_watcher2" ]
            ],
            "intersection" : true,
            "last_check_ledger" : 34810379,
            "node_count" : 30
         }
      },
      "startedOn" : "2021-04-07T11:05:38Z",
      "state" : "Catching up",
      "status" : [
         "Catching up to ledger 34810367: downloading ledger files 2509/2509 (100%)"
      ]
   }
}
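The interesting fields in the output above are `state`, the local `ledger.num`, and the ledger the quorum is reporting on (`quorum.qset.ledger`). A minimal sketch for comparing them is below. It assumes the JSON body has already been separated from the log prefix, and `summarize_info` and `SAMPLE` are illustrative names, not part of stellar-core.

```python
import json


def summarize_info(info_json: str) -> dict:
    """Summarize sync status from the JSON body of the `info` command."""
    info = json.loads(info_json)["info"]
    local = info["ledger"]["num"]
    quorum = info["quorum"]["qset"]["ledger"]
    return {
        "state": info["state"],
        "local_ledger": local,
        "quorum_ledger": quorum,
        "behind": quorum - local,
    }


# Reduced copy of the output above, keeping only the fields used here.
SAMPLE = """
{
  "info": {
    "ledger": {"num": 1},
    "quorum": {"qset": {"ledger": 34810420, "phase": "EXTERNALIZE"}},
    "state": "Catching up"
  }
}
"""

print(summarize_info(SAMPLE))
```

For the output above this shows a local ledger of 1 against a quorum ledger of 34810420: the node hears consensus from its quorum, but it has not applied any ledger beyond genesis since the database was wiped.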

Config contains:

NODE_NAMES=[
"GDIQKLQVOCD5UD6MUI5D5PTPVX7WTP5TAPP5OBMOLENBBD5KG434KYQ2  stronghold1",
"GAOO3LWBC4XF6VWRP5ESJ6IBHAISVJMSBTALHOQM2EZG7Q477UWA6L7U  eno",
"GCJCSMSPIWKKPR7WEPIQG63PDF7JGGEENRC33OKVBSPUDIRL6ZZ5M7OO  tempo.eu.com",
"GC5SXLNAM3C4NMGK2PXK4R34B5GNZ47FYQ24ZIBFDFOCU6D4KBN4POAE  satoshipay",
"GD7FVHL2KUTUYNOJFRUUDJPDRO2MAZJ5KP6EBCU6LKXHYGZDUFBNHXQI  umbrel",
"GCGB2S2KGYARPVIA37HYZXVRM2YZUEXA6S33ZU5BUDC6THSB62LZSTYH  sdf_watcher1",
"GCM6QMP3DLRPTAZW2UZPCPX2LF3SXWXKPMP3GKFZBDSF3QZGV2G5QSTK  sdf_watcher2",
"GABMKJM6I25XI4K7U6XWMULOUQIQ27BCTMLS6BYYSOWKTBUXVRJSXHYQ  sdf_watcher3",
]

KNOWN_PEERS=[
"core-live-a.stellar.org:11625",
"core-live-b.stellar.org:11625",
"core-live-c.stellar.org:11625",
"validator1.stellar.stronghold.co",
"stellar.256kw.com",
"stellar1.tempo.eu.com",
"stellar.satoshipay.io"
]

gituser commented 3 years ago

@astudnev you need to update your configuration to a more recent one and add more validators.

Check this configuration - https://github.com/stellar/docker-stellar-core-horizon/blob/master/pubnet/core/etc/stellar-core.cfg

In particular you need to remove NODE_NAMES, KNOWN_PEERS, QUORUM_SET directives and replace them with new [[VALIDATORS]] and [[HOME_DOMAINS]].

Also, to make sure your horizon is synced set HISTORY_RETENTION_COUNT to at least 50000 and CATCHUP_RECENT in stellar-core to 50000 or more.
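The advice above can be turned into a quick check of a stellar-core.cfg. This is a rough, line-based sketch and not a real TOML parser. It only looks for the legacy directives named above, for the presence of `[[HOME_DOMAINS]]` and `[[VALIDATORS]]` tables, and for the `CATCHUP_RECENT` value; `lint_core_cfg` and the sample strings are illustrative. `HISTORY_RETENTION_COUNT` is a Horizon setting, so it is not checked here.

```python
import re

# Legacy directives, either as "NAME=" keys or as "[NAME]" table headers.
LEGACY_RE = re.compile(r"^\s*\[?(NODE_NAMES|KNOWN_PEERS|QUORUM_SET)\b")
TABLE_RE = re.compile(r"^\s*\[\[(HOME_DOMAINS|VALIDATORS)\]\]")
CATCHUP_RE = re.compile(r"^\s*CATCHUP_RECENT\s*=\s*(\d+)")


def lint_core_cfg(text: str, min_catchup_recent: int = 50000) -> list:
    """Return a list of human-readable problems found in the config text."""
    problems = []
    tables = set()
    catchup_recent = None
    for line in text.splitlines():
        legacy = LEGACY_RE.match(line)
        if legacy:
            problems.append(f"legacy directive present: {legacy.group(1)}")
        table = TABLE_RE.match(line)
        if table:
            tables.add(table.group(1))
        catchup = CATCHUP_RE.match(line)
        if catchup:
            catchup_recent = int(catchup.group(1))
    for name in ("HOME_DOMAINS", "VALIDATORS"):
        if name not in tables:
            problems.append(f"missing [[{name}]] entries")
    if catchup_recent is None or catchup_recent < min_catchup_recent:
        problems.append(f"CATCHUP_RECENT below {min_catchup_recent}")
    return problems


OLD_STYLE = 'NODE_NAMES=[\n"G... stronghold1",\n]\nKNOWN_PEERS=[\n"core-live-a.stellar.org:11625"\n]\n'
NEW_STYLE = (
    "CATCHUP_RECENT=50000\n"
    '[[HOME_DOMAINS]]\nHOME_DOMAIN="stellar.org"\nQUALITY="HIGH"\n'
    '[[VALIDATORS]]\nNAME="sdf_1"\nHOME_DOMAIN="stellar.org"\n'
)

print(lint_core_cfg(OLD_STYLE))
print(lint_core_cfg(NEW_STYLE))
```

A config that sets `CATCHUP_COMPLETE=true` instead of `CATCHUP_RECENT` would be flagged by this sketch even though it may be fine; the check mirrors only the suggestion in this comment.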

@bert2002 most likely you need to do the same as I suggested to @astudnev: add more validators to your stellar-core.cfg, and then horizon should work fine.

I'm also trying to convert another instance of mine to the new horizon syncing method (CAPTIVE_CORE), but it seems to be very slow at catching up, and for some reason it keeps re-downloading the same ledger and re-applying it over and over again.

haruki515 commented 3 years ago

our nodes stuck for a few hours, we have following logs

2021-04-07T09:23:41.331 GAHYQ [Overlay INFO] Non preferred outbound authenticated peer 70.228.73.25:11625 rejected because all available slots are taken.
2021-04-07T09:23:41.331 GAHYQ [Overlay INFO] If you wish to allow for more outbound connections, please update your configuration file

Maybe it is because there are so few validators running that everyone is trying to sync at the same time?

our node configuration is based on this

Check this configuration - https://github.com/stellar/docker-stellar-core-horizon/blob/master/pubnet/core/etc/stellar-core.cfg

is there a list of other validators that could be added?

gituser commented 3 years ago

@haruki515

> our nodes stuck for a few hours, we have following logs
>
> 2021-04-07T09:23:41.331 GAHYQ [Overlay INFO] Non preferred outbound authenticated peer 70.228.73.25:11625 rejected because all available slots are taken.
> 2021-04-07T09:23:41.331 GAHYQ [Overlay INFO] If you wish to allow for more outbound connections, please update your configuration file

Could be that, but it could also be that your peer port is not open (11625 by default; 11626 is the HTTP port). Try fixing your firewall and forwarding this port so that other nodes can connect to it.

> maybe, Is it because there are so few validators running that everyone is trying to sync at same time?

Could be. Try using latest v15.4.0 and restarting your stellar-core.

> is there a list of other validators that could be added?

I don't think there are more, although there is another resource: https://stellarbeat.io/

They offer a generated stellar-core.cfg with a list of stellar validators (click on the stellar core config on the left).

stellar-core.cfg from stellarbeat.io:

```
[[HOME_DOMAINS]]
HOME_DOMAIN = "lobstr.co"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "keybase.io"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "satoshipay.io"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "coinqvest.com"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "wirexapp.com"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "www.stellar.org"
QUALITY = "HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN = "stellar.blockdaemon.com"
QUALITY = "HIGH"

[[VALIDATORS]]
NAME = "LOBSTR 3 (North America)"
PUBLIC_KEY = "GD5QWEVV4GZZTQP46BRXV5CUMMMLP4JTGFD7FWYJJWRL54CELY6JGQ63"
ADDRESS = "35.239.138.233:11625"
HISTORY = "curl -sf https://stellar-archive-3-lobstr.s3.amazonaws.com/ -o {1}"
HOME_DOMAIN = "lobstr.co"

[[VALIDATORS]]
NAME = "LOBSTR 1 (Europe)"
PUBLIC_KEY = "GCFONE23AB7Y6C5YZOMKUKGETPIAJA4QOYLS5VNS4JHBGKRZCPYHDLW7"
ADDRESS = "88.99.1.97:11625"
HISTORY = "curl -sf https://stellar-archive-1-lobstr.s3.amazonaws.com/ -o {1}"
HOME_DOMAIN = "lobstr.co"

[[VALIDATORS]]
NAME = "LOBSTR 4 (Asia)"
PUBLIC_KEY = "GA7TEPCBDQKI7JQLQ34ZURRMK44DVYCIGVXQQWNSWAEQR6KB4FMCBT7J"
ADDRESS = "34.92.213.91:11625"
HISTORY = "curl -sf https://stellar-archive-4-lobstr.s3.amazonaws.com/ -o {1}"
HOME_DOMAIN = "lobstr.co"

[[VALIDATORS]]
NAME = "LOBSTR 5 (Australia)"
PUBLIC_KEY = "GA5STBMV6QDXFDGD62MEHLLHZTPDI77U3PFOD2SELU5RJDHQWBR5NNK7"
ADDRESS = "35.189.42.104:11625"
HISTORY = "curl -sf https://stellar-archive-5-lobstr.s3.amazonaws.com/ -o {1}"
HOME_DOMAIN = "lobstr.co"

[[VALIDATORS]]
NAME = "Keybase 1"
PUBLIC_KEY = "GDKWELGJURRKXECG3HHFHXMRX64YWQPUHKCVRESOX3E5PM6DM4YXLZJM"
ADDRESS = "54.187.137.83:11625"
HISTORY = "curl -sf https://stellarhistory1.keybase.io -o {1}"
HOME_DOMAIN = "keybase.io"

[[VALIDATORS]]
NAME = "Keybase 0"
PUBLIC_KEY = "GCWJKM4EGTGJUVSWUJDPCQEOEP5LHSOFKSA4HALBTOO4T4H3HCHOM6UX"
ADDRESS = "54.224.232.179:11625"
HISTORY = "curl -sf https://stellarhistory.keybase.io -o {1}"
HOME_DOMAIN = "keybase.io"

[[VALIDATORS]]
NAME = "Keybase 2"
PUBLIC_KEY = "GA35T3723UP2XJLC2H7MNL6VMKZZIFL2VW7XHMFFJKKIA2FJCYTLKFBW"
ADDRESS = "3.120.145.172:11625"
HISTORY = "curl -sf https://stellarhistory2.keybase.io -o {1}"
HOME_DOMAIN = "keybase.io"

[[VALIDATORS]]
NAME = "SatoshiPay (DE, Frankfurt)"
PUBLIC_KEY = "GC5SXLNAM3C4NMGK2PXK4R34B5GNZ47FYQ24ZIBFDFOCU6D4KBN4POAE"
ADDRESS = "51.195.6.154:11625"
HISTORY = "curl -sf https://stellar-history-de-fra.satoshipay.io -o {1}"
HOME_DOMAIN = "satoshipay.io"

[[VALIDATORS]]
NAME = "SatoshiPay (SG, Singapore)"
PUBLIC_KEY = "GBJQUIXUO4XSNPAUT6ODLZUJRV2NPXYASKUBY4G5MYP3M47PCVI55MNT"
ADDRESS = "35.247.129.227:11625"
HISTORY = "curl -sf https://stellar-history-sg-sin.satoshipay.io -o {1}"
HOME_DOMAIN = "satoshipay.io"

[[VALIDATORS]]
NAME = "SatoshiPay (US, Iowa)"
PUBLIC_KEY = "GAK6Z5UVGUVSEK6PEOCAYJISTT5EJBB34PN3NOLEQG2SUKXRVV2F6HZY"
ADDRESS = "149.56.25.35:11625"
HISTORY = "curl -sf https://stellar-history-us-iowa.satoshipay.io -o {1}"
HOME_DOMAIN = "satoshipay.io"

[[VALIDATORS]]
NAME = "COINQVEST (Germany)"
PUBLIC_KEY = "GD6SZQV3WEJUH352NTVLKEV2JM2RH266VPEM7EH5QLLI7ZZAALMLNUVN"
ADDRESS = "94.130.216.168:11625"
HISTORY = "curl -sf https://germany.stellar.coinqvest.com/history/ -o {1}"
HOME_DOMAIN = "coinqvest.com"

[[VALIDATORS]]
NAME = "COINQVEST (Hong Kong)"
PUBLIC_KEY = "GAZ437J46SCFPZEDLVGDMKZPLFO77XJ4QVAURSJVRZK2T5S7XUFHXI2Z"
ADDRESS = "95.216.67.199:11625"
HISTORY = "curl -sf https://hongkong.stellar.coinqvest.com/history/ -o {1}"
HOME_DOMAIN = "coinqvest.com"

[[VALIDATORS]]
NAME = "COINQVEST (Finland)"
PUBLIC_KEY = "GADLA6BJK6VK33EM2IDQM37L5KGVCY5MSHSHVJA4SCNGNUIEOTCR6J5T"
ADDRESS = "95.216.29.91:11625"
HISTORY = "curl -sf https://finland.stellar.coinqvest.com/history/ -o {1}"
HOME_DOMAIN = "coinqvest.com"

[[VALIDATORS]]
NAME = "Wirex United States"
PUBLIC_KEY = "GDXUKFGG76WJC7ACEH3JUPLKM5N5S76QSMNDBONREUXPCZYVPOLFWXUS"
ADDRESS = "52.158.209.165:11625"
HISTORY = "curl -sf http://wxhorizonusstga1.blob.core.windows.net/history/ -o {1}"
HOME_DOMAIN = "wirexapp.com"

[[VALIDATORS]]
NAME = "Wirex United Kingdom"
PUBLIC_KEY = "GBBQQT3EIUSXRJC6TGUCGVA3FVPXVZLGG3OJYACWBEWYBHU46WJLWXEU"
ADDRESS = "51.145.121.203:11625"
HISTORY = "curl -sf http://wxhorizonukstga1.blob.core.windows.net/history/ -o {1}"
HOME_DOMAIN = "wirexapp.com"

[[VALIDATORS]]
NAME = "Wirex Singapore"
PUBLIC_KEY = "GAB3GZIE6XAYWXGZUDM4GMFFLJBFMLE2JDPUCWUZXMOMT3NHXDHEWXAS"
ADDRESS = "40.119.214.22:11625"
HISTORY = "curl -sf http://wxhorizonasiastga1.blob.core.windows.net/history/ -o {1}"
HOME_DOMAIN = "wirexapp.com"

[[VALIDATORS]]
NAME = "SDF 2"
PUBLIC_KEY = "GCM6QMP3DLRPTAZW2UZPCPX2LF3SXWXKPMP3GKFZBDSF3QZGV2G5QSTK"
ADDRESS = "18.206.55.205:11625"
HISTORY = "curl -sf http://history.stellar.org/prd/core-live/core_live_002/ -o {1}"
HOME_DOMAIN = "www.stellar.org"

[[VALIDATORS]]
NAME = "SDF 1"
PUBLIC_KEY = "GCGB2S2KGYARPVIA37HYZXVRM2YZUEXA6S33ZU5BUDC6THSB62LZSTYH"
ADDRESS = "3.81.86.144:11625"
HISTORY = "curl -sf http://history.stellar.org/prd/core-live/core_live_001/ -o {1}"
HOME_DOMAIN = "www.stellar.org"

[[VALIDATORS]]
NAME = "SDF 3"
PUBLIC_KEY = "GABMKJM6I25XI4K7U6XWMULOUQIQ27BCTMLS6BYYSOWKTBUXVRJSXHYQ"
ADDRESS = "35.168.59.1:11625"
HISTORY = "curl -sf http://history.stellar.org/prd/core-live/core_live_003/ -o {1}"
HOME_DOMAIN = "www.stellar.org"

[[VALIDATORS]]
NAME = "Blockdaemon Validator 3"
PUBLIC_KEY = "GAYXZ4PZ7P6QOX7EBHPIZXNWY4KCOBYWJCA4WKWRKC7XIUS3UJPT6EZ4"
ADDRESS = "34.80.16.150:11625"
HISTORY = "curl -sf https://stellar-full-history3.bdnodes.net/ -o {1}"
HOME_DOMAIN = "stellar.blockdaemon.com"

[[VALIDATORS]]
NAME = "Blockdaemon Validator 2"
PUBLIC_KEY = "GAVXB7SBJRYHSG6KSQHY74N7JAFRL4PFVZCNWW2ARI6ZEKNBJSMSKW7C"
ADDRESS = "34.70.192.72:11625"
HISTORY = "curl -sf https://stellar-full-history2.bdnodes.net/ -o {1}"
HOME_DOMAIN = "stellar.blockdaemon.com"

[[VALIDATORS]]
NAME = "Blockdaemon Validator 1"
PUBLIC_KEY = "GAAV2GCVFLNN522ORUYFV33E76VPC22E72S75AQ6MBR5V45Z5DWVPWEU"
ADDRESS = "35.233.35.143:11625"
HISTORY = "curl -sf https://stellar-full-history1.bdnodes.net/ -o {1}"
HOME_DOMAIN = "stellar.blockdaemon.com"

[[VALIDATORS]]
NAME = "LOBSTR 2 (Europe)"
PUBLIC_KEY = "GDXQB3OMMQ6MGG43PWFBZWBFKBBDUZIVSUDAZZTRAWQZKES2CDSE5HKJ"
ADDRESS = "95.216.1.86:11625"
HISTORY = "curl -sf https://stellar-archive-2-lobstr.s3.amazonaws.com/ -o {1}"
HOME_DOMAIN = "lobstr.co"
```

haruki515 commented 3 years ago

I got it. Thank you for taking the time to explain this to me.

leevlad commented 3 years ago

Would also like to chime in and mention that a brand new instance of stellar-core based on the configuration below gets stuck: https://github.com/stellar/docker-stellar-core-horizon/blob/master/pubnet/core/etc/stellar-core.cfg

We're seeing the following messages:

2021-04-07T14:59:37.482 GDHC2 [History INFO] Catching up to ledger 34812927: downloading ledger files 302/302 (100%)
2021-04-07T14:59:39.370 GDHC2 [History INFO] Catching up to ledger 34812927: Succeeded: batch-download-ledger-0212e8bf-0213343f : 906/906 children completed
2021-04-07T14:59:39.370 GDHC2 [History INFO] Verifying ledgers [34793622,34812928)
2021-04-07T14:59:39.370 GDHC2 [History INFO] Verifying ledger [seq=34812927, hash=1a38bd] against SCP hash
2021-04-07T14:59:39.370 GDHC2 [History INFO] Catching up to ledger 34812927: verifying checkpoint 1/302 (0%)
2021-04-07T14:59:39.459 GDHC2 [History ERROR] Bad ledger-header history entry: claimed ledger [seq=34793621, hash=d05842] does not agree with LCL [seq=34793621, hash=599be9]
2021-04-07T14:59:39.459 GDHC2 [History ERROR] Catchup material failed verification - hash mismatch, propagating failure
2021-04-07T14:59:39.459 GDHC2 [History ERROR] One or more of history archives may be corrupted. Update HISTORY configuration entry to only contain valid ones
2021-04-07T14:59:39.459 GDHC2 [History INFO] Verifying ledgers [34793622,34812928)
2021-04-07T14:59:39.459 GDHC2 [History INFO] Catching up to ledger 34812927: Failed: verify-ledger-chain

I've tried shuffling some of the validators around in the list, without any luck.
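The file names in these logs appear to encode the checkpoint ledger in hexadecimal, which helps relate them to the stalled ledger. The sketch below assumes the usual history archive layout (`category/ww/xx/yy/category-wwxxyyzz.ext`) and a checkpoint every 64 ledgers; the function names are illustrative and not part of stellar-core.

```python
CHECKPOINT_FREQUENCY = 64  # assumption: usual checkpoint interval on the public network


def checkpoint_containing(seq: int) -> int:
    """Return the checkpoint ledger whose archive files cover ledger `seq`."""
    return (seq // CHECKPOINT_FREQUENCY + 1) * CHECKPOINT_FREQUENCY - 1


def archive_path(category: str, checkpoint: int, ext: str = "xdr.gz") -> str:
    """Build the archive-relative path for a checkpoint file."""
    h = f"{checkpoint:08x}"
    return f"{category}/{h[0:2]}/{h[2:4]}/{h[4:6]}/{category}-{h}.{ext}"


def checkpoint_from_name(name: str) -> int:
    """Decode the checkpoint ledger from a name like 'ledger-0212e8bf.xdr.gz'."""
    hex_part = name.split("-", 1)[1].split(".", 1)[0]
    return int(hex_part, 16)


stalled = 34793621
checkpoint = checkpoint_containing(stalled)
print(checkpoint)                                 # 34793663
print(archive_path("ledger", checkpoint))         # ledger/02/12/e8/ledger-0212e8bf.xdr.gz
print(archive_path("history", 34812927, "json"))  # history/02/13/33/history-021333ff.json
```

Under these assumptions, `ledger-0212e8bf.xdr.gz`, the first file in the `batch-download-ledger-0212e8bf-0213343f` range, is the checkpoint covering ledgers 34793600 to 34793663, which includes the stalled ledger 34793621. That is consistent with verification failing on the very first checkpoint (1/302).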

leevlad commented 3 years ago

Additionally, the config from https://stellarbeat.io/ fails to sync too, with the following messages:

2021-04-07T15:01:37.850 GAWMB [History INFO] Downloading history archive state: history/02/13/33/history-021333ff.json
2021-04-07T15:01:41.089 GAWMB [History ERROR] Error loading history state: rapidjson internal assertion failure: IsObject()
2021-04-07T15:01:41.089 GAWMB [History ERROR] There may be a problem with the local filesystem. Ensure that there is enough space to perform that operation and that disc is behaving correctly.
2021-04-07T15:01:41.089 GAWMB [History ERROR] OR
2021-04-07T15:01:41.089 GAWMB [History ERROR] One or more of history archives may be corrupted. Update HISTORY configuration entry to only contain valid ones
2021-04-07T15:01:41.089 GAWMB [History ERROR] OR
2021-04-07T15:01:41.089 GAWMB [History ERROR] Upgrade this stellar-core installation to newest version

edit: definitely not an issue with local FS (/dev/sda 397G 41G 337G 11%)

gituser commented 3 years ago

@leevlad for me this configuration (https://github.com/stellar/docker-stellar-core-horizon/blob/master/pubnet/core/etc/stellar-core.cfg) worked fine with my stuck horizon instance.

Here is what I did:

updated stellar-core to the latest version v16.0.0
added [[VALIDATORS]] and [[HOME_DOMAINS]] entries into my stellar-core.cfg, removed old entries like NODE_NAMES, KNOWN_PEERS, QUORUM_SET
restarted stellar-core, it caught up after some time (~1.5 hours)
restarted horizon (I'm using atm v1.14.0) so it also caught up after some time
NOTE: I'm not using new captive feature in horizon

here is my stellar-core.cfg which I'm using right now

```
LOG_FILE_PATH=""
DATABASE="postgresql://dbname=stellar user=stellar password=xxx host=127.0.0.1"
ENTRY_CACHE_SIZE=8192
PREFETCH_BATCH_SIZE=2000
HTTP_PORT=11626
PUBLIC_HTTP_PORT=false
HTTP_MAX_CLIENT=128
COMMANDS=[ "ll?level=info&partition=Herder" ]
NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"
PEER_PORT=11625
TARGET_PEER_CONNECTIONS=8
MAX_ADDITIONAL_PEER_CONNECTIONS=-1
MAX_PENDING_CONNECTIONS=5000
PEER_AUTHENTICATION_TIMEOUT=2
PEER_TIMEOUT=30
PREFERRED_PEERS=["127.0.0.1:7000","127.0.0.1:8000"]
PREFERRED_PEER_KEYS=[ ]
PREFERRED_PEERS_ONLY=false
MINIMUM_IDLE_PERCENT=0
KNOWN_CURSORS=["HORIZON"]
NODE_IS_VALIDATOR=false
FAILURE_SAFETY=1
UNSAFE_QUORUM=false
CATCHUP_COMPLETE=false
CATCHUP_RECENT=50000
MAX_CONCURRENT_SUBPROCESSES=10
AUTOMATIC_MAINTENANCE_PERIOD=660
AUTOMATIC_MAINTENANCE_COUNT=700
INVARIANT_CHECKS = []

[[HOME_DOMAINS]]
HOME_DOMAIN="stellar.org"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="satoshipay.io"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="lobstr.co"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="www.coinqvest.com"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="keybase.io"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="stellar.blockdaemon.com"
QUALITY="HIGH"

[[HOME_DOMAINS]]
HOME_DOMAIN="wirexapp.com"
QUALITY="HIGH"

[[VALIDATORS]]
NAME="sdf_1"
HOME_DOMAIN="stellar.org"
PUBLIC_KEY="GCGB2S2KGYARPVIA37HYZXVRM2YZUEXA6S33ZU5BUDC6THSB62LZSTYH"
ADDRESS="core-live-a.stellar.org:11625"
HISTORY="curl -sf https://history.stellar.org/prd/core-live/core_live_001/{0} -o {1}"

[[VALIDATORS]]
NAME="sdf_2"
HOME_DOMAIN="stellar.org"
PUBLIC_KEY="GCM6QMP3DLRPTAZW2UZPCPX2LF3SXWXKPMP3GKFZBDSF3QZGV2G5QSTK"
ADDRESS="core-live-b.stellar.org:11625"
HISTORY="curl -sf https://history.stellar.org/prd/core-live/core_live_002/{0} -o {1}"

[[VALIDATORS]]
NAME="sdf_3"
HOME_DOMAIN="stellar.org"
PUBLIC_KEY="GABMKJM6I25XI4K7U6XWMULOUQIQ27BCTMLS6BYYSOWKTBUXVRJSXHYQ"
ADDRESS="core-live-c.stellar.org:11625"
HISTORY="curl -sf https://history.stellar.org/prd/core-live/core_live_003/{0} -o {1}"

[[VALIDATORS]]
NAME="satoshipay_singapore"
HOME_DOMAIN="satoshipay.io"
PUBLIC_KEY="GBJQUIXUO4XSNPAUT6ODLZUJRV2NPXYASKUBY4G5MYP3M47PCVI55MNT"
ADDRESS="stellar-sg-sin.satoshipay.io:11625"
HISTORY="curl -sf https://stellar-history-sg-sin.satoshipay.io/{0} -o {1}"

[[VALIDATORS]]
NAME="satoshipay_iowa"
HOME_DOMAIN="satoshipay.io"
PUBLIC_KEY="GAK6Z5UVGUVSEK6PEOCAYJISTT5EJBB34PN3NOLEQG2SUKXRVV2F6HZY"
ADDRESS="stellar-us-iowa.satoshipay.io:11625"
HISTORY="curl -sf https://stellar-history-us-iowa.satoshipay.io/{0} -o {1}"

[[VALIDATORS]]
NAME="satoshipay_frankfurt"
HOME_DOMAIN="satoshipay.io"
PUBLIC_KEY="GC5SXLNAM3C4NMGK2PXK4R34B5GNZ47FYQ24ZIBFDFOCU6D4KBN4POAE"
ADDRESS="stellar-de-fra.satoshipay.io:11625"
HISTORY="curl -sf https://stellar-history-de-fra.satoshipay.io/{0} -o {1}"

[[VALIDATORS]]
NAME="lobstr_1_europe"
HOME_DOMAIN="lobstr.co"
PUBLIC_KEY="GCFONE23AB7Y6C5YZOMKUKGETPIAJA4QOYLS5VNS4JHBGKRZCPYHDLW7"
ADDRESS="v1.stellar.lobstr.co:11625"
HISTORY="curl -sf https://stellar-archive-1-lobstr.s3.amazonaws.com/{0} -o {1}"

[[VALIDATORS]]
NAME="lobstr_2_europe"
HOME_DOMAIN="lobstr.co"
PUBLIC_KEY="GDXQB3OMMQ6MGG43PWFBZWBFKBBDUZIVSUDAZZTRAWQZKES2CDSE5HKJ"
ADDRESS="v2.stellar.lobstr.co:11625"
HISTORY="curl -sf https://stellar-archive-2-lobstr.s3.amazonaws.com/{0} -o {1}"

[[VALIDATORS]]
NAME="lobstr_3_north_america"
HOME_DOMAIN="lobstr.co"
PUBLIC_KEY="GD5QWEVV4GZZTQP46BRXV5CUMMMLP4JTGFD7FWYJJWRL54CELY6JGQ63"
ADDRESS="v3.stellar.lobstr.co:11625"
HISTORY="curl -sf https://stellar-archive-3-lobstr.s3.amazonaws.com/{0} -o {1}"

[[VALIDATORS]]
NAME="lobstr_4_asia"
HOME_DOMAIN="lobstr.co"
PUBLIC_KEY="GA7TEPCBDQKI7JQLQ34ZURRMK44DVYCIGVXQQWNSWAEQR6KB4FMCBT7J"
ADDRESS="v4.stellar.lobstr.co:11625"
HISTORY="curl -sf https://stellar-archive-4-lobstr.s3.amazonaws.com/{0} -o {1}"

[[VALIDATORS]]
NAME="lobstr_5_australia"
HOME_DOMAIN="lobstr.co"
PUBLIC_KEY="GA5STBMV6QDXFDGD62MEHLLHZTPDI77U3PFOD2SELU5RJDHQWBR5NNK7"
ADDRESS="v5.stellar.lobstr.co:11625"
HISTORY="curl -sf https://stellar-archive-5-lobstr.s3.amazonaws.com/{0} -o {1}"

[[VALIDATORS]]
NAME="coinqvest_hong_kong"
HOME_DOMAIN="www.coinqvest.com"
PUBLIC_KEY="GAZ437J46SCFPZEDLVGDMKZPLFO77XJ4QVAURSJVRZK2T5S7XUFHXI2Z"
ADDRESS="hongkong.stellar.coinqvest.com:11625"
HISTORY="curl -sf https://hongkong.stellar.coinqvest.com/history/{0} -o {1}"

[[VALIDATORS]]
NAME="coinqvest_germany"
HOME_DOMAIN="www.coinqvest.com"
PUBLIC_KEY="GD6SZQV3WEJUH352NTVLKEV2JM2RH266VPEM7EH5QLLI7ZZAALMLNUVN"
ADDRESS="germany.stellar.coinqvest.com:11625"
HISTORY="curl -sf https://germany.stellar.coinqvest.com/history/{0} -o {1}"

[[VALIDATORS]]
NAME="coinqvest_finland"
HOME_DOMAIN="www.coinqvest.com"
PUBLIC_KEY="GADLA6BJK6VK33EM2IDQM37L5KGVCY5MSHSHVJA4SCNGNUIEOTCR6J5T"
ADDRESS="finland.stellar.coinqvest.com:11625"
HISTORY="curl -sf https://finland.stellar.coinqvest.com/history/{0} -o {1}"

[[VALIDATORS]]
NAME="keybase_io"
HOME_DOMAIN="keybase.io"
PUBLIC_KEY="GCWJKM4EGTGJUVSWUJDPCQEOEP5LHSOFKSA4HALBTOO4T4H3HCHOM6UX"
ADDRESS="stellar0.keybase.io:11625"
HISTORY="curl -sf https://stellarhistory.keybase.io/{0} -o {1}"

[[VALIDATORS]]
NAME="keybase_1"
HOME_DOMAIN="keybase.io"
PUBLIC_KEY="GDKWELGJURRKXECG3HHFHXMRX64YWQPUHKCVRESOX3E5PM6DM4YXLZJM"
ADDRESS="stellar1.keybase.io:11625"
HISTORY="curl -sf https://stellarhistory1.keybase.io/{0} -o {1}"

[[VALIDATORS]]
NAME="keybase_2"
HOME_DOMAIN="keybase.io"
PUBLIC_KEY="GA35T3723UP2XJLC2H7MNL6VMKZZIFL2VW7XHMFFJKKIA2FJCYTLKFBW"
ADDRESS="stellar2.keybase.io:11625"
HISTORY="curl -sf https://stellarhistory2.keybase.io/{0} -o {1}"

[[VALIDATORS]]
NAME="Blockdaemon_Validator_1"
HOME_DOMAIN="stellar.blockdaemon.com"
PUBLIC_KEY="GAAV2GCVFLNN522ORUYFV33E76VPC22E72S75AQ6MBR5V45Z5DWVPWEU"
ADDRESS="stellar-full-validator1.bdnodes.net"
HISTORY="curl -sf https://stellar-full-history1.bdnodes.net/{0} -o {1}"

[[VALIDATORS]]
NAME="Blockdaemon_Validator_2"
HOME_DOMAIN="stellar.blockdaemon.com"
PUBLIC_KEY="GAVXB7SBJRYHSG6KSQHY74N7JAFRL4PFVZCNWW2ARI6ZEKNBJSMSKW7C"
ADDRESS="stellar-full-validator2.bdnodes.net"
HISTORY="curl -sf https://stellar-full-history2.bdnodes.net/{0} -o {1}"

[[VALIDATORS]]
NAME="Blockdaemon_Validator_3"
HOME_DOMAIN="stellar.blockdaemon.com"
PUBLIC_KEY="GAYXZ4PZ7P6QOX7EBHPIZXNWY4KCOBYWJCA4WKWRKC7XIUS3UJPT6EZ4"
ADDRESS="stellar-full-validator3.bdnodes.net"
HISTORY="curl -sf https://stellar-full-history3.bdnodes.net/{0} -o {1}"

[[VALIDATORS]]
NAME="wirexUS"
ADDRESS="us.stellar.wirexapp.com"
HOME_DOMAIN="wirexapp.com"
PUBLIC_KEY="GDXUKFGG76WJC7ACEH3JUPLKM5N5S76QSMNDBONREUXPCZYVPOLFWXUS"
HISTORY="curl -sf http://wxhorizonusstga1.blob.core.windows.net/history/{0} -o {1}"

[[VALIDATORS]]
NAME="wirexUK"
ADDRESS="uk.stellar.wirexapp.com"
HOME_DOMAIN="wirexapp.com"
PUBLIC_KEY="GBBQQT3EIUSXRJC6TGUCGVA3FVPXVZLGG3OJYACWBEWYBHU46WJLWXEU"
HISTORY="curl -sf http://wxhorizonukstga1.blob.core.windows.net/history/{0} -o {1}"

[[VALIDATORS]]
NAME="wirexSG"
ADDRESS="sg.stellar.wirexapp.com"
HOME_DOMAIN="wirexapp.com"
PUBLIC_KEY="GAB3GZIE6XAYWXGZUDM4GMFFLJBFMLE2JDPUCWUZXMOMT3NHXDHEWXAS"
HISTORY="curl -sf http://wxhorizonasiastga1.blob.core.windows.net/history/{0} -o {1}"
```

Maybe you need to clean up old archives, or there is a gap in history so your stellar-core can no longer catch up, or it's missing some of the archives. Try experimenting with the CATCHUP_RECENT value in stellar-core and the HISTORY_RETENTION_COUNT value in horizon, and try adding more validators with archive entries and/or opening your public stellar port.

astudnev commented 3 years ago

After fixing the config file and starting again from an empty database for core, I am getting the following logs:

Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.751 GBAJJ [History INFO] Catching up to ledger 34813951: Succeeded: batch-download-ledger-0212e8bf-0213383f : 636/636 children completed
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.751 GBAJJ [History INFO] Verifying ledgers [34793622,34813952)
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.751 GBAJJ [History INFO] Verifying ledger [seq=34813951, hash=0b2cf5] against SCP hash
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.751 GBAJJ [History INFO] Catching up to ledger 34813951: verifying checkpoint 1/318 (0%)
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History ERROR] Bad ledger-header history entry: claimed ledger [seq=34793621, hash=d05842] does not agree with LCL [seq=34793621, hash=599be9]
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History ERROR] Catchup material failed verification - hash mismatch, propagating failure
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History ERROR] One or more of history archives may be corrupted. Update HISTORY configuration entry to only contain valid ones
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History INFO] Verifying ledgers [34793622,34813952)
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History INFO] Catching up to ledger 34813951: Failed: verify-ledger-chain
Apr 07 16:34:11 node1-1 stellar-core[16776]: 2021-04-07T16:34:11.826 GBAJJ [History INFO] Catching up to ledger 34813951: Succeeded: batch-download-ledger-0212e8bf-0213383f : 636/636 children completed

Is this normal?
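
For reading the log above, a minimal sketch that decodes the hex range in the `batch-download-ledger-0212e8bf-0213383f` work name into ledger sequence numbers. It assumes the usual checkpoint frequency of 64 ledgers, with checkpoints at ledgers where `(seq + 1) % 64 == 0`; the function names are hypothetical helpers, not part of stellar-core.

```python
import re
from typing import Tuple

# Assumption: the public network uses a checkpoint every 64 ledgers.
CHECKPOINT_FREQUENCY = 64


def decode_batch_range(name: str) -> Tuple[int, int]:
    """Return the two ledger numbers encoded as 8-digit hex at the end of a work name."""
    match = re.search(r"-([0-9a-f]{8})-([0-9a-f]{8})$", name)
    if match is None:
        raise ValueError(f"no hex ledger range in {name!r}")
    return int(match.group(1), 16), int(match.group(2), 16)


def is_checkpoint(seq: int) -> bool:
    """True if `seq` is the last ledger of a checkpoint."""
    return (seq + 1) % CHECKPOINT_FREQUENCY == 0


def checkpoint_containing(seq: int) -> int:
    """Return the checkpoint ledger whose 64-ledger window contains `seq`."""
    return (seq // CHECKPOINT_FREQUENCY) * CHECKPOINT_FREQUENCY + CHECKPOINT_FREQUENCY - 1


first, last = decode_batch_range("batch-download-ledger-0212e8bf-0213383f")
print(first, last)                       # both ends of the downloaded range
print(checkpoint_containing(34793621))   # checkpoint covering the mismatching ledger
```

Under these assumptions, the first checkpoint in the batch is the one whose window contains ledger 34793621, the ledger named in the hash-mismatch error.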

MonsieurNicolas commented 3 years ago

Thank you.

We are currently working on a full resolution - I am going to leave this issue open for now in case other people are still stuck.

We had a corruption issue that caused some nodes on the network to get stuck.

We just put together a doc to help people get unstuck. Take a look: https://docs.google.com/document/d/1tgvcMHblTCKvJb3iOu8NZIMPVpBHUBcXtDwuhAOXvh0/edit?usp=sharing

astudnev commented 3 years ago

> Thank you.
>
> We are currently working on a full resolution - I am going to leave this issue open for now in case other people are still stuck.
>
> We had a corruption issue that caused some nodes on the network to get stuck.
>
> We just put together a doc to help people get unstuck. Take a look: https://docs.google.com/document/d/1tgvcMHblTCKvJb3iOu8NZIMPVpBHUBcXtDwuhAOXvh0/edit?usp=sharing

I am stuck on step 2.2.5 with the following error:

stellar-horizon-cmd db reingest range  --force 34793600 34793700
INFO[2021-04-08T09:30:43.699Z] Ingestion system initial state                current_state="reingestHistoryRange(fromLedger=34793600, toLedger=34793700, force=true)" pid=29618 service=ingest
INFO[2021-04-08T09:30:43.699Z] Preparing ledger backend to retrieve range    from=34793600 pid=29618 service=ingest to=34793700
ERRO[2021-04-08T09:30:43.699Z] Error in ingestion state machine              current_state="reingestHistoryRange(fromLedger=34793600, toLedger=34793700, force=true)" error="error preparing range: `from` ledger does not exist" next_state=stop pid=29618 service=ingest
INFO[2021-04-08T09:30:43.700Z] Shut down                                     pid=29618 service=ingest
2021/04/08 09:30:43 error preparing range: `from` ledger does not exist

However, I have core CATCHUP_RECENT=80480

and the node looks synced:

2021-04-08T09:29:03.038 GDCTQ [default INFO] {
   "info" : {
      "build" : "stellar-core 15.4.0 (ff56e7f207b918aae53710abcc001511ead7ece9)",
      "history_failure_rate" : "0.0",
      "ledger" : {
         "age" : 1,
         "baseFee" : 100,
         "baseReserve" : 5000000,
         "closeTime" : 1617874142,
         "hash" : "d675a384a6d7a2fd2998ee88a56c68e1ce27e8ac72e821afc3758a026c6fc6d3",
         "maxTxSetSize" : 1000,
         "num" : 34825285,
         "version" : 15
      },
      "network" : "Public Global Stellar Network ; September 2015",
      "peers" : {
         "authenticated_count" : 8,
         "pending_count" : 2
      },
      "protocol_version" : 15,
      "quorum" : {
         "node" : "GDCOY",
         "qset" : {
            "agree" : 12,
            "cost" : 71165212,
            "delayed" : 0,
            "disagree" : 0,
            "fail_at" : 4,
            "hash" : "cb4a38",
            "lag_ms" : 432,
            "ledger" : 34825284,
            "missing" : 0,
            "phase" : "EXTERNALIZE"
         },
         "transitive" : {
            "critical" : null,
            "intersection" : true,
            "last_check_ledger" : 34823900,
            "node_count" : 24
         }
      },
      "startedOn" : "2021-04-08T07:23:16Z",
      "state" : "Synced!"
   }
}

in postgres:

stellar=> select min(ledgerseq),max(ledgerseq) FROM ledgerheaders LIMIT 1;
   min    |   max
----------+----------
 34795065 | 34825372
(1 row)

Please help, what should I do?

bert2002 commented 3 years ago

I am following the guide at https://docs.google.com/document/d/1U0xH3U4KiKRWUyXgu4MZLBchw88-a5ExCcSy33b3gJU/edit and everything up to step 1.3.3 seems to work. The node seems to be in sync again, but I still can't get transaction details for ledger 34793621 through horizon. I tried running the reingest, but it fails:

stellar@stellar-1:/etc/stellar$ export HISTORY_ARCHIVE_URLS="https://history.stellar.org/prd/core-live/core_live_001/" ; export DATABASE_URL="dbname=horizon user=stellar host=/var/run/postgresql" ; export NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015" ; stellar-horizon db reingest range --force 34793621 34838538
INFO[2021-04-09T05:38:48.565Z] Ingestion system initial state                current_state="reingestHistoryRange(fromLedger=34793621, toLedger=34838538, force=true)" pid=19824 service=ingest
INFO[2021-04-09T05:38:48.565Z] Preparing ledger backend to retrieve range    from=34793621 pid=19824 service=ingest to=34838538
ERRO[2021-04-09T05:38:50.262Z] Error in ingestion state machine              current_state="reingestHistoryRange(fromLedger=34793621, toLedger=34838538, force=true)" error="error preparing range: error starting prepare range: opening subprocess: error running stellar-core: error waiting for `stellar-core new-db` subprocess: could not start `stellar-core [new-db]` cmd: fork/exec : no such file or directory" next_state=stop pid=19824 service=ingest
INFO[2021-04-09T05:38:50.262Z] Shut down                                     pid=19824 service=ingest
2021/04/09 05:38:50 error preparing range: error starting prepare range: opening subprocess: error running stellar-core: error waiting for `stellar-core new-db` subprocess: could not start `stellar-core [new-db]` cmd: fork/exec : no such file or directory

I am surprised to see that it even wants to start stellar-core new-db.

tamirms commented 3 years ago

@bert2002 are you using horizon with captive core? If so, can you share your captive core configuration?

bert2002 commented 3 years ago

@tamirms I don't use horizon with captive core.

bartekn commented 3 years ago

@astudnev the oldest ledger in your core DB is 34795065, but you are trying to reingest from 34793600 (earlier). Also, step 2.2.5 (and point 2 in general) does not include reingestion instructions.

@bert2002 I think that you are using Captive Core but in this case you don't need to do 1.3.1-1.3.3. If you upgraded from Horizon before 2.0.0 you can disable Captive Core by setting ENABLE_CAPTIVE_CORE_INGESTION="false".
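
A minimal sketch of the range check behind the `from` ledger error discussed above, using the numbers @astudnev posted: the `ledgerheaders` table held ledgers 34795065 to 34825372, while the reingest asked for 34793600 to 34793700. The function name is a hypothetical helper, not a Horizon or stellar-core API.

```python
from typing import List, Tuple


def missing_ledgers(
    available_min: int, available_max: int, want_from: int, want_to: int
) -> List[Tuple[int, int]]:
    """Return the inclusive sub-ranges of [want_from, want_to] not covered by the core DB."""
    gaps = []
    if want_from < available_min:
        # Part (or all) of the request lies before the oldest stored ledger.
        gaps.append((want_from, min(want_to, available_min - 1)))
    if want_to > available_max:
        # Part (or all) of the request lies after the newest stored ledger.
        gaps.append((max(want_from, available_max + 1), want_to))
    return gaps


# Values taken from the postgres query and the reingest command in this thread.
gaps = missing_ledgers(34795065, 34825372, 34793600, 34793700)
print(gaps)
```

An empty list would mean the requested range is fully present in the core database; here the whole requested range comes back as a gap, since it ends before the oldest stored ledger.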

astudnev commented 3 years ago

> @astudnev your core DB oldest ledger is 34795065 but you try to reingest from 34793600 (earlier). Also step 2.2.5 (and point 2 in general) does not include reingestion instructions

Why did my first block move then (it was initially earlier!)? CATCHUP_RECENT=80480 was configured so that at least 80480 blocks would be included, and this is far fewer than 80480 blocks before the most recent one. Am I misunderstanding something? I just followed the instructions from the doc.

The instructions do not specify which value to set for CATCHUP_RECENT.

bert2002 commented 3 years ago

> @bert2002 I think that you are using Captive Core but in this case you don't need to do 1.3.1-1.3.3. If you upgraded from Horizon before 2.0.0 you can disable Captive Core by setting ENABLE_CAPTIVE_CORE_INGESTION="false".

I had configured this before:
ENABLE_CAPTIVE_CORE_INGESTION=false
and tried yours:
ENABLE_CAPTIVE_CORE_INGESTION="false"
but it has the same result.

The core status is actually:

$ stellar-core --conf /etc/stellar/stellar-core.cfg http-command 'info'
2021-04-09T10:51:09.334 [default INFO] Config from /etc/stellar/stellar-core.cfg
2021-04-09T10:51:09.341 [default INFO] Generated QUORUM_SET: {
   "t" : 5,
   "v" : [
      {
         "t" : 2,
         "v" : [
            "Blockdaemon_Validator_1",
            "Blockdaemon_Validator_2",
            "Blockdaemon_Validator_3"
         ]
      },
      {
         "t" : 2,
         "v" : [ "sdf_3", "sdf_1", "sdf_2" ]
      },
      {
         "t" : 2,
         "v" : [ "wirexSG", "wirexUK", "wirexUS" ]
      },
      {
         "t" : 2,
         "v" : [ "COINQVEST_Finland", "COINQVEST_Hong_Kong", "COINQVEST_Germany" ]
      },
      {
         "t" : 2,
         "v" : [
            "SatoshiPay_US_Iowa",
            "SatoshiPay_SG_Singapore",
            "SatoshiPay_DE_Frankfurt"
         ]
      },
      {
         "t" : 2,
         "v" : [ "keybase2", "keybase.io", "keybase1" ]
      },
      {
         "t" : 3,
         "v" : [
            "LOBSTR_5_Australia",
            "LOBSTR_4_Asia",
            "LOBSTR_1_Europe",
            "LOBSTR_2_Europe_",
            "LOBSTR_3_North_America"
         ]
      }
   ]
}

2021-04-09T10:51:09.341 [default INFO] Assigning calculated value of 2 to FAILURE_SAFETY
Content-Length: 1429
Content-Type: application/json

2021-04-09T10:51:09.768 GBSYZ [default INFO] {
   "info" : {
      "build" : "stellar-core 15.5.0 (a53cc5976371249d6f429861f9a4f4791554d30d)",
      "history_failure_rate" : "0.0014",
      "ledger" : {
         "age" : 13614,
         "baseFee" : 100,
         "baseReserve" : 5000000,
         "closeTime" : 1617951855,
         "hash" : "7d3051e5f0c134689ab95dc6b96cf94d7cfe7787c03fa4b30f2d4866c9f458ab",
         "maxTxSetSize" : 1000,
         "num" : 34839725,
         "version" : 15
      },
      "network" : "Public Global Stellar Network ; September 2015",
      "peers" : {
         "authenticated_count" : 9,
         "pending_count" : 0
      },
      "protocol_version" : 15,
      "quorum" : {
         "node" : "GC3HB",
         "qset" : {
            "agree" : 23,
            "cost" : 10603496,
            "delayed" : 0,
            "disagree" : 0,
            "fail_at" : 6,
            "hash" : "341a0a",
            "lag_ms" : 129,
            "ledger" : 34842259,
            "missing" : 0,
            "phase" : "EXTERNALIZE"
         },
         "transitive" : {
            "critical" : null,
            "intersection" : true,
            "last_check_ledger" : 34836694,
            "node_count" : 24
         }
      },
      "startedOn" : "2021-04-09T02:32:22Z",
      "state" : "Catching up",
      "status" : [
         "Catching up to ledger 34842175: Download & apply checkpoints: num checkpoints left to apply:39 (7% done)"
      ]
   }
}

and horizon:

$ curl localhost:8000
{
...
  "horizon_version": "2.1.0-58dc3a7339ad1055a1e8eb13e9f0082a98383457",
  "core_version": "stellar-core 15.5.0 (a53cc5976371249d6f429861f9a4f4791554d30d)",
  "ingest_latest_ledger": 34793621,
  "history_latest_ledger": 34793621,
  "history_latest_ledger_closed_at": "2021-04-06T08:18:53Z",
  "history_elder_ledger": 32075200,
  "core_latest_ledger": 34839428,
  "network_passphrase": "Public Global Stellar Network ; September 2015",
  "current_protocol_version": 15,
  "core_supported_protocol_version": 15
}
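
A minimal sketch of how the gap between core and Horizon ingestion can be read off the root response above. The field names come from that response; the `max_lag` threshold and the function names are hypothetical choices for illustration.

```python
import json

# Fields copied from the Horizon root response shown above (other fields omitted).
HORIZON_ROOT = json.loads("""
{
  "ingest_latest_ledger": 34793621,
  "history_latest_ledger": 34793621,
  "core_latest_ledger": 34839428
}
""")


def ingestion_lag(root: dict) -> int:
    """Number of ledgers core has that Horizon has not yet written to history."""
    return root["core_latest_ledger"] - root["history_latest_ledger"]


def is_ingestion_behind(root: dict, max_lag: int = 100) -> bool:
    """True if Horizon trails core by more than `max_lag` ledgers (arbitrary threshold)."""
    return ingestion_lag(root) > max_lag


print(ingestion_lag(HORIZON_ROOT))
print(is_ingestion_behind(HORIZON_ROOT))
```

In this response both Horizon counters still sit at 34793621, the ledger where the network stalled, while core has moved on.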

gituser commented 3 years ago

@bert2002 it seems the stellar protocol has been upgraded to 16, so you need to update stellar-core to v16.0.0 in order to sync with the network.

bert2002 commented 3 years ago

@gituser I updated and it started syncing, but now we are stuck with this:

Apr 12 21:54:58 stellar-1 stellar-core[30430]: 2021-04-12T21:54:58.852 GCDMK [Herder INFO] Asking peers for SCP messages more recent than 34884628
Apr 12 21:54:58 stellar-1 stellar-core[30430]: 2021-04-12T21:54:58.887 GCDMK [default INFO] Performing maintenance
Apr 12 21:54:58 stellar-1 stellar-core[30430]: 2021-04-12T21:54:58.923 GCDMK [History INFO] Trimming history <= ledger 34793621 (rmin=34793621, qmin=34843053, lmin=34842989)
Apr 12 22:22:48 stellar-1 stellar-core[30430]: 2021-04-12T22:22:48.099 GCDMK [Tx INFO] applying ledger 34843054 (txs:132, ops:315)
Apr 12 23:55:52 stellar-1 stellar-core[30430]: 2021-04-12T23:55:52.874 GCDMK [Herder INFO] Asking peers for SCP messages more recent than 34884628
Apr 12 23:55:52 stellar-1 stellar-core[30430]: 2021-04-12T23:55:52.912 GCDMK [default INFO] Performing maintenance
Apr 12 23:55:52 stellar-1 stellar-core[30430]: 2021-04-12T23:55:52.947 GCDMK [History INFO] Trimming history <= ledger 34793621 (rmin=34793621, qmin=34843054, lmin=34842990)
Apr 12 23:55:53 stellar-1 stellar-core[30430]: 2021-04-12T23:55:53.733 GCDMK [Perf WARNING] Dropped 36 slow-execution warning messages
Apr 13 00:33:11 stellar-1 stellar-core[30430]: 2021-04-13T00:33:11.709 GCDMK [Tx INFO] applying ledger 34843055 (txs:146, ops:297)
Apr 13 01:44:15 stellar-1 stellar-core[30430]: 2021-04-13T01:44:14.997 GCDMK [Herder INFO] Asking peers for SCP messages more recent than 34884628
Apr 13 01:44:15 stellar-1 stellar-core[30430]: 2021-04-13T01:44:15.017 GCDMK [default INFO] Performing maintenance
Apr 13 01:44:15 stellar-1 stellar-core[30430]: 2021-04-13T01:44:15.039 GCDMK [History INFO] Trimming history <= ledger 34793621 (rmin=34793621, qmin=34843055, lmin=34842991)
Apr 13 02:18:45 stellar-1 stellar-core[30430]: 2021-04-13T02:18:45.297 GCDMK [Tx INFO] applying ledger 34843056 (txs:135, ops:257)
Apr 13 03:24:54 stellar-1 stellar-core[30430]: 2021-04-13T03:24:54.691 GCDMK [Herder INFO] Asking peers for SCP messages more recent than 34884628
Apr 13 03:24:54 stellar-1 stellar-core[30430]: 2021-04-13T03:24:54.725 GCDMK [default INFO] Performing maintenance
Apr 13 03:24:54 stellar-1 stellar-core[30430]: 2021-04-13T03:24:54.762 GCDMK [History INFO] Trimming history <= ledger 34793621 (rmin=34793621, qmin=34843056, lmin=34842992)

PostgreSQL is on fire :fire:
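
To put a number on how slowly ledgers are being applied in the log above, a minimal sketch that pulls the `applying ledger` lines out of the log and measures the time between them. The regular expression and function name are assumptions made for this example; the sample lines are copied from the log with the journald prefix removed.

```python
import re
from datetime import datetime
from typing import List, Tuple

LOG = """\
2021-04-12T22:22:48.099 GCDMK [Tx INFO] applying ledger 34843054 (txs:132, ops:315)
2021-04-13T00:33:11.709 GCDMK [Tx INFO] applying ledger 34843055 (txs:146, ops:297)
2021-04-13T02:18:45.297 GCDMK [Tx INFO] applying ledger 34843056 (txs:135, ops:257)
"""

# Matches the stellar-core timestamp and the ledger number on "applying ledger" lines.
APPLY_LINE = re.compile(
    r"(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}) \S+ \[Tx INFO\] applying ledger (\d+)"
)


def apply_intervals(log_text: str) -> List[Tuple[int, float]]:
    """Return (ledger, seconds since the previous 'applying ledger' line) pairs."""
    events = [
        (datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f"), int(seq))
        for ts, seq in APPLY_LINE.findall(log_text)
    ]
    return [
        (seq, (ts - prev_ts).total_seconds())
        for (prev_ts, _), (ts, seq) in zip(events, events[1:])
    ]


for ledger, seconds in apply_intervals(LOG):
    print(f"ledger {ledger}: {seconds / 3600:.2f} hours after the previous one")
```

Each interval here is on the order of two hours per ledger, whereas the network normally closes a ledger roughly every five seconds, so at this rate the node cannot catch up.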

MonsieurNicolas commented 3 years ago

With the fixes in v15.5.0 and v16.0.0 and the network upgrade to protocol 16, this is now resolved.