stellar / go

Stellar's public monorepo of go code
https://stellar.org/developers
Apache License 2.0
1.29k stars 497 forks source link

Investigate reaping sql timeouts in horizon #5320

Open aditya1702 opened 2 months ago

aditya1702 commented 2 months ago

What version are you using?

Horizon: 2.30.0 Go: 1.22

What did you do?

What did you expect to see?

If history-retention is set, reaping should succeed without any issues:

time="2024-05-16T19:45:41.770Z" level=info msg="reaper succeeded" new_elder=51525683 pid=633297

What did you see instead?

We have notice reports of reaping failing due to sql timeout errors:

time="2024-05-09T18:50:31.614+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_assets where id IN (select id from (select id, (select 1 from history_trades where base_asset_id = hcb.id limit 1) as c0, (select 1 from history_trades where counter_asset_id = hcb.id limit 1) as c1, (select 1 from history_trades_60000 where base_asset_id = hcb.id limit 1) as c2, (select 1 from history_trades_60000 where counter_asset_id = hcb.id limit 1) as c3, 1 as cx from history_assets hcb where id >= 1001 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=261239 service=ingest
airstring commented 2 months ago

stellar-core, stellar-horizon, and postgres share the same host, which is 8c16GB , and a ssd with a maximum of 5000 iops.

# curl http://localhost:8000
"core_version": "stellar-core 20.4.0 (7fc7671b8bc1ccc3b1f16a6ab83bc9f671db8b70)",
  "ingest_latest_ledger": 51707393,
  "history_latest_ledger": 51707393,
  "history_latest_ledger_closed_at": "2024-05-17T06:44:26Z",
  "history_elder_ledger": 51655384,
# grep "reap" stellar-horizon.log
May 17 07:47:22 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T07:47:22.126+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_accounts where id IN (select id from (select id, (select 1 from history_effects where history_account_id = hcb.id limit 1) as c0, (select 1 from history_operation_participants where history_account_id = hcb.id limit 1) as c1, (select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, (select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, (select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, 1 as cx from history_accounts hcb where id >= 0 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and c4 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=747541 service=ingest
May 17 08:25:26 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T08:25:26.291+08:00" level=info msg="reaper: clearing" end_ledger=51651784 pid=747541 start_ledger=51651179
May 17 08:25:33 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T08:25:33.840+08:00" level=info msg="reaper succeeded" new_elder=51651785 pid=747541
May 17 09:25:33 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T09:25:33.842+08:00" level=info msg="reaper: clearing" end_ledger=51652383 pid=747541 start_ledger=51651785
May 17 09:25:46 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T09:25:46.519+08:00" level=info msg="reaper succeeded" new_elder=51652384 pid=747541
May 17 10:25:46 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T10:25:46.519+08:00" level=info msg="reaper: clearing" end_ledger=51652985 pid=747541 start_ledger=51652384
May 17 10:26:06 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T10:26:06.641+08:00" level=info msg="reaper succeeded" new_elder=51652986 pid=747541
May 17 10:52:12 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T10:52:12.671+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_accounts where id IN (select id from (select id, (select 1 from history_effects where history_account_id = hcb.id limit 1) as c0, (select 1 from history_operation_participants where history_account_id = hcb.id limit 1) as c1, (select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, (select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, (select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, 1 as cx from history_accounts hcb where id >= 0 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and c4 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=747541 service=ingest
May 17 11:26:06 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T11:26:06.641+08:00" level=info msg="reaper: clearing" end_ledger=51653587 pid=747541 start_ledger=51652986
May 17 11:26:31 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T11:26:31.489+08:00" level=info msg="reaper succeeded" new_elder=51653588 pid=747541
May 17 11:53:46 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T11:53:46.428+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_accounts where id IN (select id from (select id, (select 1 from history_effects where history_account_id = hcb.id limit 1) as c0, (select 1 from history_operation_participants where history_account_id = hcb.id limit 1) as c1, (select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, (select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, (select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, 1 as cx from history_accounts hcb where id >= 0 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and c4 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=747541 service=ingest
May 17 12:26:31 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T12:26:31.491+08:00" level=info msg="reaper: clearing" end_ledger=51654186 pid=747541 start_ledger=51653588
May 17 12:26:59 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T12:26:59.427+08:00" level=info msg="reaper succeeded" new_elder=51654187 pid=747541
May 17 12:56:00 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T12:56:00.591+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_accounts where id IN (select id from (select id, (select 1 from history_effects where history_account_id = hcb.id limit 1) as c0, (select 1 from history_operation_participants where history_account_id = hcb.id limit 1) as c1, (select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, (select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, (select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, 1 as cx from history_accounts hcb where id >= 0 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and c4 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=747541 service=ingest
May 17 13:26:59 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T13:26:59.430+08:00" level=info msg="reaper: clearing" end_ledger=51654786 pid=747541 start_ledger=51654187
May 17 13:27:07 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T13:27:07.481+08:00" level=info msg="reaper succeeded" new_elder=51654787 pid=747541
May 17 13:57:21 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T13:57:21.901+08:00" level=warning msg="Error reaping lookup tables" err="error running query: delete from history_accounts where id IN (select id from (select id, (select 1 from history_effects where history_account_id = hcb.id limit 1) as c0, (select 1 from history_operation_participants where history_account_id = hcb.id limit 1) as c1, (select 1 from history_trades where base_account_id = hcb.id limit 1) as c2, (select 1 from history_trades where counter_account_id = hcb.id limit 1) as c3, (select 1 from history_transaction_participants where history_account_id = hcb.id limit 1) as c4, 1 as cx from history_accounts hcb where id >= 0 order by id limit 1000) as sub where c0 IS NULL and c1 IS NULL and c2 IS NULL and c3 IS NULL and c4 IS NULL and 1=1);: canceling statement due to lack of response within timeout period" pid=747541 service=ingest
May 17 14:27:07 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T14:27:07.481+08:00" level=info msg="reaper: clearing" end_ledger=51655383 pid=747541 start_ledger=51654787
May 17 14:27:26 ip-172-24-103-136 stellar-horizon[747541]: time="2024-05-17T14:27:26.726+08:00" level=info msg="reaper succeeded" new_elder=51655384 pid=747541