helius-labs / atlas-txn-sender

Apache License 2.0

Atlas killing RPC #16

Open AlexRubik opened 7 months ago

AlexRubik commented 7 months ago

After running for some hours, my RPC dies with a "No available UDP ports in (8000, 10000)" error. I am on the latest versions of everything I use: Yellowstone GRPC, Jupiter v6 API, and the Jito validator fork. I have no firewall rules. The error suggests that ports are being opened but never closed. Maybe the QUIC client needs to time out faster?

NUM_LEADERS=4 TPU_CONNECTION_POOL_SIZE=4

[2024-04-06T03:19:05.991578373Z INFO  solana_metrics::metrics] datapoint: loaded-programs-cache-stats slot=258599051i hits=10674i misses=0i evictions=0i reloads=0i insertions=0i lost_insertions=0i replace_entry=0i one_hit_wonders=0i prunes_orphan=0i prunes_environment=0i empty_entries=0i
[2024-04-06T03:19:05.995420336Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8016
   0: rust_begin_unwind
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
   3: solana_quic_client::nonblocking::quic_client::QuicLazyInitializedEndpoint::create_endpoint
   4: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
   5: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
   6: <solana_quic_client::nonblocking::quic_client::QuicClientConnection as solana_connection_cache::nonblocking::client_connection::ClientConnection>::send_data::{{closure}}
   7: tokio::runtime::park::CachedParkThread::block_on
   8: tokio::runtime::context::runtime::enter_runtime
   9: tokio::runtime::runtime::Runtime::block_on
  10: <solana_quic_client::quic_client::QuicClientConnection as solana_connection_cache::client_connection::ClientConnection>::send_data
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2024-04-06T03:19:06.008977821Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solWarmQuicSvc" one=1i message="panicked at quic-client/src/nonblocking/quic_client.rs:111:14:
    QuicLazyInitializedEndpoint::create_endpoint bind_in_range: Custom { kind: Other, error: \"No available UDP ports in (8000, 10000)\" }" location="quic-client/src/nonblocking/quic_client.rs:111:14" version="1.17.28 (src:286fd575; feat:3746964731, client:JitoLabs)"
[2024-04-06T03:19:06.014510904Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8014
[2024-04-06T03:19:06.017823392Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8016
[2024-04-06T03:19:06.017834193Z INFO  solana_quic_client::quic_client] Timedout sending data 74.118.143.73:11228
[2024-04-06T03:19:06.017825907Z INFO  solana_quic_client::quic_client] Timedout sending data 74.118.143.73:11228
[2024-04-06T03:19:06.017859140Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8009
[2024-04-06T03:19:06.017863438Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8014
[2024-04-06T03:19:06.017910638Z INFO  solana_quic_client::quic_client] Timedout sending data 141.98.216.132:8009

Validator startup args:

solana-validator --expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d --entrypoint 'entrypoint2.mainnet-beta.solana.com:8001' --entrypoint 'entrypoint3.mainnet-beta.solana.com:8001' --entrypoint 'entrypoint.mainnet-beta.solana.com:8001' --entrypoint 'entrypoint4.mainnet-beta.solana.com:8001' --entrypoint 'entrypoint5.mainnet-beta.solana.com:8001' --no-voting --ledger /mnt/ledger --accounts /mnt/accounts --rpc-port 8899 --identity /home/ubuntu/validator-keypair.json --log /home/ubuntu/solana-validator.log --maximum-local-snapshot-age 3000 --wal-recovery-mode skip_any_corrupted_record --full-rpc-api --block-engine-url 'https://amsterdam.mainnet.block-engine.jito.wtf' --allow-private-addr --minimal-snapshot-download-speed 95985760 --tip-payment-program-pubkey T1pyyaTNZsKv2WcRAB8oVnk93mLJw2XzjtVYqCsaHqt --tip-distribution-program-pubkey 4R3gSG8BpU4t19KYj8CfnbtRpnT8gtk4dvTHxVRwc2r7 --limit-ledger-size 55000000 --geyser-plugin-config "/home/ubuntu/yellowstone-grpc/yellowstone-grpc-geyser/config.json" --account-index program-id --account-index-include-key TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA --account-index-include-key 3tZPEagumHvtgBhivFJCmhV9AyhBHGW9VgdsK52i4gwP --account-index-include-key AddressLookupTab1e1111111111111111111111111 --accounts-db-cache-limit-mb 150000 --accounts-index-memory-limit-mb 128000 --private-rpc --rpc-send-retry-ms 5

vovkman commented 7 months ago

I wouldn't recommend running this on the same server as your RPC. I have never seen this error myself, but I also haven't tried running atlas on the same server as an RPC.

jesuspc commented 7 months ago

Hey! I've hit the same problem a couple of times recently, after I started using atlas-txn-sender, and I also run the process on the same machine as the RPC. Interestingly, I receive notifications from my cloud provider about protection from a DDoS event at the same times.

Solarb80 commented 1 month ago

I'm seeing the same problem. If I use systemd to prevent atlas-txn-sender from binding ports in the validator's range, I can see that it attempts to bind the ports at solana-quic-client-1.17.28/src/nonblocking/quic_client.rs. Unfortunately, the port range the QUIC client uses is hardcoded to 8000-10000. It's not clear to me why the connection cache seems to be holding more and more ports over time.
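
For anyone who wants to watch the range fill up before the panic, here is a rough diagnostic sketch in Python. It is not the solana-quic-client code: `free_udp_ports` and `bind_in_range` are hypothetical helper names, and the bind-first-free-port loop is only my assumption of what a bind-in-range helper generally does. It counts how many UDP ports in 8000-10000 can still be bound, and reproduces the same kind of error once none are left.

```python
import socket


def free_udp_ports(start, end, host="0.0.0.0"):
    """Return the ports in [start, end) on which a UDP bind succeeds right now."""
    free = []
    for port in range(start, end):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            free.append(port)
        except OSError:
            pass  # port is held by another socket (or binding is not permitted)
        finally:
            sock.close()  # release immediately so the probe itself leaks nothing
    return free


def bind_in_range(start, end, host="0.0.0.0"):
    """Bind a UDP socket to the first free port in [start, end).

    Illustrative only: raises an error worded like the one in the logs
    when every port in the range is already taken.
    """
    for port in range(start, end):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.bind((host, port))
            return sock  # the caller owns the socket and must close it
        except OSError:
            sock.close()
    raise OSError(f"No available UDP ports in ({start}, {end})")


if __name__ == "__main__":
    # The range below is the one reported in the panic message.
    free = free_udp_ports(8000, 10000)
    print(f"{len(free)} of 2000 UDP ports in 8000-10000 are currently free")
```

Running this from cron every few minutes and logging the count should show whether the free pool shrinks steadily over time, which would point to sockets being held rather than a sudden burst.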