romanz / electrs

An efficient re-implementation of Electrum Server in Rust
MIT License

Bug: cannot finish DB compaction on macOS due to open file limit #918

Open torkelrogstad opened 1 year ago

torkelrogstad commented 1 year ago

Describe the bug: Block compaction panics due to too many files open.

Electrs version 0.10.0, built from master.

To reproduce the behavior:

  1. Build from master a1460ec4a90e45c863c29f66932dd91a0aea9f6f with `cargo build --locked --release`
  2. Run with `./target/release/electrs --log-filters=DEBUG --skip-block-download-wait`
  3. Wait until block compaction is reached
  4. Error: `thread 'main' panicked at 'DB::put failed: Error { message: "IO error: While open a file for random read: ./db/bitcoin/000213.sst: Too many open files" }', src/db.rs:342:14`

Expected behavior: I'd expect block compaction to finish.

Configuration: I don't use a configuration file for this.

Environment variables (ELECTRS_X=Y;...): none. Arguments (--foo): `--log-filters=DEBUG --skip-block-download-wait`

System running electrs: macOS on aarch64 (see the startup log below).

Additional context: I tried building this with Prometheus support in order to inspect the file descriptor usage, but got this result:

❯ cargo build --locked --release --all-features
   Compiling electrs v0.10.0 (/Users/torkel/dev/rust/electrs)
error[E0432]: unresolved import `prometheus::process_collector`
 --> src/metrics.rs:6:21
  |
6 |     use prometheus::process_collector::ProcessCollector;
  |                     ^^^^^^^^^^^^^^^^^ could not find `process_collector` in `prometheus`

For more information about this error, try `rustc --explain E0432`.
error: could not compile `electrs` due to previous error

`ulimit -n` reports 256, but I'm not sure if this is the correct number for max open files.
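As an aside, `ulimit -n` shows the shell's soft limit on open file descriptors; the hard limit may be higher. A minimal, illustrative Rust sketch (not part of electrs) that prints both for the current process using the `libc` crate on a Unix target:

```rust
// Print the soft and hard RLIMIT_NOFILE values for the current process.
// `ulimit -n` reports the soft limit; RocksDB starts failing once it is exceeded.
fn main() {
    let mut rl = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
    let rc = unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) };
    assert_eq!(rc, 0, "getrlimit(RLIMIT_NOFILE) failed");
    println!("soft limit (ulimit -n):  {}", rl.rlim_cur);
    println!("hard limit (ulimit -Hn): {}", rl.rlim_max);
}
```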

Posting logs.

```
❯ ./target/release/electrs --log-filters=DEBUG --skip-block-download-wait
Starting electrs 0.10.0 on aarch64 macos with Config { network: Bitcoin, db_path: "./db/bitcoin", daemon_dir: "/Users/torkel/.bitcoin", daemon_auth: CookieFile("/Users/torkel/.bitcoin/.cookie"), daemon_rpc_addr: 127.0.0.1:8332, daemon_p2p_addr: 127.0.0.1:8333, electrum_rpc_addr: 127.0.0.1:50001, monitoring_addr: 127.0.0.1:4224, wait_duration: 10s, jsonrpc_timeout: 15s, index_batch_size: 10, index_lookup_limit: None, reindex_last_blocks: 0, auto_reindex: true, ignore_mempool: false, sync_once: false, skip_block_download_wait: true, disable_electrum_rpc: false, server_banner: "Welcome to electrs 0.10.0 (Electrum Rust Server)!", signet_magic: f9beb4d9, args: [] }
[2023-08-15T13:22:58.544Z DEBUG tiny_http] Server listening on 127.0.0.1:4224
[2023-08-15T13:22:58.544Z INFO electrs::metrics::metrics_impl] serving Prometheus metrics on 127.0.0.1:4224
[2023-08-15T13:22:58.544Z INFO electrs::server] serving Electrum RPC on 127.0.0.1:50001
[2023-08-15T13:22:58.544Z DEBUG tiny_http] Running accept thread
[2023-08-15T13:22:58.652Z INFO electrs::db] "./db/bitcoin": 652 SST files, 59.91444729 GB, 5.297402539 Grows
[2023-08-15T13:22:58.652Z DEBUG electrs::db] DB Some(Config { compacted: false, format: 0 })
[2023-08-15T13:22:59.489Z INFO electrs::chain] loading 803298 headers, tip=000000000000000000003b002ef7631e504a76fd0f46a4ed5905e6fa42b17701
[2023-08-15T13:23:00.601Z INFO electrs::chain] chain updated: tip=000000000000000000003b002ef7631e504a76fd0f46a4ed5905e6fa42b17701, height=803298
[2023-08-15T13:23:00.603Z DEBUG bitcoincore_rpc] JSON-RPC request: getblockchaininfo []
[2023-08-15T13:23:00.603Z DEBUG bitcoincore_rpc] JSON-RPC request: getnetworkinfo []
[2023-08-15T13:23:00.604Z DEBUG bitcoincore_rpc] JSON-RPC request: getnetworkinfo []
[2023-08-15T13:23:00.604Z DEBUG bitcoincore_rpc] JSON-RPC request: getblockchaininfo []
[2023-08-15T13:23:00.604Z DEBUG bitcoincore_rpc] JSON-RPC request: getnetworkinfo []
[2023-08-15T13:23:00.604Z DEBUG electrs::p2p] peer version: VersionMessage { version: 70016, services: ServiceFlags(1033), timestamp: 1692105780, receiver: Address {services: ServiceFlags(NONE), address: 0.0.0.0, port: 0}, sender: Address {services: ServiceFlags(NETWORK|WITNESS|NETWORK_LIMITED), address: 0.0.0.0, port: 0}, nonce: 788258135769235039, user_agent: "/Satoshi:24.0.1/", start_height: 803302, relay: true }
[2023-08-15T13:23:00.605Z DEBUG electrs::p2p] got 4 new headers
[2023-08-15T13:23:00.605Z INFO electrs::index] indexing 4 blocks: [803299..803302]
[2023-08-15T13:23:00.605Z DEBUG electrs::p2p] loading 4 blocks
[2023-08-15T13:23:00.751Z DEBUG electrs::index] writing 33610 funding and 27220 spending rows from 19149 transactions, 4 blocks
[2023-08-15T13:23:00.768Z INFO electrs::chain] chain updated: tip=00000000000000000000f04941e8c2596a2986658dcc93077901b2a7f83a57b4, height=803302
[2023-08-15T13:23:00.768Z DEBUG electrs::p2p] got 0 new headers
[2023-08-15T13:23:00.784Z INFO electrs::db] starting config compaction
[2023-08-15T13:23:00.785Z INFO electrs::db] starting headers compaction
[2023-08-15T13:23:01.368Z INFO electrs::db] starting txid compaction
[2023-08-15T13:26:45.489Z DEBUG electrs::p2p] peer inventory: [Block(0x00000000000000000003e00fc88fd8fcd2fd0cdea5c96b66be51c259189e1af7)]
[2023-08-15T13:26:51.788Z INFO electrs::db] starting funding compaction
[2023-08-15T13:26:52.094Z INFO electrs::db] starting spending compaction
thread 'main' panicked at 'DB::put failed: Error { message: "IO error: While open a file for random read: ./db/bitcoin/000213.sst: Too many open files" }', src/db.rs:342:14
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[2023-08-15T13:26:52.095Z INFO electrs::db] closing DB at ./db/bitcoin
[2023-08-15T13:26:53.125Z DEBUG electrs::p2p] closing p2p_loop thread: no more requests to handle
[2023-08-15T13:26:53.125Z DEBUG electrs::p2p] closing p2p_send thread: no more messages to send
[2023-08-15T13:26:53.125Z DEBUG electrs::p2p] closing p2p_recv thread: connection closed
```
torkelrogstad commented 1 year ago

Managed to get around this by increasing the file limit with `ulimit -n LARGE_NUMBER`. However, I've never had software crash on my machine due to this before. I think it would be reasonable for electrs to try to stay within the allotted limit.
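For anyone else hitting this, an in-process variant of the same workaround (purely a sketch; not something electrs does, as far as I know) is to raise the soft limit towards the hard limit at startup via `setrlimit`, again using the `libc` crate:

```rust
use std::io;

// Raise the soft RLIMIT_NOFILE towards the hard limit before opening the DB.
// Hypothetical helper for illustration only; error handling is simplified.
fn raise_open_file_limit() -> io::Result<libc::rlim_t> {
    let mut rl = libc::rlimit { rlim_cur: 0, rlim_max: 0 };
    if unsafe { libc::getrlimit(libc::RLIMIT_NOFILE, &mut rl) } != 0 {
        return Err(io::Error::last_os_error());
    }
    // macOS rejects soft limits above OPEN_MAX (10240) unless RLIM_INFINITY is used,
    // so clamp the request instead of blindly copying the hard limit.
    rl.rlim_cur = rl.rlim_max.min(10_240);
    if unsafe { libc::setrlimit(libc::RLIMIT_NOFILE, &rl) } != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(rl.rlim_cur)
}
```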

romanz commented 1 year ago

Thanks for reporting this issue! Will reproduce on a Linux machine to collect file descriptor usage metrics.

romanz commented 1 year ago

> Additional context: I tried building this with Prometheus support in order to inspect the file descriptor usage, but got this result:
>
> ❯ cargo build --locked --release --all-features
>    Compiling electrs v0.10.0 (/Users/torkel/dev/rust/electrs)
> error[E0432]: unresolved import `prometheus::process_collector`
>  --> src/metrics.rs:6:21
>   |
> 6 |     use prometheus::process_collector::ProcessCollector;
>   |                     ^^^^^^^^^^^^^^^^^ could not find `process_collector` in `prometheus`

Unfortunately, `process_collector` doesn't support macOS :( https://github.com/tikv/rust-prometheus/blob/6e81890773ef82e3bcc6c080d406543da1fb8073/src/process_collector.rs#L5
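For context, this is roughly how the collector gets registered when it is available; a minimal sketch (not the actual `src/metrics.rs` code), assuming the prometheus crate's `process` feature is enabled and gated to Linux since the collector is only implemented there:

```rust
use prometheus::Registry;

// Register process-level metrics (open fds, RSS, CPU time) where supported.
// On macOS this compiles to a no-op, since ProcessCollector is Linux-only.
fn register_process_metrics(registry: &Registry) {
    #[cfg(target_os = "linux")]
    {
        use prometheus::process_collector::ProcessCollector;
        registry
            .register(Box::new(ProcessCollector::for_self()))
            .expect("failed to register process collector");
    }
    #[cfg(not(target_os = "linux"))]
    {
        let _ = registry; // no process metrics available on this platform
    }
}
```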

romanz commented 1 year ago

According to Prometheus, electrs uses ~350 file descriptors during compaction (screenshot of the open file descriptors graph omitted).

romanz commented 1 year ago

> However, I've never had software crash on my machine due to this before.

I assume the reason is that the initial sync process writes many SST files (containing the index) and then merges them during the full compaction. In order to merge them, RocksDB needs to open all of them, which fails when there are more than 256 SST files to merge. This issue is expected to become more common since the blockchain is growing, and so is the number of files to merge during the full compaction. We'll probably need to update the docs to suggest increasing `ulimit -n` to a larger value (on my machine it is set to 1024).
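Besides raising `ulimit -n`, RocksDB itself has a knob that bounds how many table files it keeps open at once. A rough sketch with the `rocksdb` crate (not electrs's actual options in `src/db.rs`; whether this alone avoids the failure during a full compaction would need testing):

```rust
use rocksdb::{Options, DB};

// Open a database with a bounded table-file cache, as a hypothetical mitigation.
fn open_with_fd_budget(path: &str) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    // Cap the number of SST files RocksDB keeps open; it will close and reopen
    // files as needed instead of holding all ~650 open. The default (-1) is unbounded.
    opts.set_max_open_files(200);
    DB::open(&opts, path)
}
```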

torkelrogstad commented 1 year ago

Right, interesting, thanks! Probably a good idea to update the docs, if it isn't possible to get RocksDB to be stingier with the files. For reference, my `ulimit -n` is 256, which I believe is the default on macOS.