Closed mabalaru closed 2 years ago
cc @ryoqun @behzadnouri @carllin
The Recycler fixes should be constraining this usage, right? Are there any missing in 1.6 that might help?
From the error code, this seems like an out of memory error, right?
cudaErrorMemoryAllocation = 2 The API call failed because it was unable to allocate enough memory to perform the requested operation.
The changes I made have not been backported to v1.6. If this is an OOM problem, I also don't expect them to help.
Yea. It is OOM.
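For reference, the numeric code in the panic message can be mapped to a CUDA runtime error name. The following is a minimal sketch, not code from the Solana repository: the helper name `cuda_error_name` is hypothetical, and it covers only the codes that appear in this thread. Code 2 comes from the documentation quoted above; code 4 is assumed to be `cudaErrorCudartUnloading`, based on the CUDA 10.1+ numbering and the "driver shutting down" message in the May 31 log.

```rust
/// Hypothetical helper (not part of solana_perf): map a CUDA runtime
/// error code to its symbolic name. Only codes seen in this thread
/// are listed; anything else is reported as "unknown".
///
/// Example: `cuda_error_name(2)` yields "cudaErrorMemoryAllocation".
fn cuda_error_name(code: u32) -> &'static str {
    match code {
        0 => "cudaSuccess",
        // Quoted in this thread: the API call could not allocate enough memory.
        2 => "cudaErrorMemoryAllocation",
        // Assumption: CUDA 10.1+ numbering, where 4 means the runtime is unloading.
        4 => "cudaErrorCudartUnloading",
        _ => "unknown",
    }
}
```

Under that assumption, the `cudaHostUnregister returned: 4` panic on May 31 would be a side effect of the process already shutting down after the first failure, not a separate root cause.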
I have an update on this situation. I downgraded the server to Solana 1.5.19 and it has been working fine for more than 24 hours, so the upgrade to Solana 1.6.x appears to be causing this issue.
Hello, I did some testing with version 1.6.10. This version is more stable, and it takes 2-3 days to reach a panic like this:
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:03.139374444Z INFO solana_metrics::metrics] datapoint: banking_stage-loop-stats id=0i process_packets_count=876i new_tx_count=0i dropped_batches_count=0i newly_buffered_packets_count=221i current_buffered_packets_count=1065i rebuffered_packets_count=0i consume_buffered_packets_elapsed=0i process_packets_elapsed=3038i handle_retryable_packets_elapsed=0i filter_pending_packets_elapsed=0i packet_duplicate_check_elapsed=614i packet_conversion_elapsed=0i transaction_processing_elapsed=0i
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:03.396988224Z INFO solana_metrics::metrics] datapoint: shred_fetch_tvu_forwards index_overrun=0i shred_count=541i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=0i
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: thread 'solana-receiver' panicked at 'cudaHostRegister error: 2 ptr: 0x7f45e6ab8300 bytes: 167936', perf/src/cuda_runtime.rs:33:17
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: stack backtrace:
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 0: rust_begin_unwind
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 1: std::panicking::begin_panic_fmt
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 2: solana_perf::cuda_runtime::pin
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 3: solana_perf::cuda_runtime::PinnedVec<T>::reserve_and_pin
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 4: solana_perf::packet::Packets::new_with_recycler
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: 5: solana_streamer::streamer::recv_loop
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:03.458706709Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solana-receiver" one=1i message="panicked at 'cudaHostRegister error: 2 ptr: 0x7f45e6ab8300 bytes: 167936', perf/src/cuda_runtime.rs:33:17" location="perf/src/cuda_runtime.rs:33:17"
Jun 2 08:02:03 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:03.458778236Z INFO solana_metrics::metrics] submitting 197 points
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.000809442Z INFO solana_runtime::accounts_db] finalize_dead_slot_removal: slots [80605618, 80605549, 80605602, 80605608, 80605550, 80605624, 80605537, 80605565, 80605600, 80605588, 80605563, 80605543, 80605529, 80605532, 80605595, 80605591, 80605539, 80605641, 80605561, 80605564, 80605548, 80605653, 80605536, 80605617, 80605551, 80605643, 80605648, 80605633, 80605644, 80605640, 80605642, 80605566, 80605649, 80605586, 80605584, 80605621, 80605560, 80605610, 80605629, 80605627, 80605647, 80605544, 80605612, 80605634, 80605534, 80605525, 80605615, 80605646, 80605542, 80605592, 80605531, 80605597, 80605611, 80605645, 80605596, 80605562, 80605650]
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: thread 'solana-receiver' panicked at 'cudaHostRegister error: 2 ptr: 0x7f37129d3f80 bytes: 167936', perf/src/cuda_runtime.rs:33:17
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: stack backtrace:
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 0: rust_begin_unwind
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 1: std::panicking::begin_panic_fmt
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 2: solana_perf::cuda_runtime::pin
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 3: solana_perf::cuda_runtime::PinnedVec<T>::reserve_and_pin
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 4: solana_perf::packet::Packets::new_with_recycler
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 5: solana_streamer::streamer::recv_loop
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.133892003Z INFO solana_metrics::metrics] datapoint: banking_stage-loop-stats id=3i process_packets_count=0i new_tx_count=0i dropped_batches_count=0i newly_buffered_packets_count=0i current_buffered_packets_count=0i rebuffered_packets_count=0i consume_buffered_packets_elapsed=0i process_packets_elapsed=0i handle_retryable_packets_elapsed=0i filter_pending_packets_elapsed=0i packet_duplicate_check_elapsed=0i packet_conversion_elapsed=0i transaction_processing_elapsed=0i
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.133911343Z INFO solana_metrics::metrics] datapoint: shred_fetch index_overrun=0i shred_count=17i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=0i
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.133951504Z INFO solana_metrics::metrics] datapoint: poh-service ticks=176i hashes=2207744i elapsed_us=365594i total_sleep_us=0i total_tick_time_us=272i total_lock_time_us=69i total_hash_time_us=1001647i total_record_time_us=0i
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.133974456Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solana-receiver" one=1i message="panicked at 'cudaHostRegister error: 2 ptr: 0x7f37129d3f80 bytes: 167936', perf/src/cuda_runtime.rs:33:17" location="perf/src/cuda_runtime.rs:33:17"
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: [2021-06-02T05:02:04.133990708Z INFO solana_metrics::metrics] submitting 10 points
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: reqwest::Error { kind: Builder, source: Normal(ErrorStack([])) }', metrics/src/metrics.rs:107:18
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: stack backtrace:
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 0: rust_begin_unwind
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 1: core::panicking::panic_fmt
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/panicking.rs:92:14
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 2: core::option::expect_none_failed
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/core/src/option.rs:1300:5
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 3: <solana_metrics::metrics::InfluxDbMetricsWriter as solana_metrics::metrics::MetricsWriter>::write
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 4: solana_metrics::metrics::MetricsAgent::write
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: 5: solana_metrics::metrics::MetricsAgent::run
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 2 08:02:04 m2-solana01 solana-validator[3206030]: total_gpus: 4
Jun 2 08:02:30 m2-solana01 systemd[1]: solana.service: Main process exited, code=exited, status=1/FAILURE
Jun 2 08:02:30 m2-solana01 systemd[1]: solana.service: Failed with result 'exit-code'.
Before this, it also happened on May 31 at 11:47:38:
May 31 11:47:38 m2-solana01 solana-validator[1504659]: thread 'solana-receiver' panicked at 'cudaHostRegister error: 2 ptr: 0x7f10188654c0 bytes: 167936', perf/src/cuda_runtime.rs:33:17
May 31 11:47:38 m2-solana01 solana-validator[1504659]: stack backtrace:
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 0: rust_begin_unwind
May 31 11:47:38 m2-solana01 solana-validator[1504659]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 1: std::panicking::begin_panic_fmt
May 31 11:47:38 m2-solana01 solana-validator[1504659]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 2: solana_perf::cuda_runtime::pin
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 3: solana_perf::cuda_runtime::PinnedVec<T>::reserve_and_pin
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 4: solana_perf::packet::Packets::new_with_recycler
May 31 11:47:38 m2-solana01 solana-validator[1504659]: 5: solana_streamer::streamer::recv_loop
May 31 11:47:38 m2-solana01 solana-validator[1504659]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
May 31 11:47:38 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:38.607163761Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solana-receiver" one=1i message="panicked at 'cudaHostRegister error: 2 ptr: 0x7f10188654c0 bytes: 167936', perf/src/cuda_runtime.rs:33:17" location="perf/src/cuda_runtime.rs:33:17"
May 31 11:47:38 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:38.607209230Z INFO solana_metrics::metrics] submitting 135 points
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.443855818Z INFO solana_runtime::accounts_db] finalize_dead_slot_removal: slots [80337641, 80337679, 80337670, 80337592, 80337626, 80337600, 80337665, 80337694, 80337637, 80337677, 80337672, 80337616, 80337615, 80337673, 80337605, 80337606, 80337647, 80337645, 80337625, 80337623, 80337658, 80337644, 80337632, 80337587, 80337586, 80337598, 80337643, 80337652, 80337624, 80337687, 80337666, 80337595, 80337669, 80337654, 80337656, 80337675, 80337667, 80337618, 80337664, 80337638, 80337636, 80337640, 80337599, 80337617, 80337668, 80337607, 80337674, 80337693, 80337646, 80337614, 80337676, 80337622, 80337678, 80337655, 80337695, 80337685, 80337659, 80337684, 80337671, 80337621, 80337593, 80337657, 80337619, 80337653, 80337612, 80337686, 80337692, 80337613]
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.456632353Z INFO solana_core::repair_service] repair_stats: [(80769831, 11), (80769824, 1120)]
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.554988934Z INFO solana_metrics::metrics] datapoint: shred_insert_is_full total_time_ms=770i slot=80769824i last_index=1056i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555014895Z INFO solana_metrics::metrics] datapoint: poh-service ticks=173i hashes=2170112i elapsed_us=372228i total_sleep_us=0i total_tick_time_us=196i total_lock_time_us=48i total_hash_time_us=1002557i total_record_time_us=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555066203Z INFO solana_metrics::metrics] datapoint: banking_stage-loop-stats id=3i process_packets_count=0i new_tx_count=0i dropped_batches_count=0i newly_buffered_packets_count=0i current_buffered_packets_count=0i rebuffered_packets_count=0i consume_buffered_packets_elapsed=0i process_packets_elapsed=0i handle_retryable_packets_elapsed=0i filter_pending_packets_elapsed=0i packet_duplicate_check_elapsed=0i packet_conversion_elapsed=0i transaction_processing_elapsed=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555074298Z INFO solana_metrics::metrics] datapoint: shred_fetch_repair index_overrun=0i shred_count=684i slot_bad_deserialize=0i index_bad_deserialize=0i index_out_of_bounds=0i slot_out_of_range=0i duplicate_shred=304i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555091904Z INFO solana_metrics::metrics] datapoint: banking_stage-loop-stats id=1i process_packets_count=0i new_tx_count=0i dropped_batches_count=0i newly_buffered_packets_count=0i current_buffered_packets_count=0i rebuffered_packets_count=0i consume_buffered_packets_elapsed=0i process_packets_elapsed=0i handle_retryable_packets_elapsed=0i filter_pending_packets_elapsed=0i packet_duplicate_check_elapsed=0i packet_conversion_elapsed=0i transaction_processing_elapsed=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555097737Z INFO solana_metrics::metrics] datapoint: banking_stage-loop-stats id=2i process_packets_count=0i new_tx_count=0i dropped_batches_count=0i newly_buffered_packets_count=0i current_buffered_packets_count=0i rebuffered_packets_count=0i consume_buffered_packets_elapsed=0i process_packets_elapsed=0i handle_retryable_packets_elapsed=0i filter_pending_packets_elapsed=0i packet_duplicate_check_elapsed=0i packet_conversion_elapsed=0i transaction_processing_elapsed=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555149677Z INFO solana_metrics::metrics] datapoint: serve_repair-repair repair-total=1131i shred-count=1131i highest-shred-count=0i orphan-count=0i repair-highest-slot=0i repair-orphan=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555165647Z INFO solana_metrics::metrics] datapoint: serve_repair-repair-timing set-root-elapsed=31i get-votes-elapsed=622i add-votes-elapsed=2003i get-best-orphans-elapsed=1258i get-best-shreds-elapsed=11177i send-repairs-elapsed=12427i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555565490Z INFO solana_metrics::metrics] datapoint: accounts_db_store_timings hash_accounts=48234i store_accounts=18847i update_index=3406i handle_reclaims=12i append_accounts=0i find_storage=0i num_accounts=2174i total_data=173438325i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.555579348Z INFO solana_metrics::metrics] datapoint: accounts_db_store_timings2 recycle_store_count=0i current_recycle_store_count=1001i current_recycle_store_bytes=3113361408i create_store_count=0i store_get_slot_store=0i store_find_existing=0i dropped_stores=68i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.556919286Z INFO solana_metrics::metrics] datapoint: recv-window-insert-shreds num_shreds=6076i total_elapsed=1228537i insert_lock_elapsed=0i insert_shreds_elapsed=60234i shred_recovery_elapsed=1126713i chaining_elapsed=118i commit_working_sets_elapsed=5253i write_batch_elapsed=29458i num_inserted=2661i num_repair=7i num_recovered=967i num_recovered_inserted=967i num_recovered_failed_sig=0i num_recovered_failed_invalid=0i num_recovered_exists=0i
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.557725041Z INFO solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-txs", "counts": 167184118, "samples": 147884000, "now": 1622450859557, "events": 0}
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.557774479Z INFO solana_metrics::counter] COUNTER:{"name": "bank-process_transactions", "counts": 226163319, "samples": 172811000, "now": 1622450859557, "events": 1}
May 31 11:47:39 m2-solana01 solana-validator[1504659]: [2021-05-31T08:47:39.557801232Z INFO solana_metrics::counter] COUNTER:{"name": "bank-process_transactions-sigs", "counts": 203576439, "samples": 147884000, "now": 1622450859557, "events": 1}
May 31 11:47:39 m2-solana01 solana-validator[1504659]: ERR: driver shutting down cuda-ecc-ed25519/gpu_ctx.cu 68
May 31 11:47:39 m2-solana01 solana-validator[1504659]: solana-validator: common/gpu_common.h:22: void cuda_assert(cudaError_t, const char*, int): Assertion `0' failed.
May 31 11:47:39 m2-solana01 solana-validator[1504659]: thread 'solana-listen' panicked at 'cudaHostUnregister returned: 4 ptr: 0x7f1031dbcb40', /var/lib/buildkite-agent/builds/froome-1/solana-labs/solana-secondary/perf/src/cuda_runtime.rs:51:17
May 31 11:47:39 m2-solana01 solana-validator[1504659]: stack backtrace:
May 31 11:47:39 m2-solana01 solana-validator[1504659]: 0: rust_begin_unwind
May 31 11:47:39 m2-solana01 solana-validator[1504659]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
May 31 11:47:39 m2-solana01 solana-validator[1504659]: 1: std::panicking::begin_panic_fmt
May 31 11:47:39 m2-solana01 solana-validator[1504659]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
May 31 11:48:06 m2-solana01 solana-validator[3206030]: [2021-05-31T08:48:06.151106572Z INFO solana_validator] solana-validator 1.6.10 (src:5d4654d2; feat:3533521759)
Thank you in advance for any help.
I ran Solana 1.6.10 with CUDA 10.1, as suggested in the documentation. The error still appears after 1 hour of running:
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: thread 'solana-receiver' panicked at 'cudaHostRegister error: 2 ptr: 0x7f9b78a1fd80 bytes: 167936', perf/src/cuda_runtime.rs:33:17
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: stack backtrace:
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.016598222Z INFO solana_core::repair_service] repair_stats: [(81144180, 127), (81144181, 447)]
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.016625217Z INFO solana_metrics::metrics] datapoint: serve_repair-repair repair-total=574i shred-count=574i highest-shred-count=0i orphan-count=0i repair-highest-slot=0i repair-orphan=0i
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.016637150Z INFO solana_metrics::metrics] datapoint: serve_repair-repair-timing set-root-elapsed=41i get-votes-elapsed=1127i add-votes-elapsed=2353i get-best-orphans-elapsed=534i get-best-shreds-elapsed=7023i send-repairs-elapsed=7099i
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.017157813Z INFO solana_metrics::metrics] datapoint: retransmit-first-shred slot=81144182i
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.045406109Z INFO solana_metrics::metrics] datapoint: shred_insert_is_full total_time_ms=1212i slot=81144181i last_index=630i
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 0: rust_begin_unwind
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 1: std::panicking::begin_panic_fmt
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 2: solana_perf::cuda_runtime::pin
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 3: solana_perf::cuda_runtime::PinnedVec<T>::reserve_and_pin
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 4: solana_perf::packet::Packets::new_with_recycler
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 5: solana_streamer::streamer::recv_loop
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.067728440Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solana-receiver" one=1i message="panicked at 'cudaHostRegister error: 2 ptr: 0x7f9b78a1fd80 bytes: 167936', perf/src/cuda_runtime.rs:33:17" location="perf/src/cuda_runtime.rs:33:17"
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: [2021-06-02T22:54:41.067759175Z INFO solana_metrics::metrics] submitting 83 points
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: thread 'solana-receiver' panicked at 'cudaHostRegister error: 2 ptr: 0x7fa955875ec0 bytes: 167936', perf/src/cuda_runtime.rs:33:17
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: stack backtrace:
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 0: rust_begin_unwind
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:493:5
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 1: std::panicking::begin_panic_fmt
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: at ./rustc/2fd73fabe469357a12c2c974c140f67e7cdd76d0/library/std/src/panicking.rs:435:5
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 2: solana_perf::cuda_runtime::pin
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 3: solana_perf::cuda_runtime::PinnedVec<T>::reserve_and_pin
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 4: solana_perf::packet::Packets::new_with_recycler
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: 5: solana_streamer::streamer::recv_loop
Jun 3 01:54:41 m2-solana01 solana-validator[3875]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
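The `cudaHostRegister` panic lines in the logs above all share one shape, so they can be pulled apart mechanically when comparing runs. The following is a rough sketch using only the Rust standard library; the `PinFailure` struct and `parse_pin_failure` function are hypothetical names introduced for illustration, and the parsing assumes the exact message layout shown in this thread.

```rust
/// Hypothetical record for one failed pin attempt, as reported in the panic text.
#[derive(Debug, PartialEq)]
struct PinFailure {
    code: u32,  // CUDA runtime error code (2 in every panic in this thread)
    ptr: u64,   // host address passed to cudaHostRegister
    bytes: u64, // size of the region that could not be registered
}

/// Extract the fields from a line containing
/// "cudaHostRegister error: <code> ptr: 0x<hex> bytes: <n>".
/// Returns None for lines that do not match this layout.
fn parse_pin_failure(line: &str) -> Option<PinFailure> {
    // Everything after the marker, e.g. "2 ptr: 0x7f45e6ab8300 bytes: 167936', perf/..."
    let rest = line.split("cudaHostRegister error: ").nth(1)?;
    let mut fields = rest.split_whitespace();

    let code = fields.next()?.parse().ok()?;
    if fields.next()? != "ptr:" {
        return None;
    }
    let ptr_hex = fields.next()?.strip_prefix("0x")?;
    let ptr = u64::from_str_radix(ptr_hex, 16).ok()?;
    if fields.next()? != "bytes:" {
        return None;
    }
    // The byte count is followed by the closing quote and comma ("167936',"),
    // so keep only the leading digits.
    let digits: String = fields
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let bytes = digits.parse().ok()?;

    Some(PinFailure { code, ptr, bytes })
}
```

Applied to the panics in this thread, every failure reports the same request size of 167936 bytes (164 KiB). That may point to accumulated pinned memory rather than a single oversized request, although the logs alone do not prove it.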
Try with newer versions and re-open if still present.
This issue has been automatically locked since there has not been any activity in past 7 days after it was closed. Please open a new issue for related bugs.
Validator configuration:
Problem
The validator fails when CUDA is enabled, with the following error:
Proposed Solution
The validator should work with CUDA enabled.