paritytech / substrate

Substrate: The platform for blockchain innovators
Apache License 2.0
8.39k stars 2.65k forks source link

Investigate and remove possible memory leaks #3999

Closed marcio-diaz closed 4 years ago

marcio-diaz commented 4 years ago
2019-11-02 08:23:12 Idle (0 peers), best: #44 (0x762c…720a), finalized #42 (0x57e2…9caf), ⬇ 0 ⬆ 0
2019-11-02 08:23:15 Starting consensus session on top of parent 0x762c0cab5ca7684b725463a187fa06ca47e1c039cb5944eeb24c13f1d1ac720a
2019-11-02 08:23:15 Prepared block for proposing at 45 [hash: 0x05823ef023fcc977490a444cdaee55229d8b7b85ee4bccb3a796eea76000f4c6; parent_hash: 0x762c…720a; extrinsics: [0xbb25…1508, 0xa081…f2f0]]
2019-11-02 08:23:15 Pre-sealed block for proposal at 45. Hash now 0xee99492b67b32743834978af8e5dd50747de3c5142ff9a8e6e0fa06460325028, previously 0x05823ef023fcc977490a444cdaee55229d8b7b85ee4bccb3a796eea76000f4c6.
2019-11-02 08:23:15 Imported #45 (0xee99…5028)
^C==374== 
==374== HEAP SUMMARY:
==374==     in use at exit: 280,500 bytes in 401 blocks
==374==   total heap usage: 3,258,873 allocs, 3,258,472 frees, 3,621,742,675 bytes allocated
...
==374== 
==374== 2,432 bytes in 8 blocks are possibly lost in loss record 92 of 103
==374==    at 0x483AB65: calloc (vg_replace_malloc.c:762)
==374==    by 0x4012A51: allocate_dtv (in /usr/lib/ld-2.30.so)
==374==    by 0x40133C1: _dl_allocate_tls (in /usr/lib/ld-2.30.so)
==374==    by 0x4DEA1BE: pthread_create@@GLIBC_2.2.5 (in /usr/lib/libpthread-2.30.so)
==374==    by 0x172F9A1: std::sys::unix::thread::Thread::new (thread.rs:67)
==374==    by 0xDB59AE: std::thread::Builder::spawn (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xDB0631: <rayon_core::registry::DefaultSpawn as rayon_core::registry::ThreadSpawn>::spawn (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xDB0DB6: rayon_core::registry::Registry::new (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xDB5146: std::sync::once::Once::call_once::{{closure}} (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x1727AC3: std::sync::once::Once::call_inner (once.rs:392)
==374==    by 0xDB0855: rayon_core::registry::global_registry (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xDB4CCB: rayon_core::current_num_threads (in /home/marcio/repos/substrate/target/release/substrate)
==374== 
==374== 14,728 bytes in 7 blocks are possibly lost in loss record 96 of 103
==374==    at 0x483877F: malloc (vg_replace_malloc.c:309)
==374==    by 0x13F1201: crossbeam_epoch::internal::Local::register (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13F29B4: _ZN3std6thread5local4fast12Key$LT$T$GT$14try_initialize17haf92c22470639096E.llvm.12819621228626589813 (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13EE803: crossbeam_deque::Stealer<T>::steal_batch_and_pop (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13E2C15: tokio_threadpool::worker::Worker::run (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13CC97E: std::thread::local::LocalKey<T>::with (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13CC867: std::thread::local::LocalKey<T>::with (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13CC70E: tokio_reactor::with_default (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13CFBFE: tokio::runtime::threadpool::builder::Builder::build::{{closure}} (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13E9F85: std::thread::local::LocalKey<T>::with (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13E9A2E: std::thread::local::LocalKey<T>::with (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x13E3F40: std::sys_common::backtrace::__rust_begin_short_backtrace (in /home/marcio/repos/substrate/target/release/substrate)
==374== 
==374== 24,720 (16 direct, 24,704 indirect) bytes in 1 blocks are definitely lost in loss record 100 of 103
==374==    at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:344)
==374==    by 0xF6C6A3: rocksdb_cache_create_lru (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xF64AFC: rocksdb::db_options::<impl rocksdb::BlockBasedOptions>::set_lru_cache (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xF540EB: kvdb_rocksdb::Database::open (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0xF462BE: substrate_client_db::utils::open_database (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x5C65D3: substrate_client_db::new_client (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x3D4E48: substrate_service::builder::ServiceBuilder<(),(),TCfg,TGen,TCSExt,(),(),(),(),(),(),(),(),(),()>::new_full (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x7778CD: node_cli::service::new_full (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x702921: substrate_cli::ParseAndPrepareRun<RP>::run (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x30FF80: node_cli::cli::run (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x699A7D: substrate::main (in /home/marcio/repos/substrate/target/release/substrate)
==374==    by 0x40D9F1: std::rt::lang_start::{{closure}} (in /home/marcio/repos/substrate/target/release/substrate)
==374== 
==374== LEAK SUMMARY:
==374==    definitely lost: 16 bytes in 1 blocks
==374==    indirectly lost: 24,704 bytes in 66 blocks
==374==      possibly lost: 17,904 bytes in 22 blocks
==374==    still reachable: 237,876 bytes in 312 blocks
==374==         suppressed: 0 bytes in 0 blocks
==374== Reachable blocks (those to which a pointer was found) are not shown.
==374== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==374== 
==374== For lists of detected and suppressed errors, rerun with: -s
==374== ERROR SUMMARY: 10 errors from 10 contexts (suppressed: 0 from 0)
bkchr commented 4 years ago

This doesn't look like a memory leak, just false positives. There is nothing in the trace that is called periodically. I think here is nothing to win.

marcio-diaz commented 4 years ago

This doesn't look like a memory leak, just false positives. There is nothing in the trace that is called periodically. I think here is nothing to win.

Even if the loop is me running and killing with ctr-c while debugging something? :)

bkchr commented 4 years ago

I don't understand your question. If you look at the traces, you see rocksdb, rayon and crossbeam. Everything looks like it initializes some global state, that is done once. Valgrind is always reporting this kind of stuff as false positives.

marcio-diaz commented 4 years ago

Are you sure that nothing is being leaked? I though that at least this one may be real because of the definitely:

==374== 24,720 (16 direct, 24,704 indirect) bytes in 1 blocks are definitely lost in loss record 100 of 103
==374==    at 0x4838DEF: operator new(unsigned long) (vg_replace_malloc.c:344)

I tried checking for memory leaks because last days I was debugging (running & stopping) substrate quite a lot and noticed that my computer was getting slow.

Feel free to close if you are sure it's ok.

bkchr commented 4 years ago

Memory leaks only exist while the program runs, after the program is closed, the os frees all memory. So, even if you leak everything, the os cleans it up afterwards.

xlc commented 4 years ago

One time small memory leak is pointless to investigate because it is most likely relates to some initialisation code and hence a false positive. Even it is a real leak, as long as the leaked memory doesn't grow, it won't worth the time to investigate. I usually have the program ran for few hours and see the memory usage is actually growing indefinitely (without been recycled at all) before start investigating. If debugging makes your computer slow, it is more likely the debugger leaks resources.