subconsciousnetwork / noosphere

Noosphere is a protocol for thought; let's discover it together!
Apache License 2.0

Stack overflow/segfault in RocksDB stress test #675

Open jsantell opened 1 year ago

jsantell commented 1 year ago

We're running into a stack overflow/segfault when running the stress test multiplayer::orb_can_render_peers_in_the_sphere_address_book with RocksDB, discovered in #655.

To summarize: introducing RocksDB causes a segfault in one of our stress tests. It reproduces consistently, both locally and in CI, yet it depends on things we wouldn't expect to matter, such as the shape of some structs. In #655, we store a Storage instance in SphereDb. This change alone causes the segfault:

--- a/rust/noosphere-storage/src/db.rs
+++ b/rust/noosphere-storage/src/db.rs
@@ -40,6 +40,7 @@ where
     link_store: S::KeyValueStore,
     version_store: S::KeyValueStore,
     metadata_store: S::KeyValueStore,
+    storage: S,
 }

 impl<S> SphereDb<S>
@@ -52,6 +53,7 @@ where
             link_store: storage.get_key_value_store(LINK_STORE).await?,
             version_store: storage.get_key_value_store(VERSION_STORE).await?,
             metadata_store: storage.get_key_value_store(METADATA_STORE).await?,
+            storage: storage.to_owned(),
         })
     }

In #655, changing RocksDbStore's name property to be an Arc "fixes" the segfault:

--- a/rust/noosphere-storage/src/implementation/rocks_db.rs
+++ b/rust/noosphere-storage/src/implementation/rocks_db.rs
@@ -85,13 +85,13 @@ impl Storage for RocksDbStorage {

 #[derive(Clone)]
 pub struct RocksDbStore {
-    name: String,
+    name: Arc<String>,
     db: Arc<DbInner>,
 }

 impl RocksDbStore {
     pub fn new(db: Arc<DbInner>, name: String) -> Result<Self> {
-        Ok(RocksDbStore { db, name })
+        Ok(RocksDbStore { db, name: Arc::new(name) })
     }

While Arc is more appropriate here anyway, it shouldn't have an effect on this segfault. Using an even more appropriate Cow instead still segfaults. That is to say, some spooky issue exists independently of #655 and of using RocksDB.
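For context on why the shape of a struct could matter at all, here is a minimal sketch comparing the in-memory size of the two RocksDbStore variants. StoreWithString and StoreWithArc are hypothetical stand-ins (with Arc<()> in place of Arc<DbInner>), not the real types. A smaller struct shrinks every stack frame and future that holds it by value, which is one plausible, unconfirmed reason the Arc change moves the test from overflowing to passing:

```rust
use std::mem::size_of;
use std::sync::Arc;

// Hypothetical stand-in for RocksDbStore before the change.
#[allow(dead_code)]
struct StoreWithString {
    name: String,
    db: Arc<()>,
}

// Hypothetical stand-in for RocksDbStore after the change.
#[allow(dead_code)]
struct StoreWithArc {
    name: Arc<String>,
    db: Arc<()>,
}

/// Returns (size with `String`, size with `Arc<String>`) in bytes.
fn store_sizes() -> (usize, usize) {
    (size_of::<StoreWithString>(), size_of::<StoreWithArc>())
}
```

Calling store_sizes() shows the Arc variant is the smaller of the two. Since SphereDb holds three key-value stores, the per-instance difference is multiplied there. This only illustrates the size change; it does not explain the segfault itself.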


Things we've tried:


Stack trace at the point of the crash:

(cpp) rocksdb::TableCache::Get
..
BlockStore::get_block
..
OnceCell::get_or_try_init/closure
Sphere::to_memo
OnceCell::get_or_try_init/closure
Sphere::to_body
Sphere::get_content
Sphere::derive_mutation
Sphere::hydrate_with_cid
Sphere::hydrate
Sphere::hydrate_timeslice
Sphere::rebase
sync::fetch_remote_changes
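
One way to check whether a deep call chain like the one above is simply exhausting the stack, rather than corrupting memory, is to run the same work on a thread with a much larger stack and see whether the crash disappears. The sketch below is illustrative only: nested is a hypothetical stand-in for the nested Sphere calls, and run_with_stack is a hypothetical helper, neither of which exists in Noosphere.

```rust
use std::hint::black_box;
use std::thread;

/// Hypothetical stand-in for a deeply nested call chain: each call
/// places a 1 KiB buffer on the stack before recursing.
fn nested(depth: usize) -> usize {
    let frame = [0u8; 1024];
    // Keep the buffer from being optimized away.
    black_box(&frame);
    if depth == 0 {
        0
    } else {
        1 + nested(depth - 1)
    }
}

/// Runs `f` on a fresh thread with a stack of `stack_bytes` bytes
/// and returns its result.
fn run_with_stack<T, F>(stack_bytes: usize, f: F) -> T
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(f)
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```

For example, run_with_stack(64 * 1024 * 1024, || nested(10_000)) completes with a 64 MiB stack, whereas the same depth needs roughly 10 MiB of frames and would not fit in a typical default thread stack. If the stress test passes under a larger stack, that points to plain stack exhaustion.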

jsantell commented 1 year ago

Also reproducible on arm64 macOS.