n0-computer / beetle

Other
34 stars 15 forks source link

Stream returned from resolve_recursive_with_paths hangs for large non-hamt directory #43

Closed rklaehn closed 1 year ago

rklaehn commented 1 year ago

When creating a large directory and then traversing it again, the roundtrip test hangs as soon as the directory has more than 2048 entries.

This seems to be related to some bounded channel in the resolver:

        let (session_closer_s, session_closer_r) = async_channel::bounded(2048);

Setting this value to a larger value makes the test pass. Not sure what is happening? Something just accumulates in this channel is not pulled out, and when it is full things hang?

Arqu commented 1 year ago

Oh my, we should clean that up! That might be part of bitswap issues too.

rklaehn commented 1 year ago

So - this is easy enough to "fix". I tracked it down to impl Drop for LoaderContext {. There is a send_blocking in there. Blocking in Drop is never a good idea.

But - what is all this stuff, and why is the channel not drained? What are these sessions, and what is bad about them not being cleaned up? Just seems like a complex machinery...

dignifiedquire commented 1 year ago

use a multithreaded executor and the issue goes away…thanks tokio (I haven’t found a way to make this work with the single threaded executor) and tokio defaults to single threaded for tokio::test.

dignifiedquire commented 1 year ago

the machinery is needed because we must send an rpc message on drop, and tokio doesn’t allow us to execute any async code in drop sad face

rklaehn commented 1 year ago

Why not use try_send? Blocking in send seems like asking for trouble...

dignifiedquire commented 1 year ago

because we must not loose these messages, otherwise we leak memory

rklaehn commented 1 year ago

So what exactly is being cleaned up here? A bitswap session?

dignifiedquire commented 1 year ago

yes, the sessions

rklaehn commented 1 year ago

OK, I still don't like this very much. But it seems that it is not a bug.