nix-community / lorri

Your project’s nix-env [maintainer=@Profpatsch,@nyarly]
Apache License 2.0
663 stars 24 forks source link

Remove varlink and bring back the native server/client socket serialization #22

Closed Profpatsch closed 3 years ago

Profpatsch commented 3 years ago

Fixes https://github.com/nix-community/lorri/issues/4

Please see commit messages.

The lorri internal stream-events output probably changes a lot, I didn’t spend any time keeping it backwards compatible. But since it’s experimental and internal we don’t have to care as much. It’s still using serde Serialize instances, but some care was taken to split it from the internal representation.

Profpatsch commented 3 years ago

Okay we really need to get rid of the “always up to date clippy in CI” thing, it’s aggravating.

symphorien commented 3 years ago

lorri internal stream-events --kind snapshot just hangs for me. I'm running the daemon with systemd, and paid attention that the lorri command and the lorri daemon are both from this PR. lorri internal ping works, though.

Profpatsch commented 3 years ago

lorri internal stream-events --kind snapshot just hangs for me.

have you been using this before? The logic around this is very confused, I copied it verbatim. I don’t think I broke anything, but it’s rather hard to figure out. I’d say this will need a rewrite to clean up in the future, so let’s ignore it for now.

Profpatsch commented 3 years ago

Okay, again, the only CI failure is coming from the newer clippy warnings, which happens because CI always fetches the nightly clippy, which we should remove because they are not easy to reproduce locally.

Profpatsch commented 3 years ago

LGTM

symphorien commented 3 years ago

have you been using this before? The logic around this is very confused, I copied it verbatim. I don’t think I broke anything, but it’s rather hard to figure out.

Yes, since 2020-04-25 modulo command renaming (it used to be lorri internal__stream_events --kind snapshot) Here is at least half the fix #25 The command is still hanging, I don't remember if it used to.

symphorien commented 3 years ago

I think I found the source of the problem: when thread in ops::stream_events is dropped, we wait on it, that is forever. So the function cannot return. I tried to fix it with

diff --git a/src/ops.rs b/src/ops.rs
index a7f3ca0..95ca37f 100644
--- a/src/ops.rs
+++ b/src/ops.rs
@@ -606,6 +606,12 @@ pub fn stream_events(kind: EventKind) -> OpResult {
         })
     };

+    // when running in snapshot mode, we should exit as soon as we print the answer
+    // of the daemon. However, when dropping `thread` at the end of the function,
+    // we will wait for it and hang indefinitely.
+    // Let's leak it, anyway we are going to exit the process as a whole very soon.
+    let thread = std::mem::ManuallyDrop::new(thread);
+
     let mut snapshot_done = false;
     loop {
         select! {

but then lorri internal stream-events fails with a timeout after 0.5 second:

$  target/debug/lorri internal stream-events
{"Completed":{"nix_file":"/home/symphorien/src/lorri/shell.nix","rooted_output_paths":{"shell_gc_root":"/home/symphorien/.cache/lorri/gc_roots/07d6e569120731e7ee9a3297613c71b7/gc_root/shell_gc_root"}}}
Mar 28 17:36:23.148 ERRO Message(R(Timeout(D(Millis(500)))))

Maybe it always used to timeout but when trying to return the error it would hang as explained above?

symphorien commented 3 years ago

Mmh this leaks file descriptors very fast...

symphorien commented 3 years ago

file descriptors in the server*

Profpatsch commented 3 years ago

Ah, it’s a regression then, async will block until the thread inside finishes, since everything else is just sloppy.

Profpatsch commented 3 years ago

@symphorien I think the solution here is to allow Async to not require a thread to join itself when it goes out of scope. There is another solution, but that requires a bunch of backwards-pointing shutdown channels and I don’t think it’s worth the overhead.

Profpatsch commented 3 years ago

Okay, I have a fix.