Closed kxt closed 3 years ago
Currently there is a workaround in `Pty` calling `task::sleep` to slow it down. However, as there is no amount of sleep that works for every machine, the proper solution would be to have back pressure between `Pty` and `Screen`, so that if `Screen` can't keep up, `Pty` will stop reading from the program until the queue clears up. This would need a bounded queue between `Pty` and `Screen`. However, as `Pty` is async, we can't simply replace the queue with an mpsc `SyncSender`, as its `send` is blocking. This means an async queue is needed, like async-std's bounded channel.
I made a proof-of-concept using a bounded `to_screen` queue, and I ran into a deadlock:

1. The `pty` thread processes a lot of input, filling up the `to_screen` queue as soon as the `screen` thread pops something from it.
2. The `screen` thread pops a `ScreenInstruction::Render` from `to_screen` and processes it with `Screen::render`, which calls `Tab::render`, which calls `Pane::render`. The pane happens to be a `PluginPane`, and its `render` opens a channel to the plugin, sends it a `PluginInstruction::Render` and waits for the plugin's response through that channel, blocking the `screen` thread in the meantime.
3. The `wasm` thread that would process `PluginInstruction::Render` is blocked: it still hasn't finished processing a previous `PluginInstruction::Update` message. `PluginInstruction::Update` involves sending a `ScreenInstruction::Render` to the `to_screen` queue, which is already filled by the `pty` thread, so this send blocks.

TL;DR: the `pty` thread has filled `to_screen`, the `wasm` thread is blocked by the filled `to_screen`, and the `screen` thread (which should clear up `to_screen`) is blocked by the blocked `wasm` thread.

So simply changing `to_screen` to a bounded queue is not a solution.
I found a workable way to add back pressure between the `pty` and `screen` threads without triggering the mentioned deadlock.
The idea is to have a dedicated bounded channel between `pty` and `screen`, so we gain back pressure, while keeping the unbounded channel between every other component and `screen` to avoid deadlocks.
This required a change from `mpsc` to `crossbeam`, as `mpsc` lacks a way to select between multiple `Receiver`s.
@imsnif If this is an acceptable approach (poc patch here) I'll whip up a PR-able implementation.
@kxt - at a brief glance this looks like a good direction (though I haven't run it or looked too deeply at it yet). I'm good with moving forward with a PR so we can look deeper. I might want to bring in @kunalmohan to also take a look (if he has the time) since he's done a lot of work in this area.
@kxt great work in #523! The throughput has increased by a LOT!
Regarding this issue, I personally didn't notice it in release builds (on main as well as in #523), but it is quite evident in debug builds. Your proposal of using `crossbeam` channels with the `select` feature looks like a very good solution. Since it might be some time before the rendering throughput is increased, I think we should go ahead with this. So go ahead and draft a PR and we'll discuss this further :)
I've created two PRs. The first one contains some refactors preparing for the fix, including switching from `mpsc` to `crossbeam`; it is ready for review and merge.
The second, draft PR contains the actual fix and seems to work OK. However, some of the test cases fail unless a `sleep` is introduced at random points, which is not great. I wasn't able to figure out why it is necessary.
I've noticed that many test cases already contain `sleep`s, and there is also a generous amount of sleep in `read_from_stdin`, which makes running the test cases slow.
Can someone help me figure out what's going on with the test cases? @kunalmohan ?
Hey @kxt - I added a review in the first PR. About the tests in the draft PR (which I think you didn't yet create? unless I missed it?): our testing infrastructure is one of the most broken parts of the code-base. It was one of the first pieces of Zellij and is very long overdue for an overhaul.
Since it works by comparing snapshots of the app state (essentially what's on screen in certain situations) and the app includes some asynchronous elements, instead of creating a proper synchronization we just kept adding sleeps to make it work properly. I am not proud of this, but I am honest about it. :) I plan on devoting attention to it very soon after all the other urgent stuff.
So I'm afraid the best answer I can give you now is to add sleeps where necessary to wait for data to come from the fake pty before sending the next command. (also, there might still be some atomicity issues left where stdin is concerned, though I think we got most of those already).
If you want further specific help with this, let me know.
Sorry, I did not link the draft PR properly, it's in my fork of zellij right now.
I figured it's because the large amount of cross-thread communication makes it hard to tell whether something has already happened or not.
If you say `sleep`s are ok as of now, then I'll open a proper PR with those two patches.
Do you have any concrete plans for the test overhaul?
> If you say `sleep`s are ok as of now, then I'll open a proper PR with those two patches.

Sounds good.

> Do you have any concrete plans on the test overhaul?
An idea we've been throwing around the team is to move whatever makes sense in the integration tests to be unit tests, and make the rest be proper end-to-end tests. Our idea is to spin up a docker container and communicate with it through ssh, thus simulating the full app cycle. This still doesn't solve the synchronization problem though, but I have a feeling that's the sort of issue you solve as you work on it.
If you have any other ideas, they would of course be most welcome.
### Version

0.12.0

### Reproduction

Run `time cat 1_000_000_lines.txt` and compare it with the time measured by a wall clock. The amount of time `time` reports is significantly less than what is observed. Having a slow computer and using a non-optimized (non-`release`) build helps.

### Root cause

A program's output is being processed by the `Pty` thread. This thread reads the output and propagates it to the `Screen` thread for rendering through an unbounded channel called `to_screen`. If the program produces output faster than what `Screen` can render, `to_screen`'s queue starts to grow indefinitely. This results in growing memory usage and input latency.

An alternative explanation (kudos to @aeinstein): running nearly at the speed of light, `cat` is experiencing time dilation compared to a stationary observer.

As PR #523 increases the throughput of the `Pty` thread, it exacerbates this issue.