hartwork commented 10 months ago

Hi!

The current signal handling code is not robust and undefined behavior (according to https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers).

There are many ways to address this problem, and I'd like to present one approach here for review.

I saw only after coding this that use of pthreads was said to be not welcome in #62 to support systems without pthreads. If that's still status quo then the only threadless alternative will likely be to do fully non-blocking stdin I/O and I just learned things while building for the trashcan… Status quo on master really needs some fix, it's not robust.

These decisions went into the current implementation:

Signal handlers only set integer flags of type volatile sig_atomic_t and the actually handling is done elsewhere.
Signal handling or stdin reading need to happen in separate threads now (because stdin reading is blocking and we cannot have signal handling blocked).
I decided to keep all access to Curses and signals on the main thread and move stdin data extraction to a separate thread.
So now we have a reader and a write on a shared resource — the plot data — and hence need some synchronisation to not plot in-between state, I used a mutex for that.
Also, the code sits on top of pull request #97 with minimum window size handling and includes that feature with adaptions.

Let me know what you think, best, Sebastian

Depends on #94 for a fully green CI.

Fixes #62

tenox7 commented 10 months ago

Hi! First of all I'm really grateful for your help and submitting all the PRs fixes etc. However I would like to have a little bit of discussion before merging this. I really do have a personal problem with use of pthreads for ttyplot.

Most of my own use of ttyplot is on legacy/vintage unix systems with real serial terminals / consoles. Some of the systems don't support pthreads. I also wouldn't want to do it in a text mode monitoring app like. It will make it unnecessarily complex compared to the problem at hand. Or I would think. I do understand that there is an undefined behavior in the signal handling. However I have not seen, or experienced this ever causing any issue in the practice.

Longer term, I'm planning to develop a new version of ttyplot in more modern programming language like Go, use termui or something similar. I would rather keep the C version of ttyplot compatible with more vintage unixes.

I also have a proposal for another solution, that I have been considering. I haven't tested it yet. The idea is to have some control character in the input stream to trigger refresh, etc. The character would be pushed to the input stream using ungetc() from a signal handler. Then the main loop would execute this by receiving it from the input stream. I do understand that this is not async safe, however perhaps we could select/poll on stdin. Let me know if this is even more insane.

edgar-bonet commented 10 months ago

AFAIK, slect() and poll() are the standard way of monitoring multiple things in parallel within a single thread, so it looks like a reasonable approach to me. Note, however, that waiting for both a file descriptor and a signal with either of these functions creates a race condition: see the section Combining signal and data events of this select tutorial.

The most elegant way of solving this race is IMO to convert the signal to a file descriptor using signalfd(). This call, however, is Linux-specific, and will likely not be available on your vintage Unixen.

The second best option is to use pselect(), which was standardized in 2000 by POSIX.1g. Is this available on your systems? Failing that, you may use the old self-pipe trick, which seems still cleaner than ungetc(). Or you may just accept the race and mitigate the issue with a timeout.

hartwork commented 10 months ago

Hi @tenox7 and @edgar-bonet, thanks for your replies!

There are many distinct points in your replies, let me address each one on its own and end with the more positive ones regarding a potential way forward:

To see ttyplot at its worst I recommend: feeding /dev/random to stdin, frequent manual window resizes, compiling with UBsan and ASan enabled. You'll see crashes inside curses and signed integer overflows sooner or later. I haven't had time for a closer look at the integer overflows yet.
Regarding complexity: signal handling has inherent complexity. If the choice is embracing that complexity or undefined behavior then the only right answer in my book is complexity: undefined behavior and crashing is just not acceptable to me.
Regarding a re-write:
- I agree that this can be re-written by someone experienced in Go or Rust in a reasonable amount of time, with CI and >90 test coverage hopefully. However C ttyplot is the software included in mayor Linux distros today, so it would be nice to get it to a point of stable before moving on to a re-write for good.
- I'm looking forward to see your re-write in a safer language
- I'd be happy to join an effort in Rust. I'd be happy enough with a 1:1 UX clone of ttyplot that works on modern Linux plus stability, for a start, personally. I'd be happy to contribute e.g. CI.
Regarding mixing stone age systems with modern systems in the same software: Can't old systems use an older version of ttyplot and Git master stops supporting stone age? If ttyplot master is meant to support stone age systems still today I would personally suggest and ask to make that super visible in the readme. I think if had known that from the beginning, it would have changed everything for me and my contributions. I remember when I found ttyplot mentioned on StackOverflow I was amazed and also puzzled that it didn't exist in Gentoo back then. Regarding legacy again, I remember I rejected a contribution of a plan9 mk build system to upstream Expat and it was the right decision to keep the museum out of today's Expat.
I have tried select myself before going for pthreads but to my surprise it said that stdin was always readily readable and so I ditched it as useless earlier for the moment. Maybe I used it wrong, would need a revisit and a closer look. Reading from stdin in a non-blocking way — probably single characters into a buffer to pass that to sscanf when enough — may be a way to save on threads altogether. To be fair that would also add complexity.
Regarding the self-pipe trick, it would also be an option if writing to a pipe in a signal handler (1) is defined behavior — https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers doesn't sound like it — and (2) can be made non-blocking (which I'm not sure how that would work given that pipes need a reader before writing).

I'd be happy about any ideas regarding the select-always-ready and the non-blocking pipe writes.

PS: @tenox7 is there a chance to get #97 reviewed in isolation before this one?

edgar-bonet commented 10 months ago

@hartwork wrote:

Regarding the self-pipe trick, it would also be an option if writing to a pipe in a signal handler (1) is defined behavior — https://wiki.sei.cmu.edu/confluence/display/c/SIG31-C.+Do+not+access+shared+objects+in+signal+handlers doesn't sound like it

It looks like defined behavior to me. The pipe is not a shared resource: it is a kernel object. Instead, each end of the pipe (a file descriptor) is an individual resource. The signal handler would be the only code to access the input end of the pipe. Also POSIX says write() can be safely called from a signal handler.

and (2) can be made non-blocking (which I'm not sure how that would work given that pipes need a reader before writing).

IIRC, a file descriptor can be set to non-blocking mode.

hartwork commented 10 months ago

@hartwork wrote: It looks like defined behavior to me. The pipe is not a shared resource: it is a kernel object. Instead, each end of the pipe (a file descriptor) is an individual resource. The signal handler would be the only code to access the input end of the pipe. Also POSIX says write() can be safely called from a signal handler.

@edgar-bonet interesting — thanks!

and (2) can be made non-blocking (which I'm not sure how that would work given that pipes need a reader before writing).

IIRC, a file descriptor can be set to non-blocking mode.

Yes! I'll play with that and wait for a reply from @tenox7 on approaches he'd accept, not yet sure in which order, will see.

tenox7 commented 10 months ago

97 merged, thank yo!

Now that ttyplot is included in so many distros (thank you all) I don't want to impede development for modern systems. People can use older versions of ttyplot on older OSes. However I still think that pthreads is an overkill for a very simple text mode app like that. I would appreciate if you could try implementing/testing the self-pipe trick. If it works and it's stable, I would prefer it to pthreads. If not, I will accept this. Thank you!

edgar-bonet commented 10 months ago

@tenox7 wrote:

I would appreciate if you could try implementing/testing the self-pipe trick.

The nice thing about this trick is that it works on ancient Unices. If, going forward, ttyplot is dropping support of these legacy systems, then I suggest we use pselect() instead. This function is in POSIX since 2000, and is meant to solve the very same race addressed by the self-pipe trick... in a less hacky way.

hartwork commented 10 months ago

I'm at it but it takes more time, stay tuned :smiley:

hartwork commented 10 months ago

@tenox7 @edgar-bonet I have replaced this with a version without threads now, even without select, without self-pipes and without any races that I would already know about. You'll see that the complexity that we saved in threads is now in the non-blocking I/O code, because we need to collect the input to sscanf and that's not trivial, as you'll see. I'll convert this to a draft now just so that it doesn't get merged before we all had a good chance to test and maybe sleep over this approach more also. I'm looking forward to merciless but constructive review, I'm very curious what you think, this has been an interesting ride :smiley:

tenox7 commented 10 months ago

Looks cool, but could you describe in code comments, or here why is this necessary? I mean finding rows/eding/pushing back data etc? Is it to do with non blocking stdin?

edgar-bonet commented 10 months ago

This version never sleeps! At least not until stdin gets closed: it busy-waits for data, keeping one CPU core at 100%.

A pselect()-based main loop would less wasteful of CPU and energy.

hartwork commented 10 months ago

Looks cool, but could you describe in code comments, or here why is this necessary? I mean finding rows/eding/pushing back data etc? Is it to do with non blocking stdin?

@tenox7 the short answer would be that we need to determine how much text to feed to sscanf some way. I'll need to sleep over this.

This version never sleeps! At least not until stdin gets closed: it busy-waits for data, keeping one CPU core at 100%.

A pselect()-based main loop would less wasteful of CPU and energy.

@edgar-bonet that's an important point, will adjust, thanks! :+1:

edgar-bonet commented 10 months ago

If that helps, I wrote a toy program to see how we could watch for SIGWINCH, SIGINT and stdin readiness with a pselect() loop:

Source code

```c #include #include #include #include #include static volatile sig_atomic_t got_sigwinch = 0; static volatile sig_atomic_t got_sigint = 0; int reading_input = 1; // flag: are we still waiting for input data? static void signal_handler(int sig) { switch (sig) { case SIGWINCH: got_sigwinch = 1; break; case SIGINT: got_sigint = 1; break; } } // Dummy resize handler: print a count of the resize events. static void handle_resize() { static int resize_count = 0; printf("resize event # %d\n", ++resize_count); } // Dummy input handler: print a count of the input events. static void handle_input() { static int read_count = 0; char buffer[1024]; int count = read(STDIN_FILENO, buffer, sizeof buffer); if (count == -1) { perror("stdin"); return; } printf("read event # %d\n", ++read_count); if (count == 0) { printf("reached end of input\n"); reading_input = 0; } } int main(void) { // Get ready to handle signals. sigset_t sigmask, empty_sigmask; sigemptyset(&empty_sigmask); sigemptyset(&sigmask); sigaddset(&sigmask, SIGWINCH); sigaddset(&sigmask, SIGINT); sigprocmask(SIG_BLOCK, &sigmask, NULL); signal(SIGWINCH, signal_handler); signal(SIGINT, signal_handler); // Prepare an fd_set with stdin only. fd_set stdin_set; FD_ZERO(&stdin_set); FD_SET(STDIN_FILENO, &stdin_set); // Main loop. for (;;) { // Wait for either input or a signal. int nfds; fd_set read_set, *readfds; if (reading_input) { read_set = stdin_set; readfds = &read_set; nfds = STDIN_FILENO + 1; } else { readfds = NULL; nfds = 0; } int retval = pselect(nfds, readfds, NULL, NULL, NULL, &empty_sigmask); // Handle whatever events we caught. if (reading_input && retval != -1 && FD_ISSET(STDIN_FILENO, &read_set)) handle_input(); if (got_sigwinch) { got_sigwinch = 0; handle_resize(); } if (got_sigint) { printf("\rbye\n"); break; } } } ```

That could be the skeleton of the program. In fact, the main loop could almost be used as-is, provided the dummy handle_resize() and handle_input() are replaced with the actual useful stuff.

hartwork commented 10 months ago

Thanks @edgar-bonet, this looks familar but it's doing blocking reads from stdin once pselect found a single byte, correct? I wonder if blocking read and/or blocking scanf are really lossless despite being interrupted by signals — maybe I'm too pessimistic :smiley: If they are, maybe I can drop the whole non-blocking complexity again. What is your understanding?

edgar-bonet commented 10 months ago

@hartwork wrote:

it's doing blocking reads from stdin once pselect found a single byte, correct?

This code reads from stdin only when pselect() says it is ready for reading. According to the select(2) manual (emphasis mine):

A file descriptor is ready for reading if a read operation will not block

Note that read() will only read as many bytes as are currently available on the pipe. Note also that, if you call read() twice, the second call will most likely block.

I wonder if blocking read and/or blocking scanf are really lossless despite being interrupted by signals

read() will not be interrupted: signals are blocked all the time except within the pselect() call.

Note that you cannot use scanf(), as this function calls read() as many times as needed to get the requested data, and only the first call is guaranteed to not block.

maybe I'm too pessimistic 😃

You may fcntl() stdin into non-blocking mode if you fear pslect() may fail to honour its contract.

maybe I can drop the whole non-blocking complexity again.

My understanding is that this complexity is mostly a state machine meant to read stdin by small bites, and tell when a data point is ready to be parsed. This would still be useful with a pselect() approach, precisely because we can only do a single read() at a time.

Wait! I noticed my example code misbehaves when stdin is closed. I'll go fix that...

edgar-bonet commented 10 months ago

I wrote:

I noticed my example code misbehaves when stdin is closed. I'll go fix that...

Fixed by the use of a reading_input flag. I edited the code in my previous comment.

hartwork commented 10 months ago

it's doing blocking reads from stdin once pselect found a single byte, correct?

This code reads from stdin only when pselect() says it is ready for reading. According to the select(2) manual (emphasis mine):

A file descriptor is ready for reading if a read operation will not block

Note that read() will only read as many bytes as are currently available on the pipe. Note also that, if you call read() twice, the second call will most likely block.

@edgar-bonet good points!

I wonder if blocking read and/or blocking scanf are really lossless despite being interrupted by signals

read() will not be interrupted: signals are blocked all the time except within the pselect() call.

I need to think more about that.

Note that you cannot use scanf(), as this function calls read() as many times as needed to get the requested data, and only the first call is guaranteed to not block.

Good point!

maybe I'm too pessimistic 😃

You may fcntl() stdin into non-blocking mode if you fear pslect() may fail to honour its contract.

Maybe let's trying blocking plus pselect first.

maybe I can drop the whole non-blocking complexity again.

My understanding is that this complexity is mostly a state machine meant to read stdin by small bites, and tell when a data point is ready to be parsed. This would still be useful with a pselect() approach, precisely because we can only do a single read() at a time.

Sounds good!

I noticed my example code misbehaves when stdin is closed. I'll go fix that...

Fixed by the use of a reading_input flag. I edited the code in my previous comment.

Thanks!

I'll go work on the next iteration…

edgar-bonet commented 10 months ago

I have been considering this non-blocking approach since yesterday. Here are some extra thoughts:

About the signal handling

You may think of the main loop as an event loop. It is structured like this pseudo-code:

for (;;) {
    wait_for_next_event();

    if (got_input)    handle_input();
    if (got_sigwinch) handle_sigwinch();
    if (got_sigint)   handle_sigint();
}

where wait_for_next_event() is actually pselect(). If a SIGWINCH or SIGINT arrives while we are sleeping on pselect(), the signal is delivered, the handler runs, and pselect() returns immediately. This is how we catch those events. If the signal arrives at any other time, it cannot be delivered (as it has been blocked by sigprocmask()): it is therefore a pending signal. On the next loop iteration, as pselect() unblocks the signal, it gets delivered immediately, and we catch it as before. Having these signals delivered only within pselect() makes the loop structure simpler and easier to reason about. It also prevents the race condition I mentioned before.

About the state machine

The data producer is unlikely to write() to the pipe one byte at a time: it is far more likely that at least one data point will be provided each time. We then do not need to optimize for the one-byte-at-a-time scenario. We may drop the state machine and simply attempt to sscanf() after each read(). If sscanf() fails, no worries: we keep in the buffer the bytes we read so far, and return to the event loop – we will sscanf() again after our next read().

On quasi-pseudo-code (omitted: error checking, edge cases, variable declarations...):

static void handle_input()
{
    static char buffer[1024];  // important that it's static
    static int buffer_pos;
    read(STDIN_FILENO, buffer + buffer_pos, room_available);
    buffer_pos += bytes_read;
    buffer[buffer_pos] = '\0';
    for (;;) {
        if (channels_to_plot == 2)
            matched = sscanf(buffer, "%lf %lf[ \t\r\n]%n",
                    &value1, &value2, &bytes_consumed)
        else
            matched = sscanf(buffer, "%lf[ \t\r\n]%n",
                    &value1, &bytes_consumed)
        if (matched < expected) break;  // sscanf() failed to read the expected data
        save_values_to_ring_buffers();
        shift_the_input_buffer(bytes_consumed);
        got_at_least_one_data_point = true;
    }
    if (got_at_least_one_data_point) paint_plot();
}

Note that the sscanf() format ends with white space: this is meant to guarantee that we got the full numeric value.

hartwork commented 10 months ago

The data producer is unlikely to write() to the pipe one byte at a time: it is far more likely that at least one data point will be provided each time.

@edgar-bonet I think that's missing the case where you're driving the input while typing into the window, e.g. running ttyplot -2 and then typing 1<space>2<space> into the window. That will not arrive in one piece. EDIT: As a result, bytes_consumed and %n will be at risk to treat input "1.23" as "1.2" and "3" rather than a single double. That would be a problem.

hartwork commented 10 months ago

@edgar-bonet I have pushed my current code now to allow review and testing of the third generation code in detail. The two most important things to say about it:

The scanner is both simpler and more correct now.
I'm using select rather than pselect for the moment, not because I want to avoid pselect but because with a timeout both seem to not stop blocking on signal arrival but wait until timeout for me (which is unexpected) at the moment and I haven't yet figured out why. I need to look into signal masks and blocking more. The current version looks like working to the user, but the signal setup still needs improvement, the rest I am not unhappy about. If you see what I'm missing, would be great to understand, will play with it more myself also.

hartwork commented 10 months ago

@edgar-bonet PS:

I extended my second-reply a bit, new text after "EDIT:"
To see the select bug in action I recommend increasing the timeval timeout to more than a second. That will make the effect easy to see. Still not sure about reasons.

hartwork commented 10 months ago

@edgar-bonet I think I fixed it now: it got interrupted but lacked a redraw and jumped right back into the next select. I'm happy to further improve on any issues you find :pray:

hartwork commented 10 months ago

Note that the sscanf() format ends with white space: this is meant to guarantee that we got the full numeric value.

@edgar-bonet only saw that now. It's a nice idea but:

would it work for cases like echo -n '1 2 3' | ./ttyplot where there is no trailing whitespace?
the trailing whitespace does not seem to be required in practice despite the [..] nonempty wrap, and bytes_consumed always ends up 0 (see below), with or without whitespace present.

$ cat main.c 
#include <stdio.h>

int main(int argc, char ** argv) {
    double d1 = 0.0;
    double d2 = 0.0;
    int bytes_consumed = 0;
    int res = sscanf(argv[1], "%lf %lf[ \t\r\n]%n", &d1, &d2, &bytes_consumed);
    printf("res=%d, d1=%f, d2=%f, bytes_consumed=%d, argv[1]=\"%s\"\n", res, d1, d2, bytes_consumed, argv[1]);
    return 0;
}

$ gcc main.c && ./a.out '1 2'
res=2, d1=1.000000, d2=2.000000, bytes_consumed=0, argv[1]="1 2"

$ gcc main.c && ./a.out '1 2 '
res=2, d1=1.000000, d2=2.000000, bytes_consumed=0, argv[1]="1 2 "

hartwork commented 10 months ago

@edgar-bonet update: I found a way to make %n work but it requires POSIX.1-2008 according to man sscanf:

$ cat main.c 
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
  double d1 = 0.0;
  double d2 = 0.0;
  int consumed1 = 0;
  int consumed2 = 0;
  char *dummy = NULL;
  int res = sscanf(argv[1], "%lf %lf%n%m[ \t\r\n]%n", &d1, &d2, &consumed1,
                   &dummy, &consumed2); // "m" is POSIX.1-2008
  printf("res=%d, d1=%f, d2=%f, consumed1=%d, consumed2=%d, argv[1]=\"%s\"\n",
         res, d1, d2, consumed1, consumed2, argv[1]);
  free(dummy);
  return 0;
}

$ gcc main.c && ./a.out '1 2  '
res=3, d1=1.000000, d2=2.000000, consumed1=3, consumed2=5, argv[1]="1 2  "

edgar-bonet commented 10 months ago

@hartwork:

would it work for cases like echo -n '1 2 3' | ./ttyplot where there is no trailing whitespace?

No, it wouldn't. However, in Unix any valid text stream ends with '\n'.

the trailing whitespace does not seem to be required in practice despite the [..] nonempty wrap, and bytes_consumed always ends up 0 (see below), with or without whitespace present.

My bad. [...] should be replaced by %*[...], and the test for a failed match should be bytes_consumed == 0.

I have pushed my current code now to allow review and testing of the third generation code in detail.

Thanks. I noticed that, if the input stream is closed, this version starts consuming 100% CPU.

I'm using select rather than pselect for the moment, not because I want to avoid pselect but because with a timeout both seem to not stop blocking on signal arrival but wait until timeout for me (which is unexpected)

Weird. The toy program I posted above does not suffer from this syndrome: I added a ten second timeout, and it still responds immediately to SIGWINCH, SIGINT and input events.

the signal setup still needs improvement

It suffers from the race condition I mentioned earlier. If a SIGWINCH is delivered after the test if (sigwinch_pending) (line 424 on commit 53fa0be87aa03c73d2d9fa6848e0893c25a1c49e) but before the call to select() (l. 450), it will not interrupt select(), which can sleep for 0.5 s. Then, signal_received will be false and we won't redraw until the next loop iteration.

hartwork commented 10 months ago

Hi @edgar-bonet, thanks for your reply!

would it work for cases like echo -n '1 2 3' | ./ttyplot where there is no trailing whitespace?

No, it wouldn't. However, in Unix any valid text stream ends with '\n'.

I think I would rather support even say piping in a text file that lacks a trailing new line. I would personally always fix files to end in a trailing newline — pre-commit's end-of-file-fixer is very nice — but I'd rather be supportive and robust about it, no need to lose the last record.

the trailing whitespace does not seem to be required in practice despite the [..] nonempty wrap, and bytes_consumed always ends up 0 (see below), with or without whitespace present.

My bad. [...] should be replaced by %*[...], and the test for a failed match should be bytes_consumed == 0.

Very nice, "%lf %lf%*[ \t\r\n]%n" seems to work as expected, does not require m, does not need to malloc. Over night I figured that with a known maximum input length of 4096 I could have avoided m knowing that %[ \t\r\n] will never store than 4096 bytes in the first place, but not storing at all using %* seems best :+1:

I'll at least try to integrate that approach to see how it feels compared to status quo. I don't like the cryptic past-time feel of scanf but to not have two parts of the code that could potentially go out of sync over time and may already be out of sync in some unknown way seems like a clear win :+1:

I have pushed my current code now to allow review and testing of the third generation code in detail.

Thanks. I noticed that, if the input stream is closed, this version starts consuming 100% CPU.

Interesting! I could use pselect with nfds=0 and the current timeout similar to what you did at https://github.com/tenox7/ttyplot/pull/98#issuecomment-1814500197 for a portable sub-second sleep, as mention under "Emulating usleep(3)" in man select. Without a clock display, just pause would have done. (We'd could also send SIGALARM us just for the clock, but the related code would be >=POSIX.1-2008 again apparently.)

It suffers from the race condition I mentioned earlier. If a SIGWINCH is delivered after the test if (sigwinch_pending) (line 424 on commit 53fa0be) but before the call to select() (l. 450), it will not interrupt select(), which can sleep for 0.5 s. Then, signal_received will be false and we won't redraw until the next loop iteration.

I do not yet see how pselect would perform any different from plain select in that very scenario. Can you help me understand? I'll try add sleeping (only locally for debugging) to give the race a better chance to show and see if pselect does any different, with some luck, but I expect the same for the scenario that you described. I have a feeling I am missing something, the whole race scenario is still a bit unclear (while race conditions in general are clear). The text in the man page seems to be written for readers that already understand the problem rather than some that don't.

I'll go work on the next iteration, happy about more insights from the land of signals from you in the meantime.

hartwork commented 10 months ago

I noticed that, if the input stream is closed, this version starts consuming 100% CPU.

@edgar-bonet ^^ that's fixed now, pushed.

hartwork commented 10 months ago

@edgar-bonet fifth(?) generation pushed and ready for review. It has all known issues fixed except potential use of pselect.

edgar-bonet commented 10 months ago

Without a clock display, just pause would have done.

pause() would be a viable option. However, we would then end up with two separate event loops: one pselect()-based for when there is data, and a second one, pause()-based, for when the stream is closed. This would add unneeded complexity to the structure of the program. In contrast, a single event loop is a standard programming idiom simple to understand: repeat (forever) { catch an event; handle the event; }.

Regarding the clock display: On master, it is only updated on data or SIGWINCH events. Adding a periodic update is a nice extra feature, but wouldn't that be off-topic on this PR?

I do not yet see how pselect would perform any different from plain select in that very scenario.

See the toy program above: SIGWINCH and SIGINT are blocked during the whole execution of the program except inside the call to pselect(). This ensures that these signals are always delivered to (and caught by) pselect(). All events (signals and data) are then collected at a single point in the program. Plain select() cannot do that other than with the self-pipe trick.

hartwork commented 10 months ago

Without a clock display, just pause would have done.

pause() would be a viable option. However, we would then end up with two separate event loops: one pselect()-based for when there is data, and a second one, pause()-based, for when the stream is closed. This would add unneeded complexity to the structure of the program. In contrast, a single event loop is a standard programming idiom simple to understand: repeat (forever) { catch an event; handle the event; }.

@edgar-bonet if it was something like…

if (stdin_is_open)
    ....... = pselect(...);
else
    pause();

…then it would still feel like a single loop, it could even be extracted into a function. Just to add what seemed missing in the picture, I think a question of personal preference maybe, both should be fine.

Regarding the clock display: On master, it is only updated on data or SIGWINCH events. Adding a periodic update is a nice extra feature, but wouldn't that be off-topic on this PR?

I could re-add that bug to the pull request and then make a follow-up pull request only to unbreak the breakage. I understand that this pull request could be multiple pull requests now but the pieces are heavily interleaved, splitting them up will be quote some work and risks breaking things that are finally working, and our current merge frequency will also make it take way longer to get it merged with all the ping pong involved.

I do not yet see how pselect would perform any different from plain select in that very scenario.

See the toy program above: SIGWINCH and SIGINT are blocked during the whole execution of the program except inside the call to pselect(). This ensures that these signals are always delivered to (and caught by) pselect(). All events (signals and data) are then collected at a single point in the program.

Understood. Using pselect now, pushed.

Plain select() cannot do that other than with the self-pipe trick.

That I do not understand. If we used select like this:

while (true) {
    if (sigint_pending) {
        break;
    }

    ...

    pthread_sigmask(SIG_SETMASK, &empty_mask, &origmask);
    ready = select(nfds, &readfds, NULL, NULL, &timeout);
    pthread_sigmask(SIG_SETMASK, &empty_mask, NULL);

    if ((ready == -1)) && (errno == EINTR)) {
        continue;
    }

    ...
}

Where would the race be and what would would be the effect of that race? Thanks in advance!

hartwork commented 10 months ago

@tenox7 I believe this code is ready for your review and testing and checking against your requirements for a merge now. Thanks in advance!

edgar-bonet commented 10 months ago

if it was something like…

if (stdin_is_open)
    ....... = pselect(...);
else
    pause();

…then you would have a race if the signal is delivered right before pause().

[Regarding the clock display] the pieces are heavily interleaved

I didn't notice they were heavily interleaved. If they indeed are, then it does make sense to keep them together.

If we used select like this: [...] Where would the race be and what would be the effect of that race?

while (true) {
    if (sigint_pending) {
        break;
    }
    /* === race here === */
    pthread_sigmask(SIG_SETMASK, &empty_mask, &origmask);
    ready = select(nfds, &readfds, NULL, NULL, &timeout);
    pthread_sigmask(SIG_SETMASK, &origmask, NULL);
    if ((ready == -1)) && (errno == EINTR)) {
        continue;
    }
    ...
}

Imagine SIGINT arrives at the place marked race here. As it is presumably blocked, it will be a pending signal at this point. The first call to pthread_sigmask() unblocks the signal, which is then delivered right away (before pthread_sigmask() even returns). Then, select() will not be interrupted by the signal (which has already been delivered), and it may sleep for a long time, limited only by the timeout. It will not return EINTR.

hartwork commented 10 months ago

while (true) {
    if (sigint_pending) {
        break;
    }
    /* === race here === */
    pthread_sigmask(SIG_SETMASK, &empty_mask, &origmask);
    ready = select(nfds, &readfds, NULL, NULL, &timeout);
    pthread_sigmask(SIG_SETMASK, &origmask, NULL);
    if ((ready == -1)) && (errno == EINTR)) {
        continue;
    }
    ...
}
Imagine SIGINT arrives at the place marked race here. As it is presumably blocked, it will be a pending signal at this point. The first call to pthread_sigmask() unblocks the signal, which is then delivered right away (before pthread_sigmask() even returns). Then, select() will not be interrupted by the signal (which has already been delivered), and it may sleep for a long time, limited only by the timeout. It will not return EINTR.

@edgar-bonet so with plain select it takes one more round of waiting before we hit if (sigint_pending) { again whereas pselect in contrast would return right away and let us get to the if (sigint_pending) { again quicker. Okay, thanks! :pray:

edgar-bonet commented 10 months ago

The code looks sane to me, free of race conditions. I would just suggest a more straightforward main loop structure.

Regarding the signal handling, as I read it, the global structure of the main loop looks like this:

while (1) {
    if (got_sigint)   handle_sigint();
    if (got_sigwinch) handle_sigwinch();

    wait_for_next_event();

    if (got_either_signal) continue;
    if (got_input)    handle_input();
}

This should work just fine. However, there is no benefit in making it more complicated than it needs to be. If you move the signal handling just below pselect(), you end up with a more straightforward structure:

while (1) {
    wait_for_next_event();

    if (got_sigint)   handle_sigint();
    if (got_sigwinch) handle_sigwinch();
    if (got_input)    handle_input();
}

Regarding the data handling, the structure looks like this (signal handling omitted):

while (1) {
    if (!could_parse_a_data_point) wait_for_next_event();

    if (got_input) read_input();
    try_to_parse_one_data_point();
}

Again, this should work fine, but I find it a bit convoluted: if a single read() provides multiple data points, they will be parsed on multiple iterations of the main loop. Is there a reason for proceeding this way? IMO, it would be more straightforward to fully handle the input event on the current loop iteration:

while (1) {
    wait_for_next_event();

    if (got_input) {
        read_input();
        do {
            try_to_parse_one_data_point();
        } while (could_parse_a_data_point);
    }
}

hartwork commented 10 months ago

The code looks sane to me, free of race conditions.

@edgar-bonet cool, thanks for the review! :pray:

I would just suggest a more straightforward main loop structure. [..]

These are both good ideas. For the second I should note that I only did not go for it to keep the diff smaller and to not further grow the pull request.

Right now, I have no idea how @tenox7 feels about the current approach, I hope he likes it. If this however would get rejected by him for something else, my time refactoring may turn out wasted, it's never just five minutes and it needs good care to not introduce new breakage, so I'd like to hear back from him first, to not put more time on a uncertain line. I'm happy to get back to these ideas, either here or in follow-up pull requests.

tenox7 commented 10 months ago

I think I like it better than pthreads. It adds a lot of complexity, but perhaps is is unavoidable at this point?

edgar-bonet commented 10 months ago

@tenox7: This could eventually be refactored to get the complexity more manageable. Just as PR #106 gets 24 lines of code out of the main loop, the handing and parsing of input (a large part of this PR) could well be moved into its own function(s). Maybe even simplified a bit (not sure, didn't try). If this gets merged, I am willing to attempt such refactoring.

tenox7 commented 10 months ago

@edgar-bonet @hartwork first of all thanks for all your contributions!

With all these changes, I'm thinking that, perhaps instead of merging this to master one by one, lets create some experimental or 2.0 branch or something like that. Then PR everything there instead of master. Once everyone is happy and well tested, we can merge to master as a final step.

edgar-bonet commented 10 months ago

@tenox7 wrote:

[...] lets create some experimental or 2.0 branch or something like that. Then PR everything there instead of master [...]

Seems fine with me. I think this PR, as it is the largest one, should be the first one to be rebased/retargeted to that new branch. I will then rebase and retarget mine.

hartwork commented 10 months ago

@edgar-bonet @hartwork first of all thanks for all your contributions!

@tenox7 you're welcome!

I think I like it better than pthreads. It adds a lot of complexity, but perhaps is is unavoidable at this point?

I see two main sources for the increase in complexity, caused by two key assumptions/decisions/requirements. The key assumptions are:

We wanted to avoid pthreads (or threads in general).
We wanted to get away from scanf because:
- Without signals it may be blocking indefinitely (which is a problem to healthy clock display).
- It may lose data when interrupted due to multiple internal calls to read, I believe.

On a side note quoting man scanf:

It is very difficult to use these functions correctly, and it is preferable to read entire lines with fgets(3) or getline(3) and parse them later with sscanf(3) or more specialized functions such as strtol(3).

As a consequence of the former — no threads —, a single thread has to do all of these things:

handle signals (safely and with near-shortest delay possible)
read incoming data and process it (also with little delay)
animate the clock (at least once a second)

As a consequence of the latter — no scanf —, we now have sscanf (with two "s"), non-blocking I/O with read and pselect, a line buffer to manage parsing input, and code to jump over input that would stuck sscanf otherwise.

Personally, I feel like going simpler would give up something of value and that we have now arrived at something good enough to merge.

With all these changes, I'm thinking that, perhaps instead of merging this to master one by one, lets create some experimental or 2.0 branch or something like that. Then PR everything there instead of master. Once everyone is happy and well tested, we can merge to master as a final step.

I would support this idea if we had more big risky features still in the pipeline, bigger display changes, API incompatible changes — things like that. But what we have here is now complete in itself (besides potential refactoring follow-ups), may not be noticed by most users even, and is just one in a group of independent changes, rather than a big story, other than increasing robustness maybe. Personally, I like to keep master in releasable shape and branches short-lived: merging this branch keeps master releasable in my view. If it does feel too risky to merge still, let's maybe sleep over it and try to identify that feeling of risk and what it is about. Then we can do something about it if necessary. @tenox7 what do you think?

edgar-bonet commented 10 months ago

@tenox7: If you want to optimize the testing effort, I could build a branch that merges all the currently-pending pull requests. Would that help?

edgar-bonet commented 10 months ago

I wrote:

I could build a branch that merges all the currently-pending pull requests.

Well, I just did it: edgar-bonet:next.

$ git -P log --first-parent master..next
commit 2c69fb2f09ff836d07657a5bee5500fb524505c2 (HEAD -> next)
Merge: 333dd6a 98ae048
Author: Edgar Bonet <linux@edgar-bonet.org>
Date:   Thu Nov 23 23:04:27 2023 +0100

    Merge pull request #99 from edgar-bonet/multibyte

    * Define _XOPEN_SOURCE_EXTENDED on macOS
    * Always use ASCII characters for the arrow tips
    * Rely on pkg-config to find the proper ncurses.h
    * Support plotting with mutibyte characters

commit 333dd6ab507d350a984ee16fed2744d3d3142a00
Merge: c72e7d5 2176376
Author: Edgar Bonet <linux@edgar-bonet.org>
Date:   Thu Nov 23 22:55:33 2023 +0100

    Merge pull request #106 from edgar-bonet/rate

    * Increase the time resolution in rate mode

commit c72e7d5884a2f262862563209f5884008cd8dfe6
Merge: 25c0ab0 ceb0bda
Author: Edgar Bonet <linux@edgar-bonet.org>
Date:   Thu Nov 23 22:48:42 2023 +0100

    Merge pull request #110 from edgar-bonet/keys

    * Stop watching the terminal if it gets closed
    * Toggle "rate mode" when the user types 'r'
    * Quit when the user types 'q'

commit 25c0ab001f1af2afa7d7a21bd4e4e8c50ca1219b
Merge: 07ce2f9 c150198
Author: Edgar Bonet <linux@edgar-bonet.org>
Date:   Thu Nov 23 22:47:03 2023 +0100

    Merge pull request #98 from hartwork/signal-handling

    * Get signal handling into the land of defined behavior
    * Delay display of stochastics until we have actual data

elfring commented 9 months ago

Thanks for your software adjustments.

tenox7 / ttyplot

Get signal handling into the land of defined behavior (fixes #62) + fix terminal resize issues + fix stuck clock display + fix mis-render for empty input #98

97 merged, thank yo!

About the signal handling

About the state machine