Closed - guijan closed this pull request 1 year ago
There's the minor detail that if you combine a delay and selecting a rectangular area, this PR changes picking the selection area to happen after the delay, not before as was previously the case. The fact that the code used to go out of its way to make the old behavior happen was really weird to me and so was the behavior itself. Let me know if this edge case change isn't okay.
On Thu, 02 Feb 2023 19:32:48 -0800, Guilherme Janczak @.***> wrote:
There's the minor detail that if you combine a delay and selecting a rectangular area, this PR changes picking the selection area to happen after the delay, not before as was previously the case. The fact that the code used to go out of its way to make the old behavior happen was really weird to me and so was the code itself. Let me know if this change isn't okay.
Look at number 2 of the FAQ. This behavior should stay.
PS: Lately I've been very busy and may not be able to participate much in the project, but as soon as I can help, I will.
Can you elaborate a bit on what exactly is being fixed here (and some steps to reproduce)?
If it's just trying to fix sleep() waking up early, then this seems insanely over-complicated for that task. You could instead use nanosleep(), which returns the remaining time in case of an early wake up via the rem argument.
Okay, I just saw https://github.com/resurrecting-open-source-projects/scrot/issues/215.
The implementation of the --delay option calls sleep() in a loop; time passes between each sleep() call, so it drifts.
First of all, scrot only accepts a delay at seconds granularity (see https://github.com/resurrecting-open-source-projects/scrot/issues/213). Secondly, this isn't a video game where missing the deadline by a couple of milliseconds would result in a bad user experience (i.e. frame drops).
Given this information, it's not practical to add over-complicated code for this IMHO.
A more practical approach (that doesn't require hundreds of lines of code) would be to use:
1. nanosleep that loops in case of early wake up.
2. dprintf to bypass stdio buffering.
This is still not going to be 100% accurate - but once again - we're dealing with granularity of seconds here. Missing the deadline by a couple of milliseconds is really not a big deal for scrot's use case IMO.
missing the deadline by a couple of milliseconds
There's no guarantee that only "a couple of milliseconds" pass between each sleep() call. It's also fixable.
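To make that concrete, here's a standalone sketch (not scrot code; the delay value is arbitrary) that measures how much a printf/fflush/sleep(1) countdown overshoots its nominal duration; the overhead between sleep() calls accumulates, and a slow terminal or a loaded system makes it worse:

/* Standalone illustration, not scrot code: the work done between sleep(1)
 * calls (stdio, scheduling, a slow terminal) is added on top of every second,
 * so the countdown drifts past the requested delay. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const int delay = 5; /* arbitrary demo value */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = delay; i > 0; i--) {
        printf("%d.. ", i);
        fflush(stdout);
        sleep(1); /* 1 second, plus whatever the lines above took */
    }
    printf("0.\n");
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("asked for %d s, took %.6f s\n", delay, elapsed);
    return 0;
}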
Can you elaborate a bit on what exactly is being fixed here (and some steps to reproduce) ?
Currently, it fixes the drift caused by calling sleep() in a loop, and it also fixes a stray SIGALRM reducing scrot's delay, which I noticed while fixing the former.
Call scrot with a long delay and a countdown in one terminal:
$ scrot -cd 99999
Then send it SIGALRM in a loop from another:
$ while true; do pkill -SIGALRM scrot; done
You will see scrot master count down really fast.
This PR also starts the delay as soon as we know -d was used which is nice for greater accuracy.
Also, I just noticed yet another bug in both this PR and the current scrot master: stdout could block.
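As for the SIGALRM part of the reproduction above, here is a standalone illustration (not scrot's code) of the mechanism: sleep() returns early, reporting the unslept seconds, when a signal with a handler installed interrupts it. Whether scrot master's signal setup matches this exactly is an assumption here; the sketch only shows why a stream of SIGALRMs can turn every sleep(1) of a countdown into a near-immediate return.

/* Standalone illustration, not scrot code. */
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void onAlrm(int sig)
{
    (void)sig; /* the handler only needs to exist for sleep() to be cut short */
}

int main(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = onAlrm;
    sigaction(SIGALRM, &sa, NULL);

    alarm(1);                  /* stand-in for an externally sent SIGALRM */
    unsigned left = sleep(10); /* ask for 10 seconds...                   */
    printf("interrupted with %u seconds left\n", left); /* ...about 9 remain */
    return 0;
}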
You will see scrot master count down really fast.
I mentioned this in my post - you can fix this with nanosleep. Here's a patch:
diff --git a/src/scrot.c b/src/scrot.c
index 295a21f..4dc66d1 100644
--- a/src/scrot.c
+++ b/src/scrot.c
@@ -259,24 +259,28 @@ static Imlib_Image scrotGrabAutoselect(void)
     return im;
 }
 
+static void
+xsleep(int sec)
+{
+    struct timespec delay = { .tv_sec = sec };
+    while (nanosleep(&delay, &delay) < 0 && errno == EINTR);
+}
+
 void scrotDoDelay(void)
 {
     if (opt.delay) {
         if (opt.countdown) {
             int i;
 
-            printf("Taking shot in %d.. ", opt.delay);
-            fflush(stdout);
-            sleep(1);
+            dprintf(1, "Taking shot in %d.. ", opt.delay);
+            xsleep(1);
 
             for (i = opt.delay - 1; i > 0; i--) {
-                printf("%d.. ", i);
-                fflush(stdout);
-                sleep(1);
+                dprintf(1, "%d.. ", i);
+                xsleep(1);
             }
-            printf("0.\n");
-            fflush(stdout);
+            dprintf(1, "0.\n");
         } else
-            sleep(opt.delay);
+            xsleep(opt.delay);
     }
 }
starts the delay as soon as we know -d was used which is nice for greater accuracy.
It'll still miss the time it took to get to that point - which can be noticeable if you try to load the binary (the first time) from a really slow HDD (or NFS).
It's also fixable.
Not entirely, there's still going to be inaccuracies due to clock resolution (and the startup time I mentioned above). But even if we assume it's fully fixable - that's beside the point.
My point was that this is adding hundreds of lines of code into a project - so the question of "What benefit does the end user get out of this?" is worth asking. And I don't see how this is going to be beneficial to 99.99999% of the user base.
In short, I don't think the amount of code added vs benefit provided ratio is anywhere near being worth it.
But that's just my opinion, of course :) If the project maintainer thinks otherwise then go for it.
There are timing issues we can't fix like the time the processor takes to do things or the time between the user pressing enter and main() starting, yes. Those aren't scrot's fault. I'm fixing the ones that are scrot's fault.
The code's semantics in master allow for -d to sleep for an indeterminate amount of time longer than what was requested. Probably, the OS and machine will respond quickly enough. By the same token, access() probably checked the same file as the one we are open()ing later, malloc() probably won't error, stdout probably won't block and cause us to wait even longer than requested, but those are all bugs. I'd rather have a few hundred extra LoC that do the right thing (even if they may have bugs of their own) than to keep the bug.
By the same token ...
It depends on the cost. Some of these (like malloc) can be fixed with low/no cost.
I'd rather have a few hundred extra LoC that do the right thing (even if they may have bugs of their own) than to keep the bug.
I believe that's not a pragmatic mindset and does more harm than good in the long run. So we'll just have to agree to disagree on this.
Currently writing an alternative solution that uses clock_nanosleep() if available, pthread_cond_timedwait() otherwise. It's less LoC and no signal handler.
Edit: OS X can't pthread_cond_timedwait() on CLOCK_MONOTONIC... Indeed, it seems the current solution is the one with the correct semantics that runs on the most platforms.
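A minimal sketch of the clock_nanosleep() half of that idea (not the PR's actual code; the pthread_cond_timedwait() fallback isn't shown): compute an absolute CLOCK_MONOTONIC deadline and keep sleeping toward it, restarting on EINTR, so neither stray signals nor work done between waits changes the total delay.

/* Sketch of the clock_nanosleep() approach described above, not the PR's code. */
#include <errno.h>
#include <time.h>

static void sleepUntil(int delaySeconds)
{
    struct timespec deadline;
    int err;

    /* Absolute deadline on a monotonic clock: immune to wall-clock jumps. */
    clock_gettime(CLOCK_MONOTONIC, &deadline);
    deadline.tv_sec += delaySeconds;

    /* TIMER_ABSTIME: an interrupted sleep restarts toward the same deadline,
     * so early wakeups can't shorten the delay. */
    do {
        err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    } while (err == EINTR);
}

A -c countdown would compute one such deadline per tick from the same start time, which also avoids the drift discussed earlier.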
Currently writing an alternative solution that uses clock_nanosleep() if available, pthread_cond_timedwait() otherwise. It's less LoC and no signal handler.
Adding threads for this is still far into overkill territory, IMO.
Indeed, it seems the current solution is the one with the correct semantics that runs on the most platforms.
On my system (Linux 6.1.9 & glibc 2.36), trying to use any delay just hangs indefinitely. Attaching gdb to the hung process shows that it's stuck on poll.
This PR also starts the delay as soon as we know -d was used which is nice for greater accuracy.
I think this breaks the -s behavior. The delay should start after the selection has been made, not before.
I think this breaks the -s behavior. The delay should start after the selection has been made, not before.
I noticed that. I thought it was an error that the time the user takes to create a selection was being added to the delay.
trying to use any delay just hangs indefinitely.
One more thing to investigate... Writing this, I learned that most of the POSIX timer APIs are broken and that the APIs that aren't broken are rarely implemented.
Better to just close the PR, and try again when the fixed POSIX APIs are more widely implemented.
On my system (Linux 6.1.9 & glibc 2.36), trying to use any delay just hangs indefinitely. Attaching gdb to the hung process shows that it's stuck on poll.
I'll just leave this here for archival purposes. The reason for it seems to be this check:
if (info->si_code != SI_TIMER)
    return; /* Someone gave us a SIGALRM. Ignore it. */
It was always returning early and so nothing was getting written to the pipe.
Gdb print of *info:
{si_signo = 14, si_errno = 0, si_code = 128, /* everything else was 0... */}
And after grepping around a bit in /usr/include/, 128 seems to be SI_KERNEL. (Usually I wouldn't go looking inside headers and would rather read documentation, but I'm not too interested in this, so this is more of an info-dump.)
Also, according to the Linux man page, setitimer seems to be declared obsolete by POSIX.
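For context, a small standalone sketch (not the PR's code) of the distinction that gdb output points at: only signals queued by a timer_create() timer carry si_code == SI_TIMER; SIGALRM raised by setitimer()/alarm() shows up with si_code == SI_KERNEL on Linux (the 128 above), and kill()/pkill gives SI_USER, so a handler that insists on SI_TIMER never reacts to the latter two.

/* Standalone sketch, not the PR's code (may need -lrt on older glibc). */
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static void onAlrm(int sig, siginfo_t *info, void *ctx)
{
    (void)sig; (void)ctx;
    if (info->si_code == SI_TIMER)
        write(1, "timer_create timer\n", 19);
    else if (info->si_code == SI_USER)
        write(1, "kill()/pkill\n", 13);
    else
        write(1, "kernel (setitimer/alarm)\n", 25);
}

int main(void)
{
    struct sigaction sa;
    timer_t t;
    struct sigevent ev = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGALRM };
    struct itimerspec spec = { .it_value.tv_sec = 1 }; /* fire once after 1s */

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = onAlrm;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGALRM, &sa, NULL);

    timer_create(CLOCK_MONOTONIC, &ev, &t); /* this one delivers SI_TIMER */
    timer_settime(t, 0, &spec, NULL);

    for (;;)
        pause(); /* send it SIGALRM from another shell to see the other cases */
}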
This PR still needs testing, but it's otherwise complete.
It has a workaround for an OpenBSD bug; someone has to check whether this bug is present in other operating systems. I can probably bake the logic to check for the bug into the code, but that will have to wait.
Edit: I thought it was possible the other BSDs suffered from the same issue (siginfo_t unimplemented) because they share the same origin, but it's only OpenBSD, so I used a small #ifdef instead of writing the logic to check for the bug.
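Purely as an illustration of the shape such a guard could take (the helper name is made up; the real workaround is in the PR):

/* Hypothetical sketch of an #ifdef guard like the one described above. On
 * OpenBSD the siginfo_t delivered with the timer signal reportedly isn't
 * usable, so skip the si_code check there and assume the signal is ours. */
#include <signal.h>
#include <stdbool.h>

static bool signalCameFromOurTimer(const siginfo_t *info)
{
#ifdef __OpenBSD__
    (void)info;
    return true;                      /* can't verify; assume it's our timer */
#else
    return info->si_code == SI_TIMER; /* ignore stray SIGALRMs elsewhere */
#endif
}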