tqdm / tqdm.cpp

C++ port of tqdm

Sketch of threads and signals #8

Open o11c opened 8 years ago

o11c commented 8 years ago

In order to allow updates without user intervention, it is critical to allow refresh within a signal handler. It would also be quite beneficial to allow all operations from multiple threads.

This is quite feasible if you're a little careful. The major limitations of signal handlers are that you're not allowed to call malloc, and you're not allowed to access any variables not marked volatile (strictly, volatile sig_atomic_t). Since we want threads, we might as well upgrade to std::atomic, which has a nice API, too.
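A minimal sketch of the handler side of this idea, assuming a POSIX system and a platform where std::atomic<int> is lock-free (checked at compile time). All names here (g_refresh_requests, on_alarm, install_alarm_handler, take_pending_refreshes) are hypothetical and not part of tqdm.cpp. The handler only touches a lock-free atomic, so it needs neither malloc nor a lock:

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>

// Lock-free atomics are the only std::atomic objects that are safe to
// touch from a signal handler, so refuse to build otherwise.
static_assert(ATOMIC_INT_LOCK_FREE == 2, "need lock-free atomic<int>");

// Hypothetical counter: number of refreshes requested by the handler.
std::atomic<int> g_refresh_requests{0};

// The handler does no allocation and no I/O; it only bumps the counter.
extern "C" void on_alarm(int) {
    g_refresh_requests.fetch_add(1, std::memory_order_relaxed);
}

// Install the handler for SIGALRM; returns false if sigaction fails.
bool install_alarm_handler() {
    struct sigaction sa{};
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted system calls
    return sigaction(SIGALRM, &sa, nullptr) == 0;
}

// Called from ordinary (non-handler) code: fetch and reset the counter.
int take_pending_refreshes() {
    const int n = g_refresh_requests.exchange(0, std::memory_order_relaxed);
    assert(n >= 0);
    return n;
}
```

Here the handler merely records that a refresh is due. Doing the refresh itself inside the handler, as proposed above, would additionally need every call on that path to be async-signal-safe.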

Some operations need malloc, and thus must be done outside the signal handler:

(On the other hand, we could do all memory allocation using mmap, which is allowed in a signal handler ...)

Operations that can be done in the signal handler:

Each tqdm_sink contains an intrusive, doubly-linked list of active tqdm instances. It is intrusive to prevent extra memory allocations, and doubly-linked to allow removal (or perhaps we could use the indirect-pointer trick and singly-link it?). I'm actually not sure how much more effort the atomics part needs here: do we need spinlocks, which are problematic in signal handlers since the interrupted thread is no longer executing? Instances are added using the standard atomic CAS trick in a loop. We should also allow adding other sorts of permanent lines.
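The "CAS trick in a loop" for insertion can be sketched as follows. This is a simplified, singly-linked variant covering insertion only; BarNode, Sink, push_bar and count_bars are hypothetical stand-ins, not the actual tqdm.cpp types. Removal is the hard part (it runs into the ABA problem and the spinlock question raised above) and is deliberately left out:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a tqdm instance. The link lives inside the
// object itself (intrusive), so inserting it allocates nothing.
struct BarNode {
    BarNode* next = nullptr;
    int id = 0;
};

// Hypothetical stand-in for tqdm_sink: just the head of the list.
struct Sink {
    std::atomic<BarNode*> head{nullptr};
};

// Push a node at the head: read the head, point the node at it, then try
// to swing the head to the node. If another thread (or a handler) changed
// the head meanwhile, compare_exchange_weak reloads old_head and we retry.
void push_bar(Sink& sink, BarNode* node) {
    assert(node != nullptr);
    BarNode* old_head = sink.head.load(std::memory_order_relaxed);
    do {
        node->next = old_head;
    } while (!sink.head.compare_exchange_weak(
        old_head, node,
        std::memory_order_release, std::memory_order_relaxed));
}

// Walk the list, e.g. to refresh every active bar.
std::size_t count_bars(const Sink& sink) {
    std::size_t n = 0;
    for (BarNode* p = sink.head.load(std::memory_order_acquire); p != nullptr;
         p = p->next) {
        ++n;
    }
    return n;
}
```

Because the push never blocks, it stays safe even if a signal handler interrupts it halfway and walks the list: the handler sees either the old head or the fully linked new node.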

If output is to a pipe(2), then writes of at most PIPE_BUF bytes (at least 512 - plenty for a typical terminal line) are guaranteed to be atomic. If we get EAGAIN, we just flag the current tqdm as dirty (replacing the Python implementation's no-implicit-refresh-within-N-seconds logic) and move on. If output is not to a pipe (or if lines are super long), we aren't guaranteed this behavior, but we should optimistically assume it will be so and fall back to a loop if we get a partial write. We will necessarily assume that no one else is writing to our fd.
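One possible shape for that write path, as a sketch assuming POSIX write(2). The names WriteResult and write_line, and the per-bar dirty flag, are hypothetical. It tries a single write, loops only on a partial write or EINTR, and on EAGAIN marks the bar dirty instead of waiting:

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

enum class WriteResult { Done, Dirty, Failed };

// Write one rendered line to fd. `dirty` is a hypothetical per-bar flag
// that a later timed refresh would check.
WriteResult write_line(int fd, const char* buf, std::size_t len,
                       std::atomic<bool>& dirty) {
    assert(buf != nullptr);         // debug-only precondition
    const int saved_errno = errno;  // a signal handler must not clobber errno
    WriteResult result = WriteResult::Done;
    std::size_t off = 0;
    while (off < len) {
        const ssize_t n = ::write(fd, buf + off, len - off);
        if (n > 0) {
            // Usually the whole line goes out at once; after a partial
            // write we loop to send the rest.
            off += static_cast<std::size_t>(n);
        } else if (n < 0 && errno == EINTR) {
            continue;  // interrupted before writing anything: retry
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            // Non-blocking fd is full: give up and refresh later.
            dirty.store(true, std::memory_order_relaxed);
            result = WriteResult::Dirty;
            break;
        } else {
            result = WriteResult::Failed;
            break;
        }
    }
    errno = saved_errno;
    return result;
}
```

Note that if EAGAIN arrives after a partial write, this sketch leaves a torn line on the fd; a fuller version would have to decide whether to remember the offset or redraw the whole line on the next refresh.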

If output is to a terminal or a TCP socket, we can use the TIOCOUTQ ioctl to check whether the output buffer is getting full. This might allow even better decisions about whether to attempt a write or just mark it as dirty and wait until the next (SIGALRM or threaded) timed refresh.
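A sketch of such a check, assuming a system where TIOCOUTQ is available through <sys/ioctl.h> (on Linux the same request value is also known as SIOCOUTQ for TCP sockets). The helper names and the max_queued threshold are hypothetical. Also note that ioctl is not on the POSIX list of async-signal-safe functions, so calling it from a handler relies on it being a plain system call on the target platform:

```cpp
#include <cassert>
#include <sys/ioctl.h>
#include <unistd.h>

// Number of bytes still queued for output on fd, or -1 if the query is
// not supported for this kind of fd (or fd is invalid).
int pending_output_bytes(int fd) {
    int queued = 0;
    if (ioctl(fd, TIOCOUTQ, &queued) == -1) {
        return -1;
    }
    return queued;
}

// Decide whether to attempt a write now. If the queue depth is unknown,
// stay optimistic and try; the write path can still fall back to marking
// the bar dirty.
bool should_attempt_write(int fd, int max_queued) {
    assert(max_queued >= 0);
    const int queued = pending_output_bytes(fd);
    return queued < 0 || queued <= max_queued;
}
```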

Obviously, use of many of these features should be controllable by the user on a per-tqdm_sink basis - and probably also fully-disablable via macros. If we need C++98 compatibility, we'll have to use boost for atomics, threads, etc. We might also want to allow integration in someone else's event loop, but I haven't thought out all the requirements and implications.

CrazyPython commented 8 years ago

@o11c This all seems bad for performance. Is there a C++ equivalent to the simple, no-threading-required Python signal handlers? Or maybe we can handle signals only in tqdm.update(). We also don't want to interfere with any existing SIGALRM handlers.

o11c commented 8 years ago

For the typical case (SIGALRM or disabled auto-updates), no threads will be created by tqdm.

All that we're doing is allowing the user to call the library from any thread that they create.

As far as not interfering - we do have a choice of a couple different signals, but that's why it's an option anyway.

CrazyPython commented 8 years ago

@o11c would it result in a slowdown for cases where nobody uses threads? (also: misclicked there)

CrazyPython commented 8 years ago

@o11c We don't want to sacrifice serial speed for parallel speed - 90% of use cases will be serial.

casperdcl commented 8 years ago

this is interesting: https://github.com/mkj/dropbear/blob/master/progressmeter.c