Redesign threads interface for Nim v2

planetis-m commented 3 years ago

So, I would mostly like to start a discussion about how the next version of Nim will handle one of the fundamentals of multithreading: the Thread object and its API. Currently its behavior is closely tied to the legacy GC mode, and breaking the interface is unavoidable to properly support ARC/ORC (see the Isolated data RFC). It is also beneficial, since we have the opportunity to improve the interface and add more sugar. Things that need to be addressed include, but are not limited to, the following:

Passing data as sink Isolated[T]?
Propagating exceptions to the main thread instead of terminating?
Syntactic sugar for thread creation.
Are threadDestructionHandlers still relevant?
API additions like cooperative interruption tokens, detach, isJoinable, etc
Auto-joinable Thread via its destructor?
...

Opinions?

References:

https://en.cppreference.com/w/cpp/thread/jthread

konsumlamm commented 3 years ago

Passing data as sink Isolated[T]?

Yes, please. For what it's worth, Rust also basically does this via move semantics.
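As a rough illustration of what that could look like (not the proposed Thread API itself), here is a minimal sketch using the existing std/isolation module. It assumes Nim 2 with ARC/ORC; the consume proc is a hypothetical stand-in for a thread entry point that takes ownership of its argument.

```nim
import std/isolation

# Hypothetical receiver: takes ownership of an isolated payload.
proc consume(msg: sink Isolated[string]): int =
  var owned = msg            # moves the sink parameter; Isolated cannot be copied
  let text = extract(owned)  # take the payload out of its isolated wrapper
  text.len

# isolate() only accepts expressions the compiler can prove are unaliased,
# such as a literal.
let n = consume(isolate("doing work"))
echo n
```

The point is that the caller gives up the value at the call site, so the receiving side never shares the payload with the sender.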

Syntactic sugar for thread creation.

I'm not sure this is needed, one can just pass a lambda to createThread already:

import std/sugar

var t: Thread[void]
t.createThread(() => echo "doing work")

But I wouldn't complain about some more sugar, something like:

let t = thread:
  echo "doing work"

One thing I really dislike about the current API though is that createThread takes a var Thread instead of just returning a Thread. I tried to add a wrapper for that, but ran into https://github.com/nim-lang/Nim/issues/17136 (a template would probably work, but this made me really unconfident about threads in Nim, in general, so I just ignored them). I would hope that something like this gets fixed by a redesign, but I'm not sure what causes it in the first place.

More references:

Araq commented 3 years ago

IMO the mistake in Thread[T] is the T generic parameter. A thread should be the most bare-bones low-level wrapper over Windows/Posix that we can create, and the callback it takes should take a single pointer parameter instead. This does not need syntax sugar at all; it's a low-level API. You should use spawn instead, and spawn should be based on std/tasks.

threadDestructionHandlers should be removed.

  proc pinToCpu*[Arg](t: var Thread[Arg]; cpu: Natural) =
    {.hint: "cannot change Genode thread CPU affinity after initialization".}

is a good indicator that CPU pinning should be part of createThread directly.

Varriount commented 3 years ago

One characteristic with spawn is that it does not support the idea of more than one thread pool, which I could see being a challenge for certain use-cases.

Araq commented 3 years ago

Well spawn could take the threadpool as an argument and for "structured concurrency" that would be required anyway. Or you use moduleA.spawn vs moduleB.spawn. Pretty simple problem.

Clonkk commented 3 years ago

Just my 2 cents on the threading story.

Recurring pain points with threads that I've encountered:

Generic thread makes a thread object specialized for a given proc for no reason
Generic thread forces you to use a Tuple argument
Being able to join a thread from within is missing and very practical
Cancel and/or pause a thread execution is also very practical

With std/threadpool, I found that FlowVar was less practical than Future. For reference, there's an excellent threadpool implementation based on Future: https://github.com/yglukhov/asyncthreadpool/

Also missing, though only loosely related, is the ability to generate OpenMP parallel for loops, and notably to collapse nested for loops.

planetis-m commented 3 years ago

Another useful define that's missing and gets duplicated is CacheLineSize, similar to http://www.hellenico.gr/cpp/w/cpp/thread/hardware_destructive_interference_size.html. If someone can point me to where those constants are defined, since I can't find them, I can try making a PR.
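For context, this is the kind of code that ends up redefining the constant. It is only a sketch: the value 64 is an assumption (common on x86-64, but not universal), and PaddedCounter is a hypothetical name.

```nim
const CacheLineSize = 64  # assumption: typical on x86-64, differs elsewhere

type
  PaddedCounter = object
    value: int
    # Fill the rest of the line so neighbouring counters never share one.
    pad: array[CacheLineSize - sizeof(int), byte]

# One counter per thread; consecutive values sit CacheLineSize bytes apart.
var counters: array[4, PaddedCounter]
let stride = cast[uint](addr counters[1]) - cast[uint](addr counters[0])
echo "stride between counters: ", stride
```

Every runtime that wants to avoid false sharing has to guess this number today, which is why a single definition in the stdlib would help.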

mratsim commented 2 years ago

  proc pinToCpu*[Arg](t: var Thread[Arg]; cpu: Natural) =
    {.hint: "cannot change Genode thread CPU affinity after initialization".}
is a good indicator that CPU pinning should be part of createThread directly.

I have removed pinning for taskpools, there are too many cases and complexities:

It reduces the number of context switches and of CPU cache flushes and reloads. But performance gains are only visible on memory-bound workloads that maximize cache utilization, usually in scientific computing. In that case, people who want maximum performance need to deal with detecting hyperthreading sibling cores and NUMA / multi-socket topologies, or might just use a GPU.
It can't be used on ARM, due to the big.LITTLE architecture with powerful cores and efficient cores: pinning would prevent the OS from migrating the program to a more suitable core when the workload changes.
It can't be used naively with the new Intel Alder Lake CPUs for the same reason.
It can't be used on macOS due to the lack of an API (or documentation about one). I hunted for one when writing Weave, to no avail.
Multiple instances of a program like Nimbus would get their main thread pinned on the same core. (Why? Maybe when they are launched from the same bash script?)

Due to all these reasons, I wouldn't attempt to do CPU pinning in the standard library.

Cancel and/or pause a thread execution is also very practical

That should be the responsibility of the event loop running on that thread.

Pausing/cancellation is preemptive multithreading and is kernel domain (well, you could use signals like the Java garbage collector ...).

Pthread doesn't expose a suspending API anyway (https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread.h.html) and its cancellation is cooperative: https://pubs.opengroup.org/onlinepubs/7908799/xsh/pthread_cancel.html

So devs should embrace cooperative scheduling and have multithreaded functions communicate by channels if synchronization is needed. This would also make cancellation points explicit, which makes it much easier to understand what cleanup is needed. (A cancellation channel can just be a ptr Atomic[bool].)
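A minimal sketch of that idea, assuming Nim 2 (threads on by default) and the existing std/atomics module. WorkerCtx and worker are hypothetical names, and the flag is bundled with a step counter in one object purely for the demo.

```nim
import std/[atomics, os]

type
  WorkerCtx = object
    cancel: Atomic[bool]   # set by the owner to request a stop
    steps: Atomic[int]     # progress counter, only for the demo

proc worker(ctx: ptr WorkerCtx) {.thread.} =
  while true:
    ctx.steps.atomicInc()            # one unit of work
    if ctx.cancel.load(moAcquire):   # explicit cancellation point
      break                          # cleanup would go here
    sleep(1)

var ctx: WorkerCtx
var t: Thread[ptr WorkerCtx]
createThread(t, worker, addr ctx)
sleep(20)
ctx.cancel.store(true, moRelease)    # ask the worker to stop
joinThread(t)                        # returns once the worker saw the flag
echo "worker stopped after ", ctx.steps.load, " steps"
```

The worker only ever stops at the one place where it checks the flag, so the cleanup needed at that point is easy to reason about.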

Clonkk commented 2 years ago

Pausing/cancellation is preemptive multithreading and is kernel domain (well, you could use signals like the Java garbage collector ...).

Well, pausing should be done in an event loop based around a condition variable, except that the ones in std/locks are currently very basic compared to their C++ equivalent, so they should be improved to improve multithreading. The ability to pause a thread can be interpreted as having the tools in the stdlib to implement event loops without too much friction; you could even imagine having some simple event loops exposed both as examples and to simplify trivial use cases.

For cancelling, I just think having the stdlib Thread equivalent of pthread_cleanup_push, pthread_cancel, and pthread_setcancelstate, without needing to call POSIX functions, is enough.

Araq commented 2 years ago

I have removed pinning for taskpools, there are too many cases and complexities: ...

The API could always ignore the request if the underlying platform doesn't support it. But it doesn't seem to be worth it, it seems the idea didn't age too well.

mratsim commented 2 years ago

Pausing/cancellation is preemptive multithreading and is kernel domain (well, you could use signals like the Java garbage collector ...).

Well, pausing should be done in an event loop based around a condition variable, except that the ones in std/locks are currently very basic compared to their C++ equivalent, so they should be improved to improve multithreading. The ability to pause a thread can be interpreted as having the tools in the stdlib to implement event loops without too much friction; you could even imagine having some simple event loops exposed both as examples and to simplify trivial use cases.

After https://github.com/nim-lang/Nim/pull/17711/files, it only lacks waiting with a timeout, which can be done in a PR.

My main problem when writing a runtime is the lack of a barrier, so that after threads are started I can make sure they are all synchronized before they wreak havoc.

My main issue is that barriers are an optional pthread API and macOS doesn't provide them ...

Barriers for all OS are here: https://github.com/status-im/nim-taskpools/tree/b31b891/taskpools/primitives
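To show roughly what a portable barrier involves, here is a minimal sketch built only on Lock and Cond from std/locks. It is not the nim-taskpools implementation linked above; Barrier, Ctx, init, wait and worker are hypothetical names, and it assumes Nim 2 with a std/locks that provides broadcast.

```nim
import std/[atomics, locks]

const NumThreads = 4

type
  Barrier = object
    lock: Lock
    cond: Cond
    expected: int     # number of threads that must arrive
    arrived: int
    generation: int   # bumped each time the barrier opens

  Ctx = object
    barrier: Barrier
    started: Atomic[int]   # threads that reached the barrier
    sawAll: Atomic[int]    # threads that saw everyone started afterwards

proc init(b: var Barrier; expected: int) =
  initLock(b.lock)
  initCond(b.cond)
  b.expected = expected

proc wait(b: var Barrier) =
  acquire(b.lock)
  let gen = b.generation
  inc b.arrived
  if b.arrived == b.expected:
    b.arrived = 0              # last arrival resets and opens the barrier
    inc b.generation
    broadcast(b.cond)
  else:
    while gen == b.generation: # loop guards against spurious wakeups
      wait(b.cond, b.lock)
  release(b.lock)

proc worker(ctx: ptr Ctx) {.thread.} =
  ctx.started.atomicInc()
  wait(ctx.barrier)            # nobody proceeds until all have started
  if ctx.started.load == NumThreads:
    ctx.sawAll.atomicInc()

var ctx: Ctx
init(ctx.barrier, NumThreads)
var threads: array[NumThreads, Thread[ptr Ctx]]
for t in threads.mitems:
  createThread(t, worker, addr ctx)
joinThreads(threads)
deinitCond(ctx.barrier.cond)
deinitLock(ctx.barrier.lock)
```

The generation counter is what makes the barrier reusable: a thread only leaves the wait loop once the round it joined has completed.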

For cancelling, I just think having the stdlib Thread equivalent of pthread_cleanup_push, pthread_cancel, and pthread_setcancelstate, without needing to call POSIX functions, is enough.

I wouldn't expose them because I don't see a use case where there isn't a better existing alternative.

For instance the doc says:

A thread's cancellation type, determined by pthread_setcanceltype(3), may be either asynchronous or deferred (the default for new threads). Asynchronous cancelability means that the thread can be canceled at any time (usually immediately, but the system does not guarantee this). Deferred cancelability means that cancellation will be delayed until the thread next calls a function that is a cancellation point. A list of functions that are or may be cancellation points is provided in pthreads(7).

The pthread doc

Cancellation points POSIX.1 specifies that certain functions must, and certain other functions may, be cancellation points. If a thread is cancelable, its cancelability type is deferred, and a cancellation request is pending for the thread, then the thread is canceled when it calls a function that is a cancellation point.
  The following functions are required to be cancellation points by
  POSIX.1-2001 and/or POSIX.1-2008:

      accept()
      aio_suspend()
      clock_nanosleep()
      close()
      connect()
      creat()
      fcntl() F_SETLKW
      fdatasync()
      fsync()
      getmsg()
      getpmsg()
      lockf() F_LOCK
      mq_receive()
      mq_send()
      mq_timedreceive()
      mq_timedsend()
      msgrcv()
      msgsnd()
      msync()
      nanosleep()
      open()
      openat() [Added in POSIX.1-2008]
      pause()
      poll()
      pread()
      pselect()
      pthread_cond_timedwait()
      pthread_cond_wait()
      pthread_join()
      pthread_testcancel()
      ...

So all those cancellation points, besides the condition variables, are related to IO procedures.

It should be noted that even if an application is not using asynchronous cancellation, that calling a function from the above list from an asynchronous signal handler may cause the equivalent of asynchronous cancellation. The underlying user code may not expect asynchronous cancellation and the state of the user data may become inconsistent. Therefore signals should be used with caution when entering a region of deferred cancellation.

In particular, once a thread is cancelled:

how does the runtime (asyncdispatch / chronos / weave / taskpools) deal with destroying all resources of that thread?
- it should expose a cancellation API
- but in that case they might as well expose ways to build custom cancellation, and then the pthread one becomes useless.
what if the thread is in some C library, for example OpenSSL when the cancellation arrives?

Cancellation is a huge problem even when a language controls everything, see:

I wouldn't add pthread cancellation before runtime writers figure out their cancellation strategy. And I have no idea about potential Windows and Mac specifics.

mratsim commented 2 years ago

pthread_cancel doesn't work on Windows, or at least didn't a decade ago: http://blog.ezyang.com/2010/09/pthread-cancel-on-window/

nim-lang / RFCs

Redesign threads interface for Nim v2 #401