Vindaar opened this issue 4 years ago
I have a similar problem. I found that there is no `pthread_setaffinity_np` on Android, but Weave tries to call it, which gives a linker error.
Fixed for Android. I'm unsure about Termux; I will have to test, because my CI runs fine on ARM64 on Travis's 32-core CPUs.
Thanks Mamy. Unfortunately, it seems to be more complicated. There are still other problems, see below.
First, I tried a different fix than yours but got the error message `cpu_set(cpu, cpuset) cannot be called`, which I don't understand:
```nim
proc sched_setaffinity(
       thread: Pthread,
       cpuset_size: csize_t,
       cpuset: ptr CpuSet
     ) {.header: "<sched.h>".}

...

proc set_thread_affinity(t: Pthread, cpu: int32) {.inline.} =
  var cpuset {.noinit.}: CpuSet
  when defined(osx):
    {.warning: "To improve performance we should pin threads to cores.\n" &
      "This is not possible with MacOS".}
  elif defined(android):
    cpu_zero(cpuset)
    cpu_set(cpu, cpuset) # error: cpu_set(cpu, cpuset) cannot be called
    sched_setaffinity(Pthread(0), sizeof(cpuset), addr cpuset)
  else:
    cpu_zero(cpuset)
    cpu_set(cpu, cpuset)
    pthread_setaffinity_np(t, sizeof(CpuSet), cpuset)
```
Going back to your fix, I additionally need this change in primitives/affinity_posix.nim:

```diff
-proc set_thread_affinity(t: Pthread, cpu: int32) {.inline.}=
+proc set_thread_affinity(t: Pthread, cpu: csize_t) {.inline.}=
```
Furthermore, I need to patch memory/allocs.nim:
```diff
 when defined(windows):
   proc aligned_alloc(alignment, size: csize_t): pointer {.sideeffect,importc:"_aligned_malloc", header:"<malloc.h>".}
   proc wv_freeAligned*[T](p: ptr T){.sideeffect,importc:"_aligned_free", header:"<malloc.h>".}
-elif defined(osx):
+elif defined(osx) or defined(android):
   proc posix_memalign(mem: var pointer, alignment, size: csize_t){.sideeffect,importc, header:"<stdlib.h>".}
   proc aligned_alloc(alignment, size: csize_t): pointer {.inline.} =
     posix_memalign(result, alignment, size)
```
And last (worst of all), I have to patch Nim's /usr/lib/nim/system/threadlocalstorage.nim as well:
```diff
@@ -194,8 +194,15 @@
   proc cpusetIncl(cpu: cint; s: var CpuSet) {.
     importc: "CPU_SET", header: schedh.}

-  proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) {.
-    importc: "pthread_setaffinity_np", header: pthreadh.}
+  when defined(android):
+    proc sched_setaffinity(thread: SysThread; cpuset_size: csize_t; ptrCpuset: ptr CpuSet) {.
+      importc: "sched_setaffinity", header: schedh.}
+
+    proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) =
+      sched_setaffinity(thread, setsize, addr s)
+  else:
+    proc setAffinity(thread: SysThread; setsize: csize_t; s: var CpuSet) {.
+      importc: "pthread_setaffinity_np", header: pthreadh.}
 const
```
since the linker otherwise tries to find `pthread_setaffinity_np`, which doesn't exist on Android.
Should I create a pull request for this?
With all these patches, Weave does work under Termux (Android). I am not sure whether it's possible to improve on your patch, which disables setAffinity altogether.
By the way, a short time ago I asked for help debugging a problem with Weave in my application (ant colony optimization). After several hours of hunting a non-existent bug in my code, I found out that I need to put `parallelFor` in a `syncScope():` block, which is very non-intuitive if one comes from OpenMP or Nim's experimental `parallel` clause.
It's strange that you need to patch Nim itself; I wasn't aware that Android didn't implement the full pthread interface.
> After several hours of hunting a non-existent bug in my code, I found out that I need to put `parallelFor` in a `syncScope():` block, which is very non-intuitive if one comes from OpenMP or Nim's experimental `parallel` clause.
You can use awaitable loops, like this: https://github.com/mratsim/weave#awaitable-loop.
OpenMP evolved from synchronous parallelism, with a master thread that spawns all the parallel-for work. Weave is fundamentally async: any thread can spawn anything at any time and can have barriers independently of the others.

Furthermore, decades have passed since OpenMP appeared, and APIs for parallelism have evolved. There have been many complaints about OpenMP's implicit barriers killing performance by requiring extra synchronization, so I decided to remove all barriers by default and instead specify all synchronization points with Futures/Flowvars (awaitable loops) and structured parallelism (`syncScope`, which is like the async-finish construct in the X10 programming language, itself built on lessons learned from high-performance computing).

So I can indeed improve the documentation, but in the end I think it's much better if all synchronization happens explicitly and is grep-able in the code than if it is implicit. This means that if threads are desynchronized, some synchronization is missing. And it has performance benefits as well.
Trying to run code using Weave on Android within Termux results in (at least) two different problems.
This is running on a Oneplus 6 with:
First of all, trying to run the Weave test suite results in a SIGSEGV:
Originally I tried to run @HugoGranstrom's nimbody benchmark for fun, which results in a linker error due to `pthread_setaffinity_np` not being defined:
If there's anything else that would be useful, let me know.