trolando / sylvan

Implementation of multi-core (binary) decision diagrams
Apache License 2.0

Lace error: Unable to bind worker memory to node #3

Closed: thisiscam closed this issue 7 years ago

thisiscam commented 7 years ago

It seems that after the change to the hwloc backend, Sylvan no longer runs correctly:

Lace warning: hwloc_set_cpubind returned -1!
Lace warning: hwloc_set_cpubind returned -1!
Lace warning: hwloc_set_cpubind returned -1!
Lace warning: hwloc_set_cpubind returned -1!
Lace error: Unable to bind worker memory to node!
Lace warning: hwloc_set_cpubind returned -1!
Lace error: Unable to bind worker memory to node!
Lace warning: hwloc_set_cpubind returned -1!
Lace warning: hwloc_set_cpubind returned -1!
Lace error: Unable to bind worker memory to node!
Lace error: Unable to bind worker memory to node!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace error: Unable to bind worker memory to node!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace error: Unable to bind worker memory to node!
Lace error: Unable to bind worker memory to node!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace warning: hwloc_get_area_membind_nodeset returned -1!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: Lace worker memory not bound with BIND policy!
Lace warning: Lace worker memory not bound with BIND policy!

I assume my hwloc is not set up correctly, but I'm not sure where to look. Any pointers would be appreciated! I am on macOS 10.12.4 with a 2.5 GHz Intel Core i7.

hwloc was installed from here: https://www.open-mpi.org/software/hwloc/v1.11/

trolando commented 7 years ago

Interesting, maybe there is a permissions problem on the Mac?

The warnings are generated in lace.c. You can ignore them, but the calls that produce them are responsible for pinning every worker thread to exactly one CPU/core. Pinning prevents the operating system from migrating threads to other CPUs, which is very bad for performance, especially on NUMA machines that have multiple memory nodes.
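To make the pinning concrete, here is a minimal C sketch. It does not use hwloc (which is what Lace uses). Instead it assumes Linux's sched_getaffinity/sched_setaffinity, and the helper name pin_current_thread_to_first_cpu is hypothetical. On platforms without a hard thread-binding API, such as macOS, it simply reports failure. This is the same situation that produces the hwloc_set_cpubind warnings in the log.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* Linux: exposes sched_setaffinity() and the CPU_* macros */
#endif

#ifdef __linux__
#include <sched.h>
#endif

/* Hypothetical helper: pin the calling thread to the first CPU it is
 * currently allowed to run on, so the scheduler cannot migrate it.
 * Returns 0 on success and -1 if binding failed or is unavailable.
 * macOS has no hard thread-to-core binding API, so there this always
 * takes the -1 branch. */
int pin_current_thread_to_first_cpu(void)
{
#if defined(__linux__) && defined(CPU_SETSIZE)
    cpu_set_t allowed;
    /* pid 0 means "the calling thread" for both affinity calls. */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one) == 0 ? 0 : -1;
        }
    }
    return -1; /* empty mask: nothing to bind to */
#else
    return -1; /* no supported binding API on this platform */
#endif
}
```

A real worker pool would pick a distinct CPU per worker rather than always the first one. The sketch only shows the single-thread case.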

For a non-NUMA machine it is not so detrimental. Maybe I can change Lace so that it only warns on NUMA machines.
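The "only warn on NUMA machines" idea could look roughly like the following C sketch. It is a hypothetical policy, not Lace's actual code. The function names are invented, and the NUMA node count is assumed to come from elsewhere (with hwloc, for example, from hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NUMANODE)).

```c
#include <stdio.h>

/* Hypothetical policy sketch, not Lace's actual code: warn about a failed
 * binding call only when the machine has more than one NUMA node. On a
 * single-node machine an unpinned thread may still migrate between cores,
 * but there is no remote memory to pay for. */
int should_warn_bind_failure(int bind_result, int numa_nodes)
{
    return bind_result != 0 && numa_nodes > 1;
}

/* Print a warning in the same style as the log in this issue,
 * but only if the policy above says so. */
void report_bind_result(const char *call, int bind_result, int numa_nodes)
{
    if (should_warn_bind_failure(bind_result, numa_nodes))
        fprintf(stderr, "Lace warning: %s returned %d!\n", call, bind_result);
}
```

For example, report_bind_result("hwloc_set_cpubind", -1, 1) stays silent on a single-node laptop, while the same failure on a two-node server is still reported.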

thisiscam commented 7 years ago

Is the library tested on any Mac? Could https://github.com/open-mpi/hwloc/issues/140 be related?

trolando commented 7 years ago

Travis also tests on macOS and reports these errors there too. I used to develop Lace and Sylvan on a Mac, but that was at my previous university position, and I no longer have one.

thisiscam commented 7 years ago

After some investigation, I found that the linked issue is correct: OS X does not support cpu_bind.

I think this can be solved by suppressing the warning on non-NUMA machines. Another way is to detect this functionality with CMake (using try_compile or try_run) and issue a warning at build time if cpu_bind is not possible.

Let me know if you'd prefer the second method; I can contribute a PR for it.

trolando commented 7 years ago

In the newest version, HWLOC is disabled. It may return later as an option for NUMA machines.