Closed wasade closed 4 years ago
...to clarify, I think we should still error gracefully if more threads than the number of available threads is requested.
@thermokarst, I suspect this issue should be moved to q2_diversity_lib
. The code linked here has been factored out into _validate_requested_cpus
, and will no longer exist in q2-diversity
once qiime2/q2-diversity#281 is merged.
I agree, this looks like it should be all squared away now. Thanks!
@thermokarst, we're still using logical=False
; are you sure we want to close this?
In addition to the performance issue @wasade describes here, psutil's cpu_count(logical=False)
is apparently buggy on multi-socket systems and at least some Docker containers. These issues might be resolved soon, but it's been a couple months and nothing's been merged yet.
Going to logical=True
might be a win, and adding a "lots of cpus" warning in validate_requested_cpus
should be straightforward.
I think closing the issue makes sense: the first-line cpu_affinity
check gets us 99% of the way there, so the cpu_count(logical=False)
business is really just a fallback in case cpu_affinity
isn't available on a host (which I think is just on darwin systems). I also just checked this cpu_affinity
logic out in a few EC2 instances, and all report back the right number of cores.
Current Behavior The current CPU count check uses
logical=False
which has the effect of returning the number of "physical" cores observed on the system. For EC2, this is seen as only a single core. I believe the original intent of this was to discourage use of hyperthreading -- performance testing had suggested hyperthreading to be detrimental relative to avoiding it however using multiple threads is still faster than using a single thread.Proposed Behavior The suggested fix here is to set
logical=True
. In addition, we may want to update the error to a warning, and to warn if the number of requested threads is > 16. The motivation for 16 is that's where performance leveled out when testing on 32-core boxes.References As reported on the QIIME 2 forum here, and the code presenting an issue is here.