qiime2 / q2-diversity-lib

BSD 3-Clause "New" or "Revised" License
0 stars 11 forks source link

beta-phylogenetic-alt: psutil(logical=False) reports too few CPUs on EC2 #25

Closed wasade closed 4 years ago

wasade commented 6 years ago

Current Behavior The current CPU count check uses logical=False which has the effect of returning the number of "physical" cores observed on the system. For EC2, this is seen as only a single core. I believe the original intent of this was to discourage use of hyperthreading -- performance testing had suggested hyperthreading to be detrimental relative to avoiding it however using multiple threads is still faster than using a single thread.

Proposed Behavior The suggested fix here is to set logical=True. In addition, we may want to update the error to a warning, and to warn if the number of requested threads is > 16. The motivation for 16 is that's where performance leveled out when testing on 32-core boxes.

References As reported on the QIIME 2 forum here, and the code presenting an issue is here.

wasade commented 6 years ago

...to clarify, I think we should still error gracefully if more threads than the number of available threads is requested.

ChrisKeefe commented 4 years ago

@thermokarst, I suspect this issue should be moved to q2_diversity_lib. The code linked here has been factored out into _validate_requested_cpus, and will no longer exist in q2-diversity once qiime2/q2-diversity#281 is merged.

thermokarst commented 4 years ago

I agree, this looks like it should be all squared away now. Thanks!

ChrisKeefe commented 4 years ago

@thermokarst, we're still using logical=False; are you sure we want to close this?

In addition to the performance issue @wasade describes here, psutil's cpu_count(logical=False) is apparently buggy on multi-socket systems and at least some Docker containers. These issues might be resolved soon, but it's been a couple months and nothing's been merged yet.

Going to logical=True might be a win, and adding a "lots of cpus" warning in validate_requested_cpus should be straightforward.

thermokarst commented 4 years ago

I think closing the issue makes sense: the first-line cpu_affinity check gets us 99% of the way there, so the cpu_count(logical=False) business is really just a fallback in case cpu_affinity isn't available on a host (which I think is just on darwin systems). I also just checked this cpu_affinity logic out in a few EC2 instances, and all report back the right number of cores.