nv-morpheus / MRC

Morpheus Runtime Core (MRC)
Apache License 2.0

[FEA] Relax pe_count for CSP vCPU with hard limits #157

Status: Open. pdmack opened this issue 2 years ago.

pdmack commented 2 years ago

Is your feature request related to a problem? Please describe.

Azure subscriptions impose hard limits on vCPU resources in their VMs. So if we run a Morpheus pipeline with 8 threads on an NC6s v2 (6 vCPU) instance, we get an abort in thread_engines.cpp:

F20220809 15:38:51.809185   262 thread_engines.cpp:38] 
Check failed: launch_options().pe_count == m_cpu_set.weight() 
(8 vs. 6) mismatch in the number of cores in the cpu set with respect to the requested pe_count
*** Check failure stack trace: ***
Aborted (core dumped)
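The failing check requires the requested pe_count to equal the number of CPUs in the CPU set exactly, so any request larger than the machine's CPU count aborts. One way such a check could be relaxed is to allow oversubscription and map processing elements (PEs) onto the available CPUs round-robin. This is a hypothetical sketch, not MRC code: assign_pes_round_robin is an illustrative name, and the CPU set is modelled as a plain std::vector<int> of CPU ids rather than MRC's own CPU-set type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of MRC): instead of requiring
// pe_count == cpu_set.weight(), assign each processing element (PE) to a CPU
// from the available set, wrapping around when there are more PEs than CPUs.
// With 8 PEs on 6 CPUs, PEs 6 and 7 share CPUs 0 and 1 (oversubscription).
std::vector<int> assign_pes_round_robin(std::size_t pe_count, const std::vector<int>& cpus)
{
    // Precondition: at least one CPU is available to assign PEs to.
    assert(!cpus.empty() && "the CPU set must contain at least one CPU");

    std::vector<int> assignment;
    assignment.reserve(pe_count);
    for (std::size_t pe = 0; pe < pe_count; ++pe)
    {
        // Cycle through the CPU set so every requested PE gets a CPU.
        assignment.push_back(cpus[pe % cpus.size()]);
    }
    return assignment;
}
```

With the numbers from the log (8 requested PEs, 6 CPUs), PEs 0 to 5 land on CPUs 0 to 5 and PEs 6 and 7 wrap back onto CPUs 0 and 1. The trade-off is that pinned threads then contend for the same cores, which is presumably why the original check was strict.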

Describe the solution(s) you'd like

Describe alternatives you've considered

The workaround is to run the Morpheus pipeline with a thread count no greater than the VM's vCPU count (threads <= vCPUs).
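The workaround amounts to capping the requested thread count at the number of CPUs the process can actually use. A minimal sketch, assuming Linux for the affinity query: available_cpu_count and clamp_thread_count are hypothetical helpers, not Morpheus or MRC APIs, and the clamped value would still have to be passed to whatever option sets the pipeline's thread count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#ifdef __linux__
#include <sched.h>
#endif

// Hypothetical helper: number of CPUs this process may run on. On Linux this
// honours the affinity mask (e.g. one set by taskset or a cpuset); elsewhere it
// falls back to std::thread::hardware_concurrency(), which may report 0.
std::size_t available_cpu_count()
{
#ifdef __linux__
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
    {
        return static_cast<std::size_t>(CPU_COUNT(&set));
    }
#endif
    // Guard against hardware_concurrency() returning 0 ("unknown").
    return std::max(1u, std::thread::hardware_concurrency());
}

// Hypothetical helper: never request more threads than there are CPUs,
// and never fewer than one.
std::size_t clamp_thread_count(std::size_t requested, std::size_t available)
{
    return std::max<std::size_t>(1, std::min(requested, available));
}
```

For example, on the 6-vCPU NC6s v2 instance above, clamp_thread_count(8, available_cpu_count()) would yield 6, which satisfies the pe_count check.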

Additional context

https://docs.microsoft.com/en-us/azure/databricks/kb/clusters/azure-core-limit

pdmack commented 2 years ago

Actually, this is not solely an Azure issue. I have reproduced it on EC2, GCP, and bare metal (80 CPUs). It happens anywhere Morpheus requests more threads than there are CPUs, virtual or physical.

jarmak-nv commented 2 years ago