maxCores set to local CPU count

mr-c commented 3 years ago

From @aniewielska

If this [self.max_cores = float(psutil.cpu_count())] is what gets propagated to the TES request, then it does not make much sense when the TES server is remote. (It also affects the execution time of the test suite, because it affects scheduling.)

https://github.com/common-workflow-language/cwltool/blob/0e8110083bad6ea98fc487aa262953a6c5e010b5/cwltool/executors.py#L289
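
To make the concern concrete, here is a minimal sketch, not cwltool's actual code. It assumes a hypothetical helper, `resolve_max_cores`, and a hypothetical `remote_cores` setting that the operator would supply for the remote backend. The local CPU count is used only as a fallback. It uses `os.cpu_count()` from the standard library in place of `psutil.cpu_count()` so the example stays self-contained.

```python
import os
from typing import Optional


def resolve_max_cores(remote_cores: Optional[float] = None) -> float:
    """Hypothetical helper: pick the core limit used for scheduling.

    `remote_cores` is an assumed, operator-supplied value describing the
    remote backend. It is not an existing cwltool or cwl-tes option.
    """
    if remote_cores is not None:
        if remote_cores <= 0:
            raise ValueError("remote_cores must be positive")
        # Trust the explicit value: the local machine says nothing
        # about what the remote backend can provide.
        return float(remote_cores)
    # Fallback that mirrors the behaviour questioned above:
    # use the CPU count of the machine running the workflow engine.
    return float(os.cpu_count() or 1)
```

With something like this, `resolve_max_cores(64)` would return `64.0` regardless of how many cores the submitting machine has, while `resolve_max_cores()` falls back to the local count.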

tom-tan commented 3 years ago

I guess there is no straightforward way to get the remote CPU count. I read the TES spec, but I'm not sure whether conformant TES servers offer a way to query the remote environment (e.g., CPU count, memory size, and storage size), or whether the TES spec leaves this out by design.

At the very least, this limitation makes it difficult for cwl-tes to fully support the CWL spec, because the CWL spec requires support for parameter references. Ideally, the TES spec would be fixed to support parameter references in some way (but that is out of scope for this repository, I guess).

One workaround is to write a wrapper that inspects the remote environment and then executes the command, and to submit a job that runs this wrapper to the TES server. That is easier than the other solutions, IMO.
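
A rough sketch of what such a wrapper could look like, assuming it runs inside the task on the remote node and that the submitted command carries a placeholder token for the core count. The `{cores}` token and the `fill_cores` and `main` names are made up for illustration. They are not part of cwl-tes or the TES spec.

```python
import os
import subprocess
import sys
from typing import List

# Hypothetical token that the submitter leaves in the command line.
PLACEHOLDER = "{cores}"


def fill_cores(argv: List[str], cores: int) -> List[str]:
    """Replace the placeholder in every argument with the detected core count."""
    return [arg.replace(PLACEHOLDER, str(cores)) for arg in argv]


def main(argv: List[str]) -> int:
    """Detect cores where the task actually runs, then execute the real command."""
    if not argv:
        print("usage: wrapper COMMAND [ARGS...]", file=sys.stderr)
        return 2
    # os.cpu_count() reports the CPUs of the machine running the wrapper,
    # i.e. the remote node, not the machine that submitted the job.
    cores = os.cpu_count() or 1
    return subprocess.call(fill_cores(argv, cores))
```

The script's entry point would call `sys.exit(main(sys.argv[1:]))`, and the job submitted to the TES server would invoke the wrapper with the real command as its arguments, for example `wrapper.py mytool --threads {cores}`. Note that `os.cpu_count()` reports the node's CPUs, which may be more than a container or scheduler actually grants the task.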

mr-c commented 3 years ago

> I guess there is no straightforward way to get the remote CPU count. I read the TES spec, but I'm not sure whether conformant TES servers offer a way to query the remote environment (e.g., CPU count, memory size, and storage size), or whether the TES spec leaves this out by design.

I concur, this is one of the areas where TES doesn't support CWL fully.

> At the very least, this limitation makes it difficult for cwl-tes to fully support the CWL spec, because the CWL spec requires support for parameter references. Ideally, the TES spec would be fixed to support parameter references in some way (but that is out of scope for this repository, I guess).

> One workaround is to write a wrapper that inspects the remote environment and then executes the command, and to submit a job that runs this wrapper to the TES server. That is easier than the other solutions, IMO.

This is not a bad idea as a short-term fix, but the lack of insight into what resources are truly available behind a TES endpoint is a serious deficiency, and it is one of the reasons I recommend against using (or designing for) TES in production environments at this time.

ohsu-comp-bio / cwl-tes

maxCores set to local CPU count #44