radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Provide a stable and documented interface to inspect the pilot for the exact amount of resources it has available. #2973

Closed eirrgang closed 3 weeks ago

eirrgang commented 1 year ago

The schema for Pilot.resource_details is not documented in https://radicalpilot.readthedocs.io/en/stable/apidoc.html#radical.pilot.Pilot, and Pilot.resource_details["rm_info"] is not described at all.

However, Pilot.resource_details["rm_info"] is the only way I know to find out how many nodes and cores were allocated to fulfill a PilotDescription.

I would like some assurance about when or how the structure may change in the future, or advice on other ways to get this information. For example, if I want to know the number of cores allocated, how do I know whether it is better to check the value of Pilot.resource_details["rm_info"]["requested_cpus"] or to count the cores with sum(len(node["cores"]) for node in Pilot.resource_details["rm_info"]["node_list"])? Which approach is likely to be more stable in the long run?
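To make the two options concrete, here is a minimal sketch that computes both values side by side. It runs against a hand-written dictionary rather than a live pilot: the nested layout (`rm_info`, `requested_cpus`, `node_list`, `cores`) is taken only from the expressions quoted above, the sample values are invented, and `count_cores` is a hypothetical helper, not part of RADICAL-Pilot.

```python
# Hypothetical stand-in for Pilot.resource_details. The real layout is
# undocumented and may differ between resource managers; the keys below
# are assumed from the expressions quoted in this issue.
resource_details = {
    "rm_info": {
        "requested_cpus": 8,
        "node_list": [
            {"cores": [0, 0, 0, 0]},
            {"cores": [0, 0, 0, 0]},
        ],
    }
}


def count_cores(details):
    """Return (requested, counted) core numbers; None where a key is missing."""
    rm_info = details.get("rm_info") or {}

    # Option 1: the single summary value.
    requested = rm_info.get("requested_cpus")

    # Option 2: add up the per-node core lists.
    node_list = rm_info.get("node_list")
    counted = None
    if node_list is not None:
        counted = sum(len(node.get("cores", [])) for node in node_list)

    return requested, counted


requested, counted = count_cores(resource_details)
print("requested_cpus:", requested)
print("counted cores: ", counted)
```

Using `.get()` throughout means a missing or renamed key yields `None` instead of a `KeyError`, which is a cautious default while the structure carries no stability guarantee.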

andre-merzky commented 1 year ago

We consider the rm_info struct to be an implementation detail, and as such it is not well specified. In fact, different resource managers (as in radical.pilot.agent.resource_manager.*) may provide different structures for rm_info altogether. The purpose of that structure is to communicate essential information between the RM and the agent executor and scheduler. We expose rm_info as a convenience for some use cases.

If that's ok for you, I would like to rephrase the feature request as: provide a stable and documented interface to inspect the pilot for the exact amount of resources it has available. Would that be acceptable?

andre-merzky commented 3 weeks ago

closed by #3117