sustainable-computing-io / kepler-model-server

Model Server for Kepler
Apache License 2.0

feat: update select logic with spec similarity computation #370

Closed sunya-ch closed 2 months ago

sunya-ch commented 2 months ago

Related to https://github.com/sustainable-computing-io/kepler-model-server/issues/216, this PR adds logic to select a node_type from the machine spec information.

The idea is:

  1. Find an exact node_type whose spec equals or covers the machine spec.
  2. If none is found, compute the similarity between the machine spec and each node_type, and select the most similar node_type.
  3. If more than one candidate has the same score, compute the uncertainty of the selection (frequency of the selected type over all candidates) and select the most frequent one.
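
The three steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the attribute names, the `covers` and `similarity` helpers, the index layout (a list of `(node_type, spec)` pairs in which several specs may share one node_type), and the "fraction of matching attributes" score are all assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Spec = Dict[str, object]


def covers(node_spec: Spec, machine_spec: Spec) -> bool:
    """Assumed meaning of "equals or covers": every attribute known for the
    machine has the same value in the node_type's spec."""
    return all(
        node_spec.get(key) == value
        for key, value in machine_spec.items()
        if value is not None
    )


def similarity(node_spec: Spec, machine_spec: Spec) -> float:
    """Hypothetical score: fraction of known machine attributes that match."""
    keys = [key for key, value in machine_spec.items() if value is not None]
    if not keys:
        return 0.0
    matched = sum(1 for key in keys if node_spec.get(key) == machine_spec[key])
    return matched / len(keys)


def select_node_type(
    index: List[Tuple[int, Spec]], machine_spec: Spec
) -> Optional[Tuple[int, float]]:
    """Return (node_type, frequency of that type among the best candidates)."""
    if not index:
        return None

    # Step 1: exact or covering match.
    for node_type, node_spec in index:
        if covers(node_spec, machine_spec):
            return node_type, 1.0

    # Step 2: score every candidate and keep those with the best score.
    scores = [(similarity(spec, machine_spec), node_type) for node_type, spec in index]
    best = max(score for score, _ in scores)
    tied = [node_type for score, node_type in scores if score == best]

    # Step 3: among tied candidates, pick the most frequent node_type.
    node_type, count = Counter(tied).most_common(1)[0]
    return node_type, count / len(tied)
```

A frequency of 1.0 means every best-scoring candidate agreed on the node_type; a lower value signals a less certain selection.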

This PR also updates the machine spec discovery in the estimator to align with the kepler PR https://github.com/sustainable-computing-io/kepler/pull/1684. The updated logic first checks whether a mounted spec is available. If it is not, the discover function is applied.
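
A minimal sketch of this "mounted spec first, then discover" order, under stated assumptions: the function names, the JSON file format, the caller-supplied path, and the fields returned by the fallback are hypothetical and are not taken from the estimator code.

```python
import json
import os
import platform
from typing import Dict


def discover_spec() -> Dict[str, object]:
    """Fallback: derive a minimal spec from the running host (illustrative fields only)."""
    return {
        "processor": platform.processor() or platform.machine(),
        "cores": os.cpu_count(),
    }


def load_machine_spec(mounted_path: str) -> Dict[str, object]:
    """Prefer a mounted spec file; fall back to discovery if it is missing or unusable."""
    if os.path.isfile(mounted_path):
        try:
            with open(mounted_path, encoding="utf-8") as spec_file:
                spec = json.load(spec_file)
            if isinstance(spec, dict) and spec:
                return spec
        except (OSError, json.JSONDecodeError):
            pass  # unreadable or malformed file: fall through to discovery
    return discover_spec()
```

Treating a malformed or empty mounted file the same as a missing one is a design choice of this sketch; the real estimator may prefer to fail loudly instead.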

Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>

sunya-ch commented 2 months ago

Note that ComponentModelWeights in the kepler regressor module needs to be updated so that it also unmarshals the model_name field.

sunya-ch commented 2 months ago

@sthaha thank you so much for the reviews. I have made an update. The following PR in kepler also needs to be merged first to support the model_name attribute in the weight file:

rootfs commented 2 months ago

1699 is merged, shall we merge this one now?

sunya-ch commented 2 months ago

> 1699 is merged, shall we merge this one now?

I think we could, but there are a few things that would be good to resolve first. (i) I would like all tests to be green. I am not sure whether the failure is related to the unmarshal error in the current Kepler 0.7.11 release. (ii) @sthaha, could you check that your comments are all resolved? (iii) It would be good to confirm that the compatibility issue is solved with the latest Kepler by having the following CI PR merged first.