numaproj / numalogic

Collection of operational time series ML models and tools
https://numalogic.numaproj.io/
Apache License 2.0

Implement: multiple save and load for mlflow registry #416

Closed: yleilawang closed this 3 weeks ago

yleilawang commented 4 weeks ago
  1. Implemented save_multiple and load_multiple for the mlflow registry.
  2. Added test cases for the implementation.

codecov[bot] commented 4 weeks ago

Codecov Report

Attention: Patch coverage is 59.37500% with 13 lines in your changes missing coverage. Please review.

Project coverage is 92.02%. Comparing base (b18b2d2) to head (69caf21).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| numalogic/registry/mlflow_registry.py | 59.37% | 10 Missing and 3 partials :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #416      +/-   ##
==========================================
- Coverage   92.20%   92.02%   -0.18%
==========================================
  Files          98       98
  Lines        4834     4865      +31
  Branches      437      442       +5
==========================================
+ Hits         4457     4477      +20
- Misses        277      285       +8
- Partials      100      103       +3
```


ab93 commented 3 weeks ago

Also, update the version in pyproject.toml.

@s0nicboOm we have not released any previous versions, right?

yleilawang commented 3 weeks ago

Changes added:

  1. Changed the return type of load_multiple to ArtifactData, the same return type as load.
  2. Added slots for the CompositeModels class.
  3. Handled exceptions raised when unwrapping pyfunc models.
  4. Added a private function for getting the unique, sorted dkey list.
  5. Added test cases for cache loading of pyfunc models.
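
For readers who have not opened the diff, items 2 and 4 can be pictured roughly as below. This is only a sketch under assumptions: the class here is a hypothetical stand-in, not the CompositeModels class from the PR, and the helper name `unique_sorted_dkeys` and its input shape (one dkey list per artifact) are invented for illustration.

```python
from typing import Iterable


class CompositeModel:
    """Hypothetical container bundling several artifacts (not the PR's class)."""

    # __slots__ removes the per-instance __dict__ and rejects stray attributes.
    __slots__ = ("skeys", "dict_artifacts")

    def __init__(self, skeys, dict_artifacts):
        self.skeys = list(skeys)
        self.dict_artifacts = dict(dict_artifacts)


def unique_sorted_dkeys(dkeys_per_artifact: Iterable[Iterable[str]]) -> list[str]:
    """Flatten several dkey lists into one sorted list without duplicates."""
    unique = set()
    for dkeys in dkeys_per_artifact:
        unique.update(dkeys)
    return sorted(unique)


# The three dkey lists used in the profiling snippet of this thread.
merged = unique_sorted_dkeys(
    [["ae", "pipeline"], ["scaler", "pipeline"], ["threshold", "pipeline"]]
)
print(merged)
```

Sorting the merged keys gives a deterministic key regardless of the order in which artifacts are passed, which is presumably why the list is both de-duplicated and sorted.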

yleilawang commented 3 weeks ago

Profiling Updates

import timeit

# Baseline: save each artifact with its own call.
reg.save(skeys=skeys, dkeys=["ae", "pipeline"], artifact=VanillaAE(10), artifact_type="pytorch", **{"a": "b"})
reg.save(skeys=skeys, dkeys=["scaler", "pipeline"], artifact=StandardScaler(), artifact_type="sklearn", **{"a": "b"})
reg.save(skeys=skeys, dkeys=["threshold", "pipeline"], artifact=StdDevThreshold(), artifact_type="sklearn", **{"a": "b"})

# New API: save all artifacts in a single call.
output = reg.save_multiple(
    skeys=skeys, dkeys=dkeys, dict_artifacts=dict_artifacts, **{"a": "b"}
)

def f():
    # Three separate load calls, one per artifact.
    reg.load(skeys=skeys, dkeys=["ae", "pipeline"], artifact_type="pytorch")
    reg.load(skeys=skeys, dkeys=["scaler", "pipeline"], artifact_type="sklearn")
    reg.load(skeys=skeys, dkeys=["threshold", "pipeline"], artifact_type="sklearn")

def g():
    # One load_multiple call for all artifacts.
    reg.load_multiple(skeys=skeys, dkeys=dkeys)

# Total seconds for 100 runs of each function.
a = timeit.timeit(f, number=100)
b = timeit.timeit(g, number=100)

print(a, b)

--> 9.074686333002319 4.829918750001525 (total seconds over 100 runs: three load calls vs. one load_multiple call)
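
To put the two totals in perspective, the arithmetic below turns them into per-run times and a speedup ratio. It uses only the two numbers printed above and the run count of 100 from the timeit calls; the variable names are made up for this sketch.

```python
# Timings copied from the output above: total seconds over 100 runs.
separate_total = 9.074686333002319  # three reg.load calls per run
multiple_total = 4.829918750001525  # one reg.load_multiple call per run
runs = 100

# Average wall-clock time of a single run, in milliseconds.
per_run_separate_ms = separate_total / runs * 1000
per_run_multiple_ms = multiple_total / runs * 1000

# How many times faster load_multiple is, and the fraction of time saved.
speedup = separate_total / multiple_total
time_saved = 1 - multiple_total / separate_total

print(f"{per_run_separate_ms:.1f} ms vs {per_run_multiple_ms:.1f} ms per run")
print(f"speedup: {speedup:.2f}x, time saved: {time_saved:.1%}")
```

On these numbers load_multiple comes out roughly 1.88x faster, a bit under half the time of three separate loads. This is a single measurement on one machine, so the ratio should be read as indicative rather than exact.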