iterative / mlem

🐶 A tool to package, serve, and deploy any ML model on any platform. Archived to be resurrected one day🤞
https://mlem.ai
Apache License 2.0

Add hardware requirements to serialized MLEM model #379

Open aguschin opened 2 years ago

aguschin commented 2 years ago

Would be cool if we could add hardware requirements for inference to the .mlem file.

Something like "to run this NN you need a GPU with 8GB of RAM (for batch size 16)" or "to run this XGBoost model you need 16GB of RAM" would be useful to have. It could help users, and in the future it could help us run models from Studio.
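
For illustration, a minimal sketch of how such requirements could be modeled before being serialized into the .mlem file. The `HardwareRequirements` class and its fields are hypothetical, not an existing MLEM API:

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HardwareRequirements:
    """Hypothetical description of what a model needs at inference time."""
    ram_gb: Optional[float] = None          # e.g. 16 for the XGBoost example
    gpu_memory_gb: Optional[float] = None   # e.g. 8 for the NN example
    batch_size: Optional[int] = None        # batch size the numbers were measured for


# The NN case from above, ready to be dumped into model metadata.
nn_reqs = HardwareRequirements(gpu_memory_gb=8, batch_size=16)
print(asdict(nn_reqs))
```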

@mike0sv, do you think it's feasible?

@daavoo, I see potential overlap with DVCLive here, e.g. you should log RAM/CPU/GPU required for training. But I guess you don't have plans for logging what's required for inference.

mike0sv commented 2 years ago

There are 3 parts to this

  1. Adding a field to model metadata and a new MlemABC type with its implementations for different kinds of hardware - easy, we can do whatever we want
  2. Checking/enforcing those requirements in the user's env: probably doable, at least for common cases like RAM, CPU, GPU (see the sketch after this list)
  3. Auto-detecting those in the first place: don't see how we can do that, probably they can only be provided by users manually
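
A rough sketch of what the checking part (point 2) could look like for RAM, assuming psutil is available; GPU checks would need vendor tooling such as pynvml or nvidia-smi and are left out here:

```python
import psutil  # third-party; a common way to inspect host resources


def ram_requirement_met(required_gb: float) -> bool:
    """Check whether the machine has at least `required_gb` of total RAM."""
    total_gb = psutil.virtual_memory().total / 1024 ** 3
    return total_gb >= required_gb


# e.g. the XGBoost example from the issue description
print(ram_requirement_met(16))
```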
aguschin commented 2 years ago

> Auto-detecting those in the first place: don't see how we can do that, probably they can only be provided by users manually

Can't we just pass sample_data to the model's predict method and measure this?
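
Something along these lines might work as a first approximation, at least for host RAM. This is only a sketch: RSS is noisy, GPU memory would need vendor tooling, and the result depends on the sample batch size:

```python
import os

import psutil


def rough_predict_ram_gb(model, sample_data) -> float:
    """Very rough RSS delta around a single predict call, in GB."""
    proc = psutil.Process(os.getpid())
    before = proc.memory_info().rss
    model.predict(sample_data)  # assumes an sklearn-style predict() method
    after = proc.memory_info().rss
    return max(after - before, 0) / 1024 ** 3
```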

daavoo commented 2 years ago

> @daavoo, I see potential overlap with DVCLive here, e.g. you should log RAM/CPU/GPU required for training. But I guess you don't have plans for logging what's required for inference.

Not on DVCLive, but we plan to add some system monitoring logic to DVC that could log RAM / CPU / GPU usage during stage execution.
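
Not speaking for what will actually land in DVC, but the monitoring side could be as simple as a background sampler, sketched here with psutil (GPU usage would again need extra tooling):

```python
import threading
import time

import psutil


def sample_usage(stop_event: threading.Event, samples: list, interval: float = 1.0):
    """Append (cpu_percent, rss_gb) samples until stop_event is set."""
    proc = psutil.Process()
    while not stop_event.is_set():
        samples.append((psutil.cpu_percent(), proc.memory_info().rss / 1024 ** 3))
        time.sleep(interval)


# Usage: start the sampler, run the stage, then stop and inspect `samples`.
stop, samples = threading.Event(), []
threading.Thread(target=sample_usage, args=(stop, samples), daemon=True).start()
# ... run the training stage here ...
stop.set()
```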

Mapping these usage metrics to requirements for inference might be tricky, as the amount of resources needed for training and for inference will differ.

We also need to consider that it is common to optimize the model somehow (e.g. quantization or pruning) when exporting it from training to inference.

madhur-tandon commented 2 years ago

I guess we can extend this, i.e. if GPU support is required, the Docker builder should know which GPU-specific libraries (all the CUDA-related stuff) to install.
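
For example, the builder could pick the base image (and extra packages) from the requirements block. The helper below is purely illustrative, not part of MLEM's docker builder, and the image tags are just examples:

```python
def pick_base_image(needs_gpu: bool) -> str:
    """Hypothetical helper: use a CUDA-enabled base image only when the model needs a GPU."""
    if needs_gpu:
        # CUDA runtime image already ships the CUDA libraries the model will need
        return "nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04"
    return "python:3.10-slim"


print(pick_base_image(True))
```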