Separate BSO as a server

ml-energy / zeus

Deep Learning Energy Measurement and Optimization

https://ml.energy/zeus

Apache License 2.0

180 stars, 24 forks

Separate BSO as a server #34

Closed. show981111 closed this 2 months ago

show981111 commented 4 months ago

Pull the batch size optimizer (BSO) out as a standalone server and implement the client on the training side.

Components

BSO server
BSO client

Currently working on simple unit tests.

TODO:

[x] Integrate examples/trace_driven/run_single.py
[x] Add DELETE /job endpoint to the server
[x] Add an ORM and connect the DB
[x] Dockerfile
[x] Kubeflow
[x] Clean up old code

Thoughts

Are there any additional endpoints the server needs to support?

Schema

jaywonchung commented 4 months ago

Just noting here for when we integrate the client-side BatchSizeOptimizer with HuggingFace's Trainer: I think we can fetch the latest validation metric from TrainerState.best_metric when on_evaluate is called.
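A minimal sketch of that idea, not the implementation in this PR: an object exposing the `on_evaluate` hook reads `state.best_metric` and hands it to a reporting function. `ValidationMetricForwarder` and the `report` callable are hypothetical names. In real use the class would subclass `transformers.TrainerCallback`, and `best_metric` may be `None` unless best-model tracking (`metric_for_best_model`) is configured. A stand-in state object is used so the snippet runs without `transformers` installed.

```python
from types import SimpleNamespace
from typing import Callable, Optional


class ValidationMetricForwarder:
    """Hypothetical callback; in practice, subclass transformers.TrainerCallback."""

    def __init__(self, report: Callable[[float], None]) -> None:
        # `report` is an assumed hook, e.g. a BSO client method that
        # forwards the metric to the server.
        self.report = report

    def on_evaluate(self, args, state, control, **kwargs) -> None:
        # Trainer invokes this hook after each evaluation pass.
        metric: Optional[float] = getattr(state, "best_metric", None)
        if metric is not None:
            self.report(metric)


# Stand-in for TrainerState, for illustration only.
received = []
forwarder = ValidationMetricForwarder(report=received.append)
forwarder.on_evaluate(args=None, state=SimpleNamespace(best_metric=0.87), control=None)
forwarder.on_evaluate(args=None, state=SimpleNamespace(best_metric=None), control=None)
```

The second call is skipped because no metric is available, so only the first value reaches `report`.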

jaywonchung commented 4 months ago

@show981111 I had to unpin Pydantic to get Zeus working with something else that I'm working on. Please take a look at #37 and rebase.

jaywonchung commented 3 months ago

@show981111 Can the server be used without K8s or KubeFlow as well?

show981111 commented 3 months ago

> @show981111 Can the server be used without K8s or KubeFlow as well?

Yes. You can run it with uvicorn, just like any FastAPI app.
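For illustration, a minimal sketch of serving a FastAPI app with uvicorn and no Kubernetes involved. The app below is a placeholder, not the Zeus BSO server: the thread does not name the server's module or routes, so `build_app`, `serve`, and the `/health` route are hypothetical. Imports are deferred into the functions so the snippet can be loaded even where `fastapi` and `uvicorn` are not installed.

```python
def build_app():
    """Build a placeholder FastAPI app (stand-in for the real server)."""
    from fastapi import FastAPI  # third-party: pip install fastapi

    app = FastAPI()

    @app.get("/health")
    def health() -> dict:
        # Hypothetical route, used only to show that the app responds.
        return {"status": "ok"}

    return app


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the app in-process; blocks until interrupted."""
    import uvicorn  # third-party: pip install uvicorn

    uvicorn.run(build_app(), host=host, port=port)
```

Calling `serve()` starts the server on the given host and port. The command-line equivalent is `uvicorn <module>:<app>`, where the module path depends on where the app object lives.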

vercel[bot] commented 3 months ago

@show981111 is attempting to deploy a commit to the jaywonchung's projects Team on Vercel.

A member of the Team first needs to authorize it.

vercel[bot] commented 3 months ago

The latest updates on your projects.

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| zeus | ❌ Failed | | | Apr 4, 2024 3:42pm |

show981111 commented 3 months ago

I finished the implementation. I haven't changed the /examples section yet since our implementation is still in review.

jaywonchung commented 3 months ago

Happy to see that you're making progress on documentation.

I pushed to master to (finally) enable local documentation building. Please rebase onto or merge master; after that, you can build and preview the documentation locally:
```
$ pip install -r docs/requirements.txt
$ mkdocs serve -a localhost:7777
```
Please create a Mermaid sequence diagram to describe how the server and client talk to each other.