microsoft / pai

Resource scheduling and cluster management for AI
https://openpai.readthedocs.io
MIT License
2.64k stars, 548 forks

New RestServer Architecture: RestServer -> DB -> ApiServer #4651

Open yqwang-ms opened 4 years ago

yqwang-ms commented 4 years ago

By leveraging a DB, RestServer can provide:

  1. Read-after-write (RAW) consistency
  2. High-performance, powerful queries: listing, paging, sorting, summarizing, etc.
  3. Larger storage quota and longer retention
  4. Active and history jobs that are naturally unified and merged together (no need to introduce a UID) (https://github.com/microsoft/pai/issues/3935)
  5. Job names that can be submitted idempotently, carry arbitrary metadata, and be queried uniquely (https://github.com/microsoft/pai/issues/3935)
  6. etc.
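As a rough illustration of items 2 and 5, here is a minimal sketch using Python's built-in `sqlite3`. The `jobs` table, its columns, and the helper names (`open_db`, `submit_job`, `list_jobs`, `count_by_state`) are hypothetical and are not taken from the OpenPAI code base; the issue does not specify a database engine or schema.

```python
import sqlite3


def open_db():
    # In-memory database with a hypothetical schema: the job name is the
    # primary key, so each name can exist at most once.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs ("
        " name TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " submitted_at INTEGER NOT NULL,"
        " metadata TEXT NOT NULL DEFAULT '{}')"
    )
    return conn


def submit_job(conn, name, submitted_at, metadata="{}"):
    # Idempotent submit: a second submit with the same name is ignored.
    # Returns True only if a new row was actually created.
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs (name, state, submitted_at, metadata)"
        " VALUES (?, 'WAITING', ?, ?)",
        (name, submitted_at, metadata),
    )
    conn.commit()
    return cur.rowcount == 1


def list_jobs(conn, limit, offset):
    # Sorting and paging are done by the database (newest first).
    rows = conn.execute(
        "SELECT name FROM jobs ORDER BY submitted_at DESC, name"
        " LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return [row[0] for row in rows]


def count_by_state(conn):
    # Summarizing: number of jobs per state.
    return dict(conn.execute("SELECT state, COUNT(*) FROM jobs GROUP BY state"))
```

Making the job name the primary key is one way to get both unique lookup and idempotent submission without introducing a separate UID.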

Features that depend on it

  1. List History Jobs: https://github.com/microsoft/pai/issues/3845, https://github.com/microsoft/pai/issues/4610, https://github.com/microsoft/pai/issues/3935
  2. Expose K8s events (part of enriching job debugging info): https://github.com/microsoft/pai/issues/4649

New RestServer Architecture

In short, compared with the current architecture, we insert a DB between RestServer and ApiServer.

[image]
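The data flow RestServer -> DB -> ApiServer can be sketched with plain in-memory objects. This is only an illustration under assumptions: the class names, the `synced` flag, and the `sync_once` step that forwards DB records to the ApiServer are hypothetical, and the issue does not describe how the DB contents actually reach the ApiServer.

```python
class FakeApiServer:
    """Stand-in for the K8s ApiServer; stores applied objects by name."""

    def __init__(self):
        self.objects = {}

    def apply(self, name, spec):
        self.objects[name] = spec


class Database:
    """Stand-in for the DB placed between RestServer and ApiServer."""

    def __init__(self):
        # name -> {"spec": ..., "synced": bool}
        self.jobs = {}


class RestServer:
    """Talks only to the DB, never to the ApiServer directly."""

    def __init__(self, db):
        self.db = db

    def put_job(self, name, spec):
        # The write lands in the DB first; an existing name is left as-is.
        self.db.jobs.setdefault(name, {"spec": spec, "synced": False})

    def get_job(self, name):
        # Reads are served from the DB, so a write is visible immediately.
        return self.db.jobs.get(name)


def sync_once(db, api_server):
    # Hypothetical synchronizer: push every not-yet-synced record from the
    # DB to the ApiServer and return the names that were pushed.
    pushed = []
    for name, record in db.jobs.items():
        if not record["synced"]:
            api_server.apply(name, record["spec"])
            record["synced"] = True
            pushed.append(name)
    return pushed
```

In this sketch a client that calls `put_job` and then `get_job` sees its job right away, even if the ApiServer has not yet received it.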

Sub Tasks

fanyangCS commented 4 years ago

Related to https://github.com/microsoft/pai/issues/4600