pentium3 / sys_reading

system paper reading notes

Serverless Computing: One Step Forward, Two Steps Back #53

Closed pentium3 closed 3 years ago

pentium3 commented 3 years ago

http://cidrdb.org/cidr2019/papers/p119-hellerstein-cidr19.pdf

pentium3 commented 3 years ago

Function-as-a-service

FaaS offerings allow programmers to register functions with the cloud provider, and enable users to declare events that trigger each function. The FaaS infrastructure monitors the triggering events, allocates a runtime for the function, executes it, and persists the results.

We also need another component to store intermediate state (e.g., AWS S3).
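
A minimal sketch of the register / trigger / execute / persist loop described above. `ToyFaaS`, `thumbnail`, the `"upload"` event, and the in-memory `storage` dict (standing in for something like S3) are all hypothetical names for illustration, not any provider's API.

```python
from typing import Any, Callable, Dict, List


class ToyFaaS:
    """In-memory stand-in for a FaaS platform (hypothetical, not a real API)."""

    def __init__(self) -> None:
        self.functions: Dict[str, Callable[[Any], Any]] = {}
        self.triggers: Dict[str, List[str]] = {}  # event type -> function names
        self.storage: Dict[str, Any] = {}         # stand-in for an external store

    def register(self, name: str, fn: Callable[[Any], Any]) -> None:
        """Programmer registers a function with the platform."""
        self.functions[name] = fn

    def on_event(self, event_type: str, function_name: str) -> None:
        """User declares that an event type triggers a registered function."""
        self.triggers.setdefault(event_type, []).append(function_name)

    def emit(self, event_type: str, payload: Any) -> List[str]:
        """Platform sees an event, runs each triggered function, persists results."""
        keys = []
        for name in self.triggers.get(event_type, []):
            result = self.functions[name](payload)  # "allocate a runtime" and execute
            key = f"{name}/{len(self.storage)}"
            self.storage[key] = result              # persist the result
            keys.append(key)
        return keys


def thumbnail(payload: dict) -> dict:
    # Stateless function: everything it needs arrives in the payload.
    return {"image": payload["image"], "size": "128x128"}


platform = ToyFaaS()
platform.register("thumbnail", thumbnail)
platform.on_event("upload", "thumbnail")
result_keys = platform.emit("upload", {"image": "cat.png"})
```

The function keeps no state between invocations; anything that must outlive a call goes into `storage`.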


One step forward of FaaS

FaaS provides autoscaling: the workload automatically drives the allocation and deallocation of resources.
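
A rough sketch of workload-driven allocation. The policy (one instance per `per_instance_capacity` pending requests, capped at `max_instances`, scale to zero when idle) and all numbers are assumptions for illustration, not how any real platform decides.

```python
import math


def desired_instances(pending_requests: int,
                      per_instance_capacity: int,
                      max_instances: int) -> int:
    """Return how many function instances a toy autoscaler would run."""
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    if pending_requests <= 0:
        return 0  # no load: deallocate everything
    needed = math.ceil(pending_requests / per_instance_capacity)
    return min(max_instances, needed)


# Hypothetical load over five time steps.
workload = [0, 5, 120, 40, 0]
allocations = [desired_instances(w, per_instance_capacity=10, max_instances=8)
               for w in workload]
```

The allocation follows the load up and back down to zero without the user provisioning anything.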

In Sec. 2, the authors describe several example use cases:


Two steps back of FaaS

First, they painfully ignore the importance of efficient data processing: data processing is slower because functions are stateless. Second, they stymie (hinder) the development of distributed systems: distributed-system applications are harder to write.

In Sec. 3, the authors describe some limitations:

  1. Limited Lifetimes. After 15 minutes, function invocations are shut down by the Lambda infrastructure (to allow better scheduling, e.g., allocating resources in advance).
  2. I/O Bottlenecks. Network I/O is slow.
  3. Communication through slow storage. While Lambda functions can initiate outbound network connections, they themselves are not directly network-addressable in any way (there is no IP address for each running function). -> Two Lambda functions can only communicate through an autoscaling intermediary service (e.g., a storage system like S3), which is slow.
  4. No Specialized Hardware.
  5. Cold start. A function itself is small (it runs for a few seconds), yet every launch takes some time, so launching functions frequently adds overhead.
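
A minimal sketch of limitation 3, assuming a plain dict as a stand-in for an S3-like store. `producer`, `consumer`, and the key layout are hypothetical. The two functions never address each other directly: one writes to the intermediary, the other polls it.

```python
from typing import Dict, List, Optional

# Stand-in for an autoscaling storage service; real access would be a network round trip.
store: Dict[str, bytes] = {}


def producer(job_id: str, values: List[int]) -> None:
    """First function: computes a partial result and writes it to storage."""
    total = sum(values)
    store[f"{job_id}/partial"] = str(total).encode()


def consumer(job_id: str) -> Optional[int]:
    """Second function: cannot be called by the producer, so it polls storage."""
    blob = store.get(f"{job_id}/partial")
    if blob is None:
        return None  # nothing written yet; the caller has to retry later
    return int(blob.decode()) * 2


first_poll = consumer("job-1")   # producer has not run yet
producer("job-1", [1, 2, 3])
second_poll = consumer("job-1")  # now the partial result is visible
```

Every hop pays the latency of the storage service, and the consumer has no way to be notified directly, which is why this pattern is slow.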

These limitations could lead to several problems:

  1. FaaS is a Data-Shipping Architecture: serverless functions run on isolated VMs, separate from the data, and they are short-lived and non-addressable. -> We have to copy data to the compute side, which is slower than shipping code to the data side.
  2. FaaS Stymies Distributed Computing: serverless functions are short-lived and non-addressable. -> It is hard to implement traditional distributed computing schemes (e.g., consensus, transactions). With all communication transiting through storage, there is no real way for thousands (much less millions) of cores in the cloud to work together efficiently using current FaaS platforms other than via largely uncoordinated (embarrassing) parallelism.
  3. FaaS stymies hardware-accelerated software innovation: e.g., we cannot use GPUs on a FaaS platform.
  4. FaaS discourages Open Source service innovation: ...
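
A back-of-the-envelope sketch for problem 1. The sizes and bandwidth below are made-up numbers, not measurements from the paper; the point is only that transfer time scales with the bytes moved, and the data is usually far larger than the code.

```python
def transfer_seconds(size_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Time to move size_bytes over a link, ignoring latency and protocol overhead."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_bytes / bandwidth_bytes_per_s


MB = 10**6
GB = 10**9

# Hypothetical values for illustration only.
data_size = 1 * GB      # input data sitting in storage
code_size = 1 * MB      # the function's code package
bandwidth = 100 * MB    # bytes per second between storage and compute

ship_data = transfer_seconds(data_size, bandwidth)  # data-shipping: pull data to the function
ship_code = transfer_seconds(code_size, bandwidth)  # code-shipping: push the function to the data
slowdown = ship_data / ship_code
```

Under these assumptions data shipping takes about 10 s against about 0.01 s for code shipping, a gap that equals the size ratio between data and code.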

Case study

  1. Model Training
  2. Low-Latency Prediction Serving via Batching
  3. Distributed Computing

Stepping forward to the future

  1. Fluid Code and Data Placement
  2. Heterogeneous Hardware Support
  3. Long-Running, Addressable Virtual Agents
  4. Disorderly programming
  5. Flexible Programming, Common IR
  6. Service-level objectives & guarantees

pentium3 commented 3 years ago

going forward:

  1. provide QoS guarantees: https://github.com/pentium3/sys_reading/issues/39
  2. more communication primitives: p2p/broadcast/all-reduce between functions
  3. stateful computation: recovery after a function fails (functions crash easily...). See some work on DB, or https://github.com/pentium3/sys_reading/issues/28, or [Pocket OSDI18], or [Beldi OSDI20]
  4. Heterogeneous Hardware Support
  5. observe function utilization and adapt resource allocation accordingly
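
A minimal sketch of item 3, assuming checkpoints to an external store are the recovery mechanism. This is one simple option, not the approach of Pocket or Beldi. `checkpoints`, `sum_with_checkpoints`, and `SimulatedCrash` are hypothetical names.

```python
from typing import Dict, List, Optional

# Stand-in for a durable store that outlives any single function invocation.
checkpoints: Dict[str, Dict[str, int]] = {}


class SimulatedCrash(Exception):
    """Raised to imitate a function instance dying mid-run."""


def sum_with_checkpoints(job_id: str, items: List[int],
                         crash_at: Optional[int] = None) -> int:
    """Sum items, saving progress after each one so a retry can resume."""
    state = checkpoints.get(job_id, {"next_index": 0, "total": 0})
    index, total = state["next_index"], state["total"]
    while index < len(items):
        if crash_at is not None and index == crash_at:
            raise SimulatedCrash(f"crashed before item {index}")
        total += items[index]
        index += 1
        checkpoints[job_id] = {"next_index": index, "total": total}
    return total


items = [1, 2, 3, 4, 5]
try:
    sum_with_checkpoints("job-42", items, crash_at=3)  # first invocation dies
except SimulatedCrash:
    pass
saved = dict(checkpoints["job-42"])                       # progress that survived
recovered_total = sum_with_checkpoints("job-42", items)  # retry resumes from item 3
```

The retry does not redo the first three items. The same idea also helps with the 15-minute lifetime limit, at the cost of a storage write per step.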

pentium3 commented 3 years ago

How to implement serverless infrastructure?

Firecracker NSDI'20 https://github.com/pentium3/sys_reading/issues/55