openfaas / faas

OpenFaaS - Serverless Functions Made Simple
https://www.openfaas.com

Proposal: stream data to functions to process data bigger than available memory #337

Closed trusch closed 6 years ago

trusch commented 7 years ago

Overview

When the watchdog runs in a memory-constrained environment, it can run out of memory even for operations that are not themselves memory-intensive.

Examples of such operations are:

Expected Behaviour

It should be possible to operate on input data that is larger than the available memory. For example, this should work:

> dd if=/dev/urandom of=random.dat bs=1M count=1024
> sudo systemd-run --unit watchdog-test -p MemoryMax=32M -p Environment="fprocess=sha512sum" ${GOPATH}/bin/watchdog
> curl -d "@random.dat" http://localhost:8080
6528a4138791e9728fbc24bddfb69a81b8dbb0835307c7bb05d463bfd9edd1e7497708ad514575f4b6c94c54f6d5fa941edb3e72f7ee3855bea89952ad25a321  -

Current Behaviour

The watchdog cancels the request after a while, and curl reports a connection reset:

> dd if=/dev/urandom of=random.dat bs=1M count=1024
> sudo systemd-run --unit watchdog-test -p MemoryMax=32M -p Environment="fprocess=sha512sum" ${GOPATH}/bin/watchdog
> curl -d "@random.dat" http://localhost:8080
curl: (56) Recv failure: Connection reset by peer

Possible Solution

Data could be streamed in both directions (reading from and writing to the client) using the io.Reader/io.Writer interfaces.

Steps to Reproduce (for bugs)

  1. Run the watchdog in a memory-constrained environment:
    • either use a device with little memory or simulate one
    • use a function that doesn't need all of its input in memory (e.g. sha512sum)
  2. Create some random data that is larger than the available memory.
  3. POST the data to the watchdog.
  4. See the error.

Context

I am evaluating OpenFaaS for some big-data-related tasks and found this bug while reading the source code.

Your Environment

alexellis commented 7 years ago

Thanks for the details and follow-up. The predominant issue with connecting an IO pipe straight to the underlying process is that it would prevent us from writing custom headers if, for example, an error occurs while processing a request. I think this may be a good option for a third mode, alongside the original buffering mode and afterburn.

trusch commented 7 years ago

This is true. We cannot start writing to the http.ResponseWriter before the execution has finished, so the output of the process needs to be buffered. The input, on the other hand, could be piped into the process without affecting our ability to set custom headers in error cases. Do you see other problems if only the input were piped?

All of the examples I provided would be possible with this approach.

Of course, functions that take small inputs and produce large outputs would remain problematic.

johnmccabe commented 7 years ago

Derek add label: proposal

alexellis commented 7 years ago

I have started some work on this, but it needs to cover every part of the pipeline, from the gateway to the back-end provider all the way to the function itself. (It's not just a case of hacking on the watchdog.)

trusch commented 7 years ago

I just created two pull requests to offer a bit of help from my side on this topic. They are very small changes that won't break anything but should lower memory consumption in the gateway and watchdog.

imikushin commented 7 years ago

It would be awesome if all of the I/O could be streamed: stdin into the function, stdout/stderr out of it.

alexellis commented 7 years ago

@trusch can you verify if this is now working in https://github.com/openfaas-incubator/of-watchdog with the streaming mode?

trusch commented 6 years ago

@alexellis yes, this works. I also really like the code; it's much cleaner now ;) Only one thing: in streaming mode, stdin and stdout are both streamed. What happens if a function fails after the first byte has been sent?

trusch commented 6 years ago

I think a very simple solution would be to send an HTTP trailer, X-Error or similar, in this case before returning from the handler, and to make the CLI (not the gateway) aware of it so that it can mimic the exit code of the invoked function.

alexellis commented 6 years ago

Appreciate that, thanks. How would that look for a hello world sample? Can you write out a fake request and response here?

alexellis commented 6 years ago

Derek close: version of this available in openfaas-incubator/of-watchdog