rexyai / RestRserve

R web API framework for building high-performance microservices and app backends
https://restrserve.org

Empty reply from server when size of multipart body above certain threshold #150

Closed: rplati closed this issue 4 years ago

rplati commented 4 years ago

I'm running RestRserve in a Linux container built from the rocker/r-ver:3.6.3 image. I've noticed that when the size of the multipart body grows above a certain threshold, RestRserve returns an empty response. This does not happen when I run RestRserve directly on my Windows machine.

This is a minimal example.

Dockerfile:

FROM rocker/r-ver:3.6.3

RUN install2.r --error \
  RestRserve \
  readr

COPY / /
EXPOSE 80
ENTRYPOINT ["Rscript", "restrserve.R"]

restrserve.R:

library(RestRserve)
library(readr)

app = Application$new()
app$logger$set_log_level("trace")

# Echo endpoint: read the uploaded CSV from the multipart body and send it back
app$add_post(
  path = "/echo",
  FUN = function(request, response) {
    app$logger$info(msg="Reading dta", fileName = c("dta"))
    dt <- read_csv(request$get_file("dta"))
    app$logger$info(msg="Data read successful", fileName = c("dta"))

    response$set_body(format_csv(dt))
  }
)

backend = BackendRserve$new()
backend$start(app, http_port = 80)

I build the image and start the container, mapping the container's port 80 to the host's port 5017 with -p 5017:80 (port 80 is not allowed on Windows). Then I submit a POST request to the endpoint.

library(httr)
library(readr)

# Generate data
n <- 20000000      # observations
dta <- data.frame(
  numVar  = round(rnorm(n), 3),
  charVar = c("foo", "bar")[(runif(n) < 0.5) + 1]
)

# Write to file
tmp <- tempfile()
write_csv(dta, tmp)
utils:::format.object_size(file.info(tmp)$size, "auto")
#> [1] "198.1 Mb"

# POST request with file
time<-system.time({
  rs <- POST(
    url = "http://127.0.0.1:5017/echo",
    body = list(dta = upload_file(tmp)),
    encode = "multipart"
  )
})

On my machine, when n=10000000 (99.1 Mb), everything works fine and the container log looks like this:

{"timestamp":"2020-06-08 13:24:13.828893","level":"DEBUG","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","request":{"method":"POST","path":"/echo","parameters_query":{},"parameters_path":[],"headers":{"content-type":"multipart/form-data; boundary=------------------------7909616e2142a7b4","host":"127.0.0.1:5017","user-agent":"libcurl/7.64.1 r-curl/4.3 httr/1.4.1","content-length":"103879300","accept-encoding":["deflate","gzip"],"accept":["application/json","text/xml","application/xml","*/*"]}}}}
{"timestamp":"2020-06-08 13:24:13.829847","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","middleware":"EncodeDecodeMiddleware","message":"call process_request middleware"}}
{"timestamp":"2020-06-08 13:24:13.849487","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","message":"try to match requested path '/echo'"}}
{"timestamp":"2020-06-08 13:24:13.850235","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","message":"requested path matched"}}
{"timestamp":"2020-06-08 13:24:13.851160","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","message":"call handler '1'"}}
{"timestamp":"2020-06-08 13:24:13.852220","level":"INFO","name":"Application","pid":19,"msg":"Reading dta","fileName":"dta"}
Parsed with column specification:
cols(
  numVar = col_double(),
  charVar = col_character()
)
{"timestamp":"2020-06-08 13:24:18.141534","level":"INFO","name":"Application","pid":19,"msg":"Data read successful","fileName":"dta"}
{"timestamp":"2020-06-08 13:24:21.769933","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","middleware":"EncodeDecodeMiddleware","message":"call process_response middleware"}}
{"timestamp":"2020-06-08 13:24:22.279853","level":"DEBUG","name":"Application","pid":19,"msg":"","context":{"request_id":"587be48a-a98b-11ea-8a0a-0242ac110002","response":{"status_code":200,"headers":{"Server":"RestRserve/0.2.2"}}}}

But when the data size is above a certain threshold (on my machine n=20000000, 198.1 Mb), the endpoint returns an empty response and the log looks like this:

{"timestamp":"2020-06-08 13:28:03.809148","level":"DEBUG","name":"Application","pid":19,"msg":"","context":{"request_id":"e18f5a04-a98b-11ea-8a0a-0242ac110002","request":{"method":"POST","path":"/echo","parameters_query":{},"parameters_path":[],"headers":{"content-type":"multipart/form-data; boundary=------------------------355f8f515b2f0e05","host":"127.0.0.1:5017","user-agent":"libcurl/7.64.1 r-curl/4.3 httr/1.4.1","content-length":"207762542","accept-encoding":["deflate","gzip"],"accept":["application/json","text/xml","application/xml","*/*"]}}}}
{"timestamp":"2020-06-08 13:28:03.820926","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"e18f5a04-a98b-11ea-8a0a-0242ac110002","middleware":"EncodeDecodeMiddleware","message":"call process_request middleware"}}
{"timestamp":"2020-06-08 13:28:03.877614","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"e18f5a04-a98b-11ea-8a0a-0242ac110002","message":"try to match requested path '/echo'"}}
{"timestamp":"2020-06-08 13:28:03.912894","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"e18f5a04-a98b-11ea-8a0a-0242ac110002","message":"requested path matched"}}
{"timestamp":"2020-06-08 13:28:03.913440","level":"TRACE","name":"Application","pid":19,"msg":"","context":{"request_id":"e18f5a04-a98b-11ea-8a0a-0242ac110002","message":"call handler '1'"}}
{"timestamp":"2020-06-08 13:28:03.913912","level":"INFO","name":"Application","pid":19,"msg":"Reading dta","fileName":"dta"}

In other words, the call seems to get stuck while parsing the multipart body. The container should have enough memory available. Any idea how to debug this? Is there some inherent limit I should be aware of? I would like to post files of up to about 2 GB to the endpoint.
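One thing I could try is instrumenting the handler to see where the memory goes. A rough sketch (the extra endpoint name and logger fields are ad hoc, and the gc()-based figures are only approximate):

# Instrumented variant of the /echo handler: log memory use before and after
# extracting the uploaded file from the multipart body.
app$add_post(
  path = "/echo-debug",
  FUN = function(request, response) {
    app$logger$info(
      msg = "before get_file",
      content_length = request$get_header("content-length"),
      r_mem_mb = round(sum(gc(FALSE)[, 2]), 1)  # approximate R heap usage in Mb
    )
    raw_file <- request$get_file("dta")
    app$logger$info(
      msg = "after get_file",
      file_bytes = length(raw_file),
      r_mem_mb = round(sum(gc(FALSE)[, 2]), 1)
    )
    response$set_body(paste("received", length(raw_file), "bytes"))
  }
)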

dselivanov commented 4 years ago

I can't reproduce. Maybe you have some container memory restrictions? Apart from that, I recommend trying the official rexyai/restrserve:0.2.2 image.

rplati commented 4 years ago

Thanks for the reply, @dselivanov. I tried the rexyai/restrserve:0.2.2 image, but it didn't change the container's behavior.

The issue does seem to be related to the memory available to the container. I was able to reproduce the behavior on a Linux server, but at a much higher threshold (n=1600000000; 1.5 Gb). Starting the container with the option --memory=80m reproduces the behavior at a smaller file size.

If it is possible to implement, an error message in this kind of situation would be appreciated.
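For example, a request-size guard along these lines might serve as a stop-gap. This is only a sketch: the 500 MB limit and the middleware id are placeholders, it assumes the documented Middleware class and an HTTPError helper for status 413 are available in the installed RestRserve version, and since the whole body has most likely already been read into memory by the time middleware runs, it gives a clean error rather than preventing the memory spike itself.

library(RestRserve)

# Reject requests whose declared Content-Length exceeds a configured limit.
max_body_bytes = 500 * 1024^2  # placeholder: roughly 500 MB

limit_body_size = Middleware$new(
  process_request = function(request, response) {
    len = suppressWarnings(as.numeric(request$get_header("content-length")))
    if (isTRUE(len > max_body_bytes)) {
      # assumes HTTPError$payload_too_large() (HTTP 413) exists;
      # any other HTTPError helper would work the same way
      raise(HTTPError$payload_too_large(body = "request body too large"))
    }
  },
  process_response = function(request, response) {
    invisible(TRUE)
  },
  id = "limit_body_size"
)

app$append_middleware(limit_body_size)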

dselivanov commented 4 years ago

Unfortunately we can't do anything about that: this is controlled by the OS, which means that if the process consumes all the memory, the OOM killer will kill it.
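If in doubt, the effective limit can be checked from inside the container, for example (a sketch that assumes a cgroup v1 layout; setups using cgroup v2 expose /sys/fs/cgroup/memory.max instead):

# Read the memory limit the container runs under (cgroup v1 path).
# An extremely large value effectively means no limit was set.
limit_bytes <- suppressWarnings(
  as.numeric(readLines("/sys/fs/cgroup/memory/memory.limit_in_bytes"))
)
sprintf("container memory limit: %.0f MiB", limit_bytes / 1024^2)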

Also keep in mind that if you start containers on Windows or OS X, the virtual machine has its own restriction on the maximum memory for containers (on OS X it is 2 GB by default). So even if you specify --memory=4g, it will be capped to 2 GB. You need to control this separately in the Docker app settings.

[Screenshot: Screen Shot 2020-06-09 at 12 14 11]

rplati commented 4 years ago

I've investigated the issue further. As far as I can tell, RestRserve quickly drains the container's memory when the data passed to it is in multipart form.

I've monitored the container's resources with docker stats and varied the container's available memory with the --memory option when starting the container. When the container has X amount of RAM available, a file of size Y will cause the OOM killer to kill the process, resulting in an empty reply from the server. When the body is passed as text/plain, the ratio of X to Y is about 2, i.e. when the container has 1 GB of memory available, up to 500 MB can be passed to the endpoint successfully. With a multipart body, the ratio is about 7, i.e. we need to allocate about 7 times as much memory to the container as the largest file we wish to pass (not considering concurrent requests). I got roughly the same results running the container on Windows and on Linux.
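For reference, the text/plain case can be exercised with a request roughly like the one below (a sketch rather than the exact call I used; in that case the handler reads the raw body via request$body instead of request$get_file("dta")):

library(httr)

# Send the same file as a plain text body instead of multipart/form-data.
rs_plain <- POST(
  url = "http://127.0.0.1:5017/echo",
  body = upload_file(tmp, type = "text/plain")
)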

The multiplier of 7 is prohibitive with large file sizes and possible concurrent requests. Would you consider investigating and possibly optimizing the code?

dselivanov commented 4 years ago

@rplati thanks for the investigation, I can reproduce. I've opened a separate ticket: #151