cannot use source or eval in Rserve.conf or in backend$start as additional args

kevinnowland commented 2 years ago

Hello! I have been trying to pre-load data via the source and/or eval arguments in Rserve for an app that uses RestRserve. The idea was to prevent memory spikes when receiving many concurrent requests by pre-loading objects before the forking happens.

The following files give an example setup that does not work the way I expected. This is using RestRserve_1.1.1.

Dockerfile

FROM r-base:4.1.2

RUN R -e "install.packages('RestRserve')"
COPY server.R .

CMD ["Rscript", "server.R"]

which is built with the command

docker build -t rserve:test .

server.R

library("RestRserve")

ping <- function(.req, .res)
{
    .res$set_status_code(200L)
    .res$set_body(jsonlite::toJSON("OK"))
    .res$set_content_type("application/json")

    # Line below does not work if uncommented
    # print(x)
}

app <- Application$new()

## register endpoints and corresponding handlers
app$add_get(path = "/ping", FUN=ping)

backend = BackendRserve$new()

backend$start(app, http_port = 8080)

# I have also tried the below commented line to try to avoid the 
# config file entirely:
# backend$start(app, http_port = 8080, source="/startup.R")

# this also does not print anything if uncommented
# backend$start(app, http_port = 8080, eval="print('hello')")

docker-compose.yaml

I have been starting the image using docker-compose up with the following docker-compose.yaml file which loads Rserve.conf to /etc/Rserve.conf (tried also as Rserv.conf).

version: '3.7'
services:
    test:
      image: rserve:test
      logging:
          options:
              max-size: 10m
              max-file: "3"
      ports:
        - "8181:8080"
      volumes:
        - ./server.R:/server.R
        - ./startup.R:/startup.R
        - ./Rserve.conf:/etc/Rserve.conf

The server starts up fine.

`Rserve.conf` or `Rserv.conf`

I am then trying to configure Rserve with the following file, which is mounted into the container at runtime as /etc/Rserve.conf (I have also tried /etc/Rserv.conf, without the e):

source /startup.R

Which should be referencing the following file

`startup.R`

print("\n\n\nhello\n\n\n")

x <- 1

However, the statement is never printed in the docker-compose logs, and the variable x is not available in the ping route, for example.

Any help would be appreciated. Thank you!

s-u commented 2 years ago

The source configuration parameter is only used by the Rserve daemon before forking the server. However, RestRserve uses the current session and run.Rserve(), so the source directive would be pointless: you can run anything you want in the session directly. Simply add

source("startup.R")

in server.R before you call backend$start
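
A minimal sketch of that ordering, in base R only so that it runs without a server. The contents of startup.R are written to a temporary file here purely to keep the example self-contained, and ping_payload is a hypothetical stand-in for the body of the ping handler, not part of RestRserve.

```r
# Stand-in for startup.R (assumption: written to a temp file only so that
# this sketch is self-contained; in the real setup the file already exists).
startup_file <- file.path(tempdir(), "startup.R")
writeLines(c('print("hello")', "x <- 1"), startup_file)

# Run the startup code in the current session, as server.R would do
# before calling backend$start().
source(startup_file)

# Hypothetical stand-in for the handler body: it reads x, which source()
# created in the global environment.
ping_payload <- function() {
  paste0("OK, x = ", x)
}

print(ping_payload())

# In server.R, backend$start(app, http_port = 8080) would follow here.
```

Because source() evaluates the file in the session that later runs the server, anything it defines should be visible to handlers registered in that same session.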

kevinnowland commented 2 years ago

So that's what we basically had been doing. We loaded in a data file that was about 250 MB before starting the application. This data file was used in a read-only manner by one of the route functions. However, our memory was spiking at the rate of ~ 1 GB per every 4 concurrent requests. We were wondering if we needed to preload it in a different way and tried to go down this path of using source / Rserve.conf. Is it expected that the 250 MB would be loaded into memory separately per request or is that a likely artifact of something we did inadvertently in our code?

Regardless of the particular shenanigans we were up to, should it be possible to pass in source or use the Rserve.conf file? It does not seem to be happening for us. I am probably not clear on how run.Rserve differs from using the Rserve function directly or something like that, as I'm not super familiar with the R ecosystem.

Thanks for taking the time to respond!

dselivanov commented 2 years ago

> So that's what we basically had been doing. We loaded in a data file that was about 250 MB before starting the application. This data file was used in a read-only manner by one of the route functions. However, our memory was spiking at the rate of ~ 1 GB per every 4 concurrent requests. We were wondering if we needed to preload it in a different way and tried to go down this path of using source / Rserve.conf. Is it expected that the 250 MB would be loaded into memory separately per request or is that a likely artifact of something we did inadvertently in our code?

Most likely something in the request handler is causing high memory usage or copying of large objects. As Simon suggested, you can load the data with readRDS() anywhere before backend$start().

> Regardless of the particular shenanigans we were up to, should it be possible to pass in source or use the Rserve.conf file? It does not seem to be happening for us. I am probably not clear on how run.Rserve differs from using the Rserve function directly or something like that, as I'm not super familiar with the R ecosystem.

What is the use case? You can provide most of the Rserve config parameters when calling backend$start(...).

s-u commented 2 years ago

@kevinnowland Please note that standard Linux tools (like ps) count shared memory once per process in their reports, i.e., they report all memory a process uses whether or not it is shared with another process. The memory actually used is much less than the total you see. You can use smem to see the difference.

And, no, you cannot use source or eval in the configuration, because you are not starting an Rserve process at all. Instead you use run.Rserve() to run a server in the R process in which you defined your application.
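
The sharing described above can be illustrated with a small sketch. It assumes a Unix-alike system, where the parallel package that ships with R can fork, and it uses a hypothetical object named big as a stand-in for the 250 MB data set. It does not measure memory. It only shows that forked children can read an object the parent created before the fork without loading it again, which is roughly the situation of per-request worker processes.

```r
library(parallel)  # ships with R; mcparallel() forks, so Unix-alikes only

# Created once in the parent, before any fork (stand-in for the real data).
big <- numeric(1e6)  # about 8 MB of doubles
big[1] <- 42

# Each job runs in a forked child that inherits `big` from the parent.
# The children only read it, so nothing here asks for a private copy.
jobs <- lapply(1:4, function(i) mcparallel(sum(big) + i))

# Collect the results; they come back in the order of the jobs.
results <- unlist(mccollect(jobs), use.names = FALSE)
print(results)
```

If a handler modifies such an object instead of only reading it, R copies it in that process, which is one plausible way to end up with per-request memory growth. A tool such as smem, mentioned above, can help tell real growth from shared pages being counted repeatedly.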

kevinnowland commented 2 years ago

Thank you everyone for the responses! Very helpful.
