s-u / Rserve

Fast, flexible and powerful server providing access to R from many languages and systems
http://RForge.net/Rserve

Recommendations for RServe's virtualisation for a multiuser R Server? #85

Closed Henri-Lo closed 3 years ago

Henri-Lo commented 7 years ago

I found issue https://github.com/s-u/Rserve/issues/22 about the lack of native Windows support. So I think the best option for providing an R server with load balancing on Windows Server is to virtualise Rserve together with a load balancer such as HAproxy or Nginx. This could be done with Docker or a VM.

Recommendations for RServe's virtualisation?

  1. It looks like Debian has ready-made packages for Rserve, so is it a good candidate for running Rserve? On which OS is it recommended to virtualise Rserve?

  2. Do you have any recommendations for virtualising Rserve? Is there an LTS version of Rserve? Which versions of Rserve are considered stable?

  3. Is there any recommended system profile for RServe and HAproxy to deal with the load in a multiuser R server?

s-u commented 7 years ago

Re virtualization: it depends on which protocols you use with Rserve. Yes, you can use nginx for HTTP/WS support. We also provide a proxy for this with some extra features. However, if you want to use QAP, there is no native support for it in nginx.

As for your points:

  1. Our production machines use Ubuntu (preferred) and RHEL (if the customer insists on old software). In principle any Unix is fine (we develop on macOS, so Linux and macOS have the best support); on Linux you have the added benefit of being able to use containers.

  2. The master branch is stable and up to date with bug fixes, so currently 1.8-5.

  3. I have not used HAproxy, but for nginx this is a sample load-balancing setup:

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''  close;
}
upstream rcloud_servers {
    least_conn;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    [...]
}
server {
    listen 443;
[...]
    location / {
        try_files $uri $uri/ @proxy;
    }

    location @proxy {
        proxy_pass      http://rcloud_servers;
        proxy_set_header    X-Real-IP  $remote_addr;

        # allow connection upgrade to WebSockets
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        # RCloud may serve long-running queries ...
        proxy_read_timeout 30m;
    }
}
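
To make the two less obvious parts of the config above concrete, here is a minimal Python sketch of what the `map` block and the `least_conn` directive do. It is an illustration only, not how nginx is implemented: the `Backend` class and both function names are made up for this example, and real `least_conn` also takes server weights into account, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Backend:
    """Hypothetical stand-in for one `server` line in the upstream block."""
    address: str      # e.g. "10.0.0.11:8080"
    active: int = 0   # connections currently open to this backend


def connection_header(http_upgrade: str) -> str:
    """Mirror the `map` block: an empty Upgrade header maps to "close",
    any other value maps to "upgrade"."""
    return "upgrade" if http_upgrade else "close"


def pick_least_conn(backends: Sequence[Backend]) -> Backend:
    """Pick the backend with the fewest active connections.
    Simplification: no weights; on a tie the first backend wins."""
    return min(backends, key=lambda b: b.active)


if __name__ == "__main__":
    pool = [Backend("10.0.0.11:8080", active=3),
            Backend("10.0.0.12:8080", active=1)]
    chosen = pick_least_conn(pool)
    chosen.active += 1  # the new request now counts against this backend
    print(chosen.address, connection_header("websocket"))
```

With long-running R sessions, balancing on open connections rather than round-robin keeps new users away from servers that are already busy, which is presumably why `least_conn` is used here.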

Henri-Lo commented 7 years ago

Regarding "which protocols you use with Rserve":

This is the plugin I am considering using:

https://github.com/qlik-oss/sse-r-plugin

It uses gRPC, Google's binary RPC protocol, and a C# SSE plugin, which in turn accesses Rserve to run R scripts.

OS related

So Ubuntu LTS or Debian (because Ubuntu is based on it) are good to go? Ubuntu 16.04.2 LTS? Or Debian 9 (Stretch)?

Load balancing

Thank you for recommending Nginx. I had earlier heard recommendations for HAproxy, so I will have to compare the two alternatives.

s-u commented 7 years ago

Re SSE-R-plugin: I don't know much about it, so please check whether it uses QAP or HTTP/WS. nginx and HAproxy only support the latter, but most language interfaces use QAP, since it is the native Rserve protocol. WS wraps QAP messages in WS frames. I do have a proxy on the back-end that converts WS to QAP, but I don't have the inverse (QAP to WS). In principle you could use node for that (since https://github.com/att/rserve-js provides QAP support for JS) if you really want to.
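
One rough way to check which protocol a given Rserve port speaks is to connect and look at what the server sends first. A QAP server sends a 32-byte ID string as soon as the connection opens, starting with `Rsrv`, followed by a 4-character version and a 4-character protocol name such as `QAP1`. The sketch below assumes plain TCP without TLS; the function names are made up for this example, and 6311 is Rserve's default port, which your setup may not use.

```python
import socket


def parse_rserve_id(banner: bytes) -> dict:
    """Split the 32-byte ID string that a QAP server sends on connect."""
    if len(banner) != 32:
        raise ValueError("expected a 32-byte ID string, got %d bytes" % len(banner))
    if banner[:4] != b"Rsrv":
        raise ValueError("not an Rserve ID string")
    return {
        "version": banner[4:8].decode("ascii"),    # e.g. "0103"
        "protocol": banner[8:12].decode("ascii"),  # e.g. "QAP1"
    }


def probe(host: str, port: int = 6311, timeout: float = 5.0) -> dict:
    """Connect and read the ID string. An HTTP/WS port waits for the client
    to speak first, so this raises a timeout there instead of returning."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while len(data) < 32:
            chunk = sock.recv(32 - len(data))
            if not chunk:  # server closed the connection early
                break
            data += chunk
    return parse_rserve_id(data)
```

For example, `probe("10.0.0.11")` against a QAP port would be expected to report the protocol as `QAP1`, while a timeout suggests the port is serving HTTP/WS or something else.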

As for OS, yes, I'm using Ubuntu 16.04.2 LTS but Debian is fine as well.

As I said, I don't know HAproxy, so the choice of nginx is due more to the fact that we already use it for other things than to any in-depth comparison.