socketio / socket.io-sticky

A simple and performant way to use Socket.IO within a cluster.
https://socket.io
MIT License
43 stars 13 forks source link

Multiple connections being made/dropped using polling (express) #3

Open josefpohl opened 3 years ago

josefpohl commented 3 years ago

Hi there, For starters, I am using Socket.io 2.1.1 (old, I know but it worked with React-Native).
I have a server that I am not sure is effectively displaying sticky behavior when polling is turned on. Here is a log entry from a single client: process: 6680, 2021-06-01T15:48:02.941Z, socketManager, io.sockets.on(connect), Will broadcast HELLO JfLCgyTZmvMcpMgzAAFZ process: 6674, 2021-06-01T15:48:05.127Z, socketManager, io.sockets.on(connect), Will broadcast HELLO MCAoSsuLNuiGe2o3AAED process: 6680, 2021-06-01T15:48:06.989Z, socketManager, io.sockets.on(connect), Will broadcast HELLO LzxAW4_1Fhk3D4cdAAFa

The system seems to, on the initial connect seem to jump between different instances of the server (in this case the server running on process 6680 and 6674). My fear is that I do not have my server master/worker set up properly. So this is more of a question than an issue. I am using cluster and express to server dynamic and static routes. I do have a Redis client on the back end but that is receiving events properly. Here is the relevant parts of my server setup:

const express = require('express');
const app = express();
const cluster = require('cluster');
...
const { setupMaster, setupWorker } = require('@socket.io/sticky');
...
if (cluster.isMaster) {
  console.log(Master is started with process ID ${process.pid});

  const serverhttp =
    process.env.NODE_ENV === 'production'
      ? https.createServer(options, app)
      : http.Server(app);

  redisCache.setRedisUsers([]);
  setupMaster(serverhttp, {
    loadBalancingMethod: 'least-connection', // either "random", "round-robin" or "least-connection"
  });
 for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    console.log(`Worker died on process ID ${worker.process.pid}`);
    cluster.fork();
  });
} else {
console.log(`Worker started at process ID ${process.pid}`);
  app.use(bodyParser.urlencoded({ extended: false }));
  app.use(bodyParser.json());

  const serverhttp =
    process.env.NODE_ENV === 'production'
      ? https.createServer(options, app)
      : http.Server(app);

/* app.use routes, cors, static routes; db connection initialization */

const io = socketManager(serverhttp, redisCache); // initialize socket
  setupWorker(io);
  serverhttp.listen(port, () => {
    console.log(`Starting server on ${port}`);
  });
  serverhttp.keepAliveTimeout = 65 * 1000;
  serverhttp.headersTimeout = 66 * 1000;

}

socketManager here encapsulates the socket connection creation and all of the events tied to it. (redisCache is simply a user management cache.

The module simply exports the socket configuration code:

module.exports = function (serverhttp, redisClient) {
  const config = {
    pingTimeout: 60000,
    pingInterval: 25000,
    transports: ['polling','websocket'], 
  };

  const io = socketio.listen(serverhttp, config);
//Redis Adapter
  io.adapter(
    redisAdapter({
      host: REDIS_HOST,
      port: REDIS_PORT,
    })
  );

//Support functions....

 io.sockets.on('connection', function (client) {
    logIt('io.sockets.on(connect)', `Will broadcast HELLO ${client.id}`);
    io.emit('HELLO', `from io.emit -- ${client.id}`);
    client.broadcast.emit('HELLO', client.id);

    client.on(USER_CONNECTED, async (user) => {
      logIt('client.on(USER_CONNECTED)', ` ${user.name} ${client.id}`);
      user.socketId = client.id;

      const redisContent= await redisClient.addRedisUser({
        name: user.name,
        id: user.id,
        socketId: client.id,
      });
     io.emit(USER_CONNECTED, user); // broadcast to other clients
     });
     ... 
   //other events
  });
return io; //End of module exports function

The two things that I am wondering about in particular:

  1. In the example code the MASTER is the one listening, but I could not get the static routes under that scenario. I received a web error Cannot GET /. When I shift the app.use content to be configured with the master the page comes up but I never see the socket connections being made and the DB connection times out even though it is initialized. (The "Hello" log from above are never issued.) The one thing missing from the socket.io site example is express.
  2. The second item is that I initiate all the socket.on events prior to doing the setup worker.

Is this a case of a mis-configuration on sequencing? I s there some way to shift my master/worker configuration around such that it is using the sticky functionality and uses express but can also get the worker instances to respond on a client connection? Is my express/https server/socket io implementation misconfigured in some way? Is there a way to get express/https to listen at both the master and the worker level. It would seem to me that while I am connecting through the master it is not being passed to a worker. Thanks Joe

josefpohl commented 3 years ago

I wanted to note that I have a similar set up to #2 . And possibly similar errors. I am running a single EC2 instance behind an ALB. I run this behind some other managed services so I will have to see if NLB is an option.

The thing I noted with that though is that his socket and master was running on a different port than the workers. If I set it up to run master and client on the same port (I tried it a while ago) I get an address in use warning. Any guidance on how to get that configured if that is possibly the error (basically I am not listening on master and hence the "stickiness" is not even given the chance to work?) @sagarjoshi-sct (sorry to tag you.)

josefpohl commented 3 years ago

I did a bit more logging/digging around and tried to run the system (web side) with polling turned on and off on the client.

When polling was turned off the logs were mostly un-notable. After about 9 minutes of idle I got a ping time out which might be interesting, or not.

  socket.io:server creating engine.io instance with opts {"pingTimeout":60000,"pingInterval":25000,"transports":["websocket","polling"],"path":"/socket.io","initialPacket":["0"]} +1ms
...
  engine handshaking client "U0ecrfDnWG47PN4wAAAB" +9m
  socket.io:server incoming connection with id U0ecrfDnWG47PN4wAAAB +9m
  socket.io:client connecting to namespace / +8m
  socket.io:namespace adding socket to nsp / +9m
  socket.io:socket socket connected - writing packet +9m
  socket.io:socket joining room U0ecrfDnWG47PN4wAAAB +1ms
  socket.io:socket packet already sent in initial handshake +0ms
process: 27845, 2021-06-02T20:13:35.569Z, socketManager, io.sockets.on(connect), Will broadcast HELLO U0ecrfDnWG47PN4wAAAB
  socket.io-redis publishing message to channel socket.io#/# +9m
  socket.io-parser encoding packet {"type":2,"data":["HELLO","from io.emit -- U0ecrfDnWG47PN4wAAAB"],"nsp":"/"} +8m
//More io-parser
socket.io:client client close with reason ping timeout +40s
  socket.io:socket closing socket - reason ping timeout +40s
process: 27845, 2021-06-02T20:14:15.116Z, socketManager, client.on(disconnect), user: no user, client ID: gvbdkhLck1jjQjRQAAAA, reason: ping timeout
process: 27845, 2021-06-02T20:14:15.117Z, socketManager, client.on(disconnect), Other reason ping timeout gvbdkhLck1jjQjRQAAAA
// gvbdkhLck1jjQjRQAAAA was the previous sid

It appears though that the polling version is getting a number 400 errors. Here are a few pertinent bits from the log

CLIENT SIDE
SocketInit.js:23 Connected from App.js socket.io-client bSeAsCtGadq1dLqdAAAA at  Wed Jun 02 2021 13:50:30 GMT-0600 (Mountain Daylight Time) Socket
SocketInit.js:34 HELLO from io.emit -- bSeAsCtGadq1dLqdAAAA
websocket.js:112 WebSocket connection to 'wss://test.mycompany.com/socket.io/?EIO=3&transport=websocket&sid=bSeAsCtGadq1dLqdAAAA' failed: 
WS.doOpen @ websocket.js:112
polling-xhr.js:263 POST https://test.mycompany.com/socket.io/?EIO=3&transport=polling&t=NdEHsCV&sid=bSeAsCtGadq1dLqdAAAA 400
//Request, XHR, ... content redacted
polling-xhr.js:263 XHR failed loading: POST "https://test.mycompany.com/socket.io/?EIO=3&transport=polling&t=NdEHsCV&sid=bSeAsCtGadq1dLqdAAAA".
//Request, XHR, ... content redacted
polling-xhr.js:263 POST https://test.mycompany.com/socket.io/?EIO=3&transport=polling&t=NdEHsPJ&sid=bSeAsCtGadq1dLqdAAAA 400
//Request, XHR, ... content redacted
Polling.doClose @ polling.js:180
Transport.close @ transport.js:94
Socket.onClose @ socket.js:708
Socket.onError @ socket.js:686
(anonymous) @ socket.js:276
Emitter.emit @ index.js:133
Transport.onError @ transport.js:67
(anonymous) @ polling-xhr.js:109
Emitter.emit @ index.js:133
Request.onError @ polling-xhr.js:309
(anonymous) @ polling-xhr.js:256
setTimeout (async)
//Request, XHR, ... content redacted
polling-xhr.js:263 XHR failed loading: POST "https://test.mycompany.com/socket.io/?EIO=3&transport=polling&t=NdEHsPJ&sid=bSeAsCtGadq1dLqdAAAA".

and

SERVER SIDE
  socket.io:server initializing namespace / +0ms
  socket.io-parser encoding packet {"type":0,"nsp":"/"} +0ms
  socket.io:server initializing namespace / +0ms
  socket.io-parser encoded {"type":0,"nsp":"/"} as 0 +0ms
  socket.io:server creating engine.io instance with opts {"pingTimeout":60000,"pingInterval":25000,"transports":["websocket","polling"],"path":"/socket.io","initialPacket":["0"]} +2ms
  socket.io-parser encoding packet {"type":0,"nsp":"/"} +0ms
  socket.io-parser encoded {"type":0,"nsp":"/"} as 0 +0ms
  socket.io:server creating engine.io instance with opts {"pingTimeout":60000,"pingInterval":25000,"transports":["websocket","polling"],"path":"/socket.io","initialPacket":["0"]} +1ms
  socket.io:server attaching client serving req handler +8ms
  socket.io:server attaching client serving req handler +12ms
  engine intercepting request for path "/socket.io/" +0ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHlYO" +1ms
  engine handshaking client "9n3HHN46t_jkiK21AAAA" +1ms
  socket.io:server incoming connection with id 9n3HHN46t_jkiK21AAAA +43s
  socket.io:client connecting to namespace / +0ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +0ms
  socket.io:socket joining room 9n3HHN46t_jkiK21AAAA +0ms
  socket.io:socket packet already sent in initial handshake +0ms
process: 25619, 2021-06-02T19:50:26.873Z, socketManager, io.sockets.on(connect), Will broadcast HELLO 9n3HHN46t_jkiK21AAAA
  socket.io-redis publishing message to channel socket.io#/# +0ms
  socket.io-parser encoding packet {"type":2,"data":["HELLO","from io.emit -- 9n3HHN46t_jkiK21AAAA"],"nsp":"/"} +44s
  socket.io-parser encoded {"type":2,"data":["HELLO","from io.emit -- 9n3HHN46t_jkiK21AAAA"],"nsp":"/"} as 2["HELLO","from io.emit -- 9n3HHN46t_jkiK21AAAA"] +0ms
  socket.io:client writing packet ["2[\"HELLO\",\"from io.emit -- 9n3HHN46t_jkiK21AAAA\"]"] +5ms
  socket.io-redis publishing message to channel socket.io#/# +2ms
  socket.io-parser encoding packet {"type":2,"data":["HELLO","9n3HHN46t_jkiK21AAAA"],"nsp":"/"} +1ms
 //socket.io-parser content redacted
  socket.io:socket joined room [ '9n3HHN46t_jkiK21AAAA' ] +5ms
  socket.io-redis ignore same uid +4ms
  socket.io-redis ignore same uid +0ms
  engine intercepting request for path "/socket.io/" +239ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHlc4&sid=9n3HHN46t_jkiK21AAAA" +1ms
  engine setting new request for existing client +0ms
  engine upgrading existing transport +115ms
  engine intercepting request for path "/socket.io/" +0ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHlfM&sid=9n3HHN46t_jkiK21AAAA" +1ms
  engine intercepting request for path "/socket.io/" +355ms
  engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=NdEHlkV&sid=9n3HHN46t_jkiK21AAAA" +0ms
  engine intercepting request for path "/socket.io/" +1s
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHm4K" +0ms
  engine handshaking client "bSeAsCtGadq1dLqdAAAA" +1ms
  socket.io:server incoming connection with id bSeAsCtGadq1dLqdAAAA +46s
  socket.io:client connecting to namespace / +0ms
  socket.io:namespace adding socket to nsp / +0ms
  socket.io:socket socket connected - writing packet +0ms
  socket.io:socket joining room bSeAsCtGadq1dLqdAAAA +0ms
  socket.io:socket packet already sent in initial handshake +1ms
process: 25613, 2021-06-02T19:50:29.043Z, socketManager, io.sockets.on(connect), Will broadcast HELLO bSeAsCtGadq1dLqdAAAA
  socket.io-redis publishing message to channel socket.io#/# +0ms
  socket.io-parser encoding packet {"type":2,"data":["HELLO","from io.emit -- bSeAsCtGadq1dLqdAAAA"],"nsp":"/"} +2s
 //socket.io-parser content redacted
  socket.io:client writing packet ["2[\"HELLO\",\"bSeAsCtGadq1dLqdAAAA\"]"] +1ms
  socket.io:socket joined room [ 'bSeAsCtGadq1dLqdAAAA' ] +4ms
  socket.io-redis ignore same uid +3ms
  socket.io-redis ignore same uid +0ms
  engine intercepting request for path "/socket.io/" +62ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHm5K&sid=bSeAsCtGadq1dLqdAAAA" +0ms
  engine setting new request for existing client +0ms
  engine intercepting request for path "/socket.io/" +99ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHm6Q&sid=bSeAsCtGadq1dLqdAAAA" +0ms
  engine setting new request for existing client +0ms
  engine intercepting request for path "/socket.io/" +27s
  engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=NdEHsCV&sid=bSeAsCtGadq1dLqdAAAA" +0ms
  engine intercepting request for path "/socket.io/" +811ms
  engine handling "POST" http request "/socket.io/?EIO=3&transport=polling&t=NdEHsPJ&sid=bSeAsCtGadq1dLqdAAAA" +0ms
  engine intercepting request for path "/socket.io/" +700ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHsaF" +0ms
  engine handshaking client "JGLy99Uab2J4BeVYAAAB" +1ms
  socket.io:server incoming connection with id JGLy99Uab2J4BeVYAAAB +29s
  socket.io:client connecting to namespace / +27s
  socket.io:namespace adding socket to nsp / +29s
  socket.io:socket socket connected - writing packet +29s
  socket.io:socket joining room JGLy99Uab2J4BeVYAAAB +0ms
  socket.io:socket packet already sent in initial handshake +0ms
process: 25619, 2021-06-02T19:50:55.657Z, socketManager, io.sockets.on(connect), Will broadcast HELLO JGLy99Uab2J4BeVYAAAB
  socket.io-redis publishing message to channel socket.io#/# +29s
  socket.io-parser encoding packet {"type":2,"data":["HELLO","from io.emit -- JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} +27s
 //socket.io-parser content redacted
  socket.io:socket joined room [ 'JGLy99Uab2J4BeVYAAAB' ] +1ms
  socket.io-parser encoding packet {"type":2,"data":["HELLO","JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} +0ms
  socket.io-parser encoded {"type":2,"data":["HELLO","JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} as 2["HELLO","JGLy99Uab2J4BeVYAAAB"] +1ms
  socket.io-parser encoded {"type":2,"data":["HELLO","JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} as 2["HELLO","JGLy99Uab2J4BeVYAAAB"] +1ms
  socket.io-redis ignore same uid +1ms
  socket.io-parser encoding packet {"type":2,"data":["HELLO","JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} +1ms
  socket.io-parser encoded {"type":2,"data":["HELLO","JGLy99Uab2J4BeVYAAAB"],"nsp":"/"} as 2["HELLO","JGLy99Uab2J4BeVYAAAB"] +0ms
  socket.io-redis ignore same uid +0ms
  socket.io:client writing packet ["2[\"HELLO\",\"JGLy99Uab2J4BeVYAAAB\"]"] +1ms
  engine intercepting request for path "/socket.io/" +875ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHsnC&sid=JGLy99Uab2J4BeVYAAAB" +0ms
  engine setting new request for existing client +0ms
  engine intercepting request for path "/socket.io/" +902ms
  engine handling "GET" http request "/socket.io/?EIO=3&transport=polling&t=NdEHt01&sid=JGLy99Uab2J4BeVYAAAB" +0ms
  engine setting new request for existing client +0ms

I checked my cors configuration for the server (I don't explicitly state anything for the socket.io side since all the requests should be coming for my domain... in theory...) But curl "https://test.MYDOMAIN.com/socket.io/?EIO=3&transport=polling" % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 104 100 104 0 0 214 0 --:--:-- --:--:-- --:--:-- 22997:0 {"sid":"VFKicNYjHnFDlYThAAAI","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":60000}2:40

Seemed to be OK.

I will dig into other reasons behind the 400 errors. But if there is anything that I am obviously doing wrong please let me know.

Thanks again!

darrachequesne commented 3 years ago

You shouldn't call serverhttp.listen() in your worker process. Here's what happens:

The HTTP server of the master process handles all the incoming requests: https://github.com/socketio/socket.io-sticky/blob/a367d1b619728eb4d5d1da5e59652a2c33f9107d/index.js#L59-L74

And then forwards the handle to the right worker, which manually inject the connection in its own HTTP server (which is not actually listening): https://github.com/socketio/socket.io-sticky/blob/a367d1b619728eb4d5d1da5e59652a2c33f9107d/index.js#L102-L110

Usage with Express should be quite straightforward:

const cluster = require("cluster");
const http = require("http");
const { Server } = require("socket.io");
const redisAdapter = require("socket.io-redis");
const numCPUs = require("os").cpus().length;
const { setupMaster, setupWorker } = require("@socket.io/sticky");

if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  const httpServer = http.createServer();
  setupMaster(httpServer, {
    loadBalancingMethod: "least-connection", // either "random", "round-robin" or "least-connection"
  });
  httpServer.listen(3000);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on("exit", (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();
  });
} else {
  console.log(`Worker ${process.pid} started`);

- const httpServer = http.createServer();
+ const app = require("express")();
+ const httpServer = http.createServer(app);
  const io = new Server(httpServer);
  io.adapter(redisAdapter({ host: "localhost", port: 6379 }));
  setupWorker(io);

  io.on("connection", (socket) => {
    console.log("connect", socket.id);
  });

+ app.get("/test", (req, res) => {
+   res.send("OK");
+ });
}
josefpohl commented 3 years ago

@darrachequesne, thank you so much that worked so much better and was seemingly much more stable.

When Polling is on with the web client I am still seeing the 400 response (client side) and on the server side a rapid swapping of ping timeout or transport close.
In one example it rotated through about 4 connections before stabilizing. The websocket connection seems to fail and then I receive a

Websocket connection to wss://test.mydomain.com/socket.io/?EIO=3&transport=websocket... failed
GET https://test.mydomain.com/socket.io/?EIO=3&transport=polling.... 400
POST https://test.mydomain.com/socket.io/?EIO=3&transport=polling.... 400

Sometimes I get both. Sometimes the wss error is missing. Sometimes I just get the GET or the POST.

Could there be something amiss with my connection configuration? Server side

 const config = {
    pingTimeout: 60000,
    pingInterval: 25000,
    transports: ['polling', 'websocket'],
  };

Client side

{
    reconnection: true,
    reconnectionAttempts: 500,
    reconnectionDelayMax: 5000,
    reconnectionDelay: 1000,
    transports: ['polling','websocket'],
  }

And I still call my socketManager code (which initializes the socket etc...) before calling setupWorker(io). I can separate them.

const io = socketManager(serverhttp, redisCache); // initialize socket
  setupWorker(io);

Thoughts. Again, I truly appreciate your answer above.

darrachequesne commented 3 years ago

Hmm, your configuration looks good to me.

Unfortunately, I am unable to reproduce the HTTP 400 errors, I've created a running example: https://github.com/socketio/socket.io-fiddle/tree/sticky

Could it be linked to your HTTPS setup?

josefpohl commented 3 years ago

Ok, so I must have messed up something in my deployment so I am going to have to rescind my earlier statement about it working (not about my gratitude). I went to play around with the https setup this morning and started getting a 504 Gateway Timeout error. I have put a small (https) server with a few other modifications (had to use socket.io 2.1.1 syntax) and continued to get the same error (I did try the http version but my ALB blocks all traffic).

So I tried your "exact" implementation. When I added in the https configuration I continued to get the Gateway Timeout error. If I switched it to http and ran curl on localhost I did get the proper response. Here is the http version.

const cluster = require("cluster");
const http = require("http");
const https = require("https");
const socketio = require("socket.io");
const redisAdapter = require("socket.io-redis");
const fs = require('fs');
const numCPUs = require("os").cpus().length;
const { setupMaster, setupWorker } = require("@socket.io/sticky");

let options = {};
if (process.env.NODE_ENV === 'production') {
  var key = fs.readFileSync(__dirname + '/../certs/selfsigned.key');
  var cert = fs.readFileSync(__dirname + '/../certs/selfsigned.crt');
  options = {
    key: key,
    cert: cert,
  };
}
if (cluster.isMaster) {
  console.log(`Master ${process.pid} is running`);

  const httpServer = http.createServer();//options);
  setupMaster(httpServer, {
    loadBalancingMethod: "least-connection", // either "random", "round-robin" or "least-connection"
  });
  httpServer.listen(5001);

  for (let i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  cluster.on("exit", (worker) => {
    console.log(`Worker ${worker.process.pid} died`);
    cluster.fork();
  });
} else {
  console.log(`Worker ${process.pid} started`);

  const app = require("express")();
  const httpServer = http.createServer(app);//options, app);
  const io = socketio.listen(httpServer);
  io.adapter(redisAdapter({ host: "localhost", port: 6379 }));
  setupWorker(io);

  io.on("connection", (socket) => {
    console.log("connect", socket.id);
  });

  app.get("/test", (req, res) => { res.send("OK"); });
}

However in this version I was getting the following error on my console:

internal/per_context/primordials.js:23
  return (thisArg, ...args) => ReflectApply(func, thisArg, args);
                               ^

TypeError: Cannot convert undefined or null to object
    at hasOwnProperty (<anonymous>)
    at internal/per_context/primordials.js:23:32
    at getOrSetAsyncId (internal/async_hooks.js:395:7)
    at Server.connectionListener (_http_server.js:398:5)
    at Server.emit (events.js:314:20)
    at process.<anonymous> (/home/MYAPP/node_modules/@socket.io/sticky/index.js:102:23)
    at process.emit (events.js:326:22)
    at emit (internal/child_process.js:906:12)
    at processTicksAndRejections (internal/process/task_queues.js:81:21)
Worker 3988 died

I did not get this error in the https version. What I did see there (when I did the curl call locally) was that it had an issue with the self-signed certificate. If I did curl --insecure "https://localhost:5001/test" The command line just hung (no time out). But without insecure I got: curl: (60) SSL certificate problem: self signed certificate But I can revert back to the original and have no issues with the certificate.

I wonder if the error is due to the 2.1.1. But the https issue is the bigger concern for me.
I ran the fiddle example (modified) with the https single instance and did not have a problem with --insecure but got the same error without that flag. So maybe it is an issue with my certificates. But it seems odd that I could run it with those certificates with my "faulty" configuration.
Any thoughts Thanks, Josef

josefpohl commented 3 years ago

And for what it is worth, the single server version with the https did work externally and did not get the Gateway Timeout error.

And if I took out the "options" (the certificates) from the master createServer in the master/client version I got a Bad Gateway, which is not unexpected... but thought I would try. Thanks again! Joe

josefpohl commented 3 years ago

So, I have tried a variety of things. (just to make sure it was not a 2.1.1 issue I upgraded a version to 3.X) Whenever I do NOT have a listener in the worker thread (whether I am using sticky or not) I get this Timeout. But from your explanation above this is the way it should work, the master is the pass-through. But here is perhaps my mis-understanding and if so I apologize for wasting precious time. The listener is only for the socket connections. And for what ever reason I was presuming that it would work for the other app/express requests as well. So, if this is the case I apologize. (I am tired, overworked, and have a three year old who sleeps less than I do... but we could all say that I suppose. Well, maybe except for the 3 year old.)

Any thoughts on a library/structure that would maintain stickiness on the socket for polling and allow for listening to the express traffic? Thanks again! Josef

josefpohl commented 3 years ago

Also, running the sticky fiddle locally under http alone seems to work just fine. (so my pass through theory was wrong...)

darrachequesne commented 3 years ago

Arf, I could indeed reproduce the issue with an HTTPS server: https://github.com/socketio/socket.io-fiddle/tree/sticky-https

curl -k "https://localhost:3000/socket.io/" hangs forever, it seems the data event is not emitted here: https://github.com/socketio/socket.io-sticky/blob/a367d1b619728eb4d5d1da5e59652a2c33f9107d/index.js#L60

Note: during my testing, I also encountered an exception with a plain HTTP server:

internal/async_hooks.js:330
  if (object.hasOwnProperty(async_id_symbol)) {
             ^

TypeError: Cannot read property 'hasOwnProperty' of undefined
    at getOrSetAsyncId (internal/async_hooks.js:330:14)
    at Server.connectionListener (_http_server.js:397:5)
    at Server.emit (events.js:314:20)
    at process.<anonymous> (.../node_modules/@socket.io/sticky/index.js:102:23)
    at process.emit (events.js:326:22)
    at emit (internal/child_process.js:876:12)
    at processTicksAndRejections (internal/process/task_queues.js:85:21)
$ curl -k "https://localhost:3000/socket.io/"
curl: (35) error:1408F10B:SSL routines:ssl3_get_record:wrong version number
$ node -v
v12.19.0

Let's dig into this...

josefpohl commented 3 years ago

I ran into the "primoridials" issue I referenced above on a plain http server running on my AWS system. The phrasing and source was slightly different. I can see if I can recreate it again if you would like.
I am using node 14.

$ node -v
v14.8.0

The https version I was using last night was created from a modified version of your sticky fiddle. Mine ended up looking almost identical to the version you just posted.

I had my compliance and AWS asset group dig into my network configuration last night but they didn't un-cover anything notably wrong with the configuration.

josefpohl commented 3 years ago

@darrachequesne, just thought I would check in with you on this. What can I do to assist? Thanks!

darrachequesne commented 3 years ago

According to this and this, one can't directly send a TLSSocket object from the master process to the worker process, but creating a basic net.Socket and piping the TLSSocket stream into it should work.

I wasn't able to achieve it though:

const fs = require("fs");
const { Socket } = require("net");

const httpsServer = require("https").createServer({
  key: fs.readFileSync("./key.pem"),
  cert: fs.readFileSync("./cert.pem"),
});

const httpServer = require("http").createServer((req, res) => {
  // never called
});

httpsServer.on("secureConnection", (socket) => {
  const plainSocket = new Socket();
  socket.pipe(plainSocket);

  httpServer.emit("connection", plainSocket);
});

httpsServer.listen(3000);
josefpohl commented 3 years ago

It didn't work in my setup either. There did not seem to be any discernible difference in the outcome (Time out).

Probably a very naive question but is "plainSocket" opened by default? (Sorry, I am a bit out of my element here.) Thanks, Josef

darrachequesne commented 3 years ago

Unfortunately, I was not able to find a solution to this. If anyone has a suggestion, please ping me.

josefpohl commented 3 years ago

My apologies for the long delay on responding to this. Thanks @darrachequesne I appreciate you looking into it. Feel free to close the issue (or I can).

darrachequesne commented 1 year ago

Update: it seems we could reuse the logic there: https://github.com/coder/code-server/blob/main/src/node/socket.ts