timeout but failed to retry until restarting program

tengattack commented 6 years ago

Hi, I am using memjs 1.2.0 with memcache. However it sometimes go into fail state for every request until I restart program:

error: socket timed out connecting to server.
 at Socket.<anonymous> (/path/node_modules/memjs/lib/memjs/server.js:169:20)
 at Object.onceWrapper (events.js:272:13)
 at Socket.emit (events.js:180:13)
 at Socket.emit (domain.js:421:20)
 at Socket._onTimeout (net.js:396:8)
 at ontimeout (timers.js:466:11)
 at tryOnTimeout (timers.js:304:5)
 at Timer.listOnTimeout (timers.js:267:5)

Node v9.9.0

kewitz commented 5 years ago

I'm experiencing the same problem on Heroku.

znarf commented 5 years ago

We're currently experiencing that too. It's an important issue for us at Open Collective.

UmanShahzad commented 5 years ago

Having this issue too. It seems once a node gets into this state, it cannot get out. I'll look into solving this because it's been bothering us a lot.

saschat commented 5 years ago

@UmanShahzad any help is appreciated. I have been unable to reproduce this issue so also a pointer on how to reproduce this error would help a lot.

UmanShahzad commented 5 years ago

@saschat Sure, I'll be running through it soon. Just seems like there's a race condition happening around the connection timeout logic that causes it to go in an infinite loop or something of thinking the socket isn't connected.

idungotnosn commented 4 years ago

I'm still seeing this issue in 1.2.2. I have an ExpressJS server serving up requests using the NextJS engine. It frequently grabs CMS page metadata from Memcache.

When it starts getting socket timeout errors, the server goes VERY slow because it retries multiple times and then fails. The only way to fix the issue is to do a rolling restart.

I thought initially that memcache was erroring out, but if that were the case, why would a rolling restart immediately remedy the problem?

This happens frequently enough in production that I have logging software that alerts me whenever it happens.

The server I use has a node version of 12.6.1.

idungotnosn commented 4 years ago

I did some further investigation. I think that memJS erroring out may be caused by the server running out of memory. You can't open a socket with memcache without some extra memory lying around. This leads me to believe that this is not an issue with memjs.

saschat commented 4 years ago

I tried to reproduce this error again and finally realized that the posted error is an exception that was not caught. Our examples are with callbacks where errors are implicitly handled so I missed that. If you do not use callbacks you need to catch this error with .catch for promises or a try/catch block for async/await and handle the error by getting the info from the ground truth. This is normal caching behavior.

saschat commented 4 years ago

This looks as follows with callbacks:

var mc = memjs.Client.create(/* ... */);
mc.get("test-key", function (err, val) {
  if (err != null) {
    // Error (e.g., the one you see) -> get value from source
  } else {
    if (val == null) {
      // Miss -> get value from source
    } else {
      // Return value
    }
  }
});

And with promises:

var mc = memjs.Client.create(/* ... */);
mc.get("test-key")
  .then(function (val) {
    if (val == null) {
      // Miss -> get value from source
    } else {
      // Return value
    }
  })
  .catch(function (err) {
    // Error (e.g., the one you see) -> get value from source
  });

kewitz commented 4 years ago

cc @invisible-tech @keenahn This was one of the main reasons why we migrated our main services to Redis, make sure you read this if you're still using memjs somewhere else.

znarf commented 4 years ago

If I'm remembering well, this was not just a matter of try/catch, the problem was that the library would be stuck and fail to reconnect after losing connection to the server.

tengattack commented 4 years ago

This looks as follows with callbacks:

var mc = memjs.Client.create(/* ... */);
mc.get("test-key", function (err, val) {
  if (err != null) {
    // Error (e.g., the one you see) -> get value from source
  } else {
    if (val == null) {
      // Miss -> get value from source
    } else {
      // Return value
    }
  }
});

...

@saschat Will this problem happened when the uncaught error inside the callback function? like:

mc.get("test-key", function (err, val) {
  throw new Error(...)
})

saschat commented 4 years ago

@tengattack Then you will see the error you throw. So you will see this error if you do:

mc.get("test-key", function (err, val) {
  if (err) {
    throw err;
  }
});

saschat commented 4 years ago

@znarf So the issue description clearly shows an exception stack trace, one that you will not see if you try/catch the error (which should be done). So a failure to reconnect would then be a different issue, unrelated to this issue. Are you saying that after implemented the PR mentioned above you still had trouble reconnecting?

tengattack commented 4 years ago

@tengattack Then you will see the error you throw. So you will see this error if you do:
mc.get("test-key", function (err, val) {
  if (err) {
    throw err;
  }
});

Could we add an additional try catch wraps the callback function in memjs to fix this problem?

saschat commented 4 years ago

@tengattack sure. If you throw an error that can occur during normal operation you should always catch it. Uncaught errors should only ever be used for unrecoverable situations, some even say never ;-P.

tengattack commented 4 years ago

@saschat Thank you! I'm looking forward to the next release :)

saschat commented 4 years ago

@tengattack Sorry, this was a misunderstanding. What I meant is that you need to catch all errors that you throw in the callback.

To be clear: We will not catch this error in memjs. It is the job of the library to surface errors and the job of the library user (you) to catch and handle errors. In this case you can print or ignore it if you will, or treat it as a miss for certain commands. memjs will just continue to work as expected and reconnect automatically when it can.

tengattack commented 4 years ago

@tengattack Sorry, this was a misunderstanding. What I meant is that you need to catch all errors that you throw in the callback.

To be clear: We will not catch this error in memjs. It is the job of the library to surface errors and the job of the library user (you) to catch and handle errors. In this case you can print or ignore it if you will, or treat it as a miss for certain commands. memjs will just continue to work as expected and reconnect automatically when it can.

@saschat I mean the callback maybe we can use setImmediate or some other functions to call the user-level callbacks so that it will not break the main logic for the retry or reconnect workflows in the library.

saschat commented 4 years ago

@tengattack I am sorry, I don't understand. You supply the callback so you can do whatever you want in there. Personally, I would ignore set failures and treat get failures as misses. This should not break any logic of your application.

tengattack commented 4 years ago

@tengattack I am sorry, I don't understand. You supply the callback so you can do whatever you want in there. Personally, I would ignore set failures and treat get failures as misses. This should not break any logic of your application.

@saschat I mean the library can use some thing like event emitter to trigger the callback, so that it will not break the library's logic even if there are exception throw in the callback.

saschat commented 4 years ago

@tengattack Since you control the callback, why do you throw exceptions if they break your logic? I am sorry, I really don't understand what you are doing.

tengattack commented 4 years ago

@saschat Of course we can control the callback, but I think we should do the callback in event emitter instead of binding the callback function inside the main logic which may break the library's state.

saschat commented 4 years ago

@tengattack I finally understand what you mean :-)

This would be an option (we actually use event emitters internally) but I don't think this would be ideal since it would remove choice for the user. For example lets consider a get request. It can return a hit, a miss, or an error. If I use an event emitter for errors a get only returns a hit or a miss. This means as a library developer I need to decide for you how to treat this error. Now in a get request this might be easy (I would treat it as a miss), but what about a set or an incr command.

So to summarize, an event emitter would basically mean that as the library developer I would have to decide how to treat all errors for the different commands and as a user you only get to react to errors in one way, independently which command they occurred with. The second might not be too mad but the former I am not comfortable doing.

tengattack commented 4 years ago

@saschat Thank you for your patience. I think I still didn't make myself clear, let me use some code to clearify what I want to do:

As you said above,

@tengattack Then you will see the error you throw. So you will see this error if you do:
mc.get("test-key", function (err, val) {
  if (err) {
    throw err;
  }
});

The library's state simply breaks when the error handling didn't finished. I want the library can still functional normally at this situation.

For example, we can do like this:

originalGet = mc.get
mc.get = function (key, cb) {
  originalGet.call(mc, key, function (err, val) {
    setTimeout(function () { cb(err, val) }, 0)
  })
})

I wrap the callback happen in setTimeout so that it will not get error even there are exception inside user-level callback function. If I'm right, the library will still keep functional. So, could the library do this instead of using the money patch way.

saschat commented 4 years ago

@tengattack this seems convoluted to me. Why do you raise an exception in your callback if you don't want an error to be raised? Just don't raise an exception in the callback in the first place, there is no reason to do that. Exceptions should only be raised if you catch them on a higher level or if you want your app to fail.

For a get request I would just do something along the lines of:

mc.get("test-key", function (err, val) {
  if (err || val == null){
    // an error or a miss 
    val = ... // get value from source
  }
  // do something with value
});

tengattack commented 4 years ago

@tengattack this seems convoluted to me. Why do you raise an exception in your callback if you don't want an error to be raised? Just don't raise an exception in the callback in the first place, there is no reason to do that. Exceptions should only be raised if you catch them on a higher level or if you want your app to fail.

For a get request I would just do something along the lines of:
mc.get("test-key", function (err, val) {
  if (err || val == null){
    // an error or a miss 
    val = ... // get value from source
  }
  // do something with value
});

@saschat I think it just to increase the library's stability. Sometimes the exception may from another library that called in here which we may not intend to do it.

saschat commented 4 years ago

@tengattack This has nothing to do with the stability of memjs. If you have functions that throw exceptions in you callback you need to surround them with a try/catch. As a library developer I cannot anticipate all that happens in a callback and more importantly I shouldn't. Trying to do so has a cost and potential side effects.

I understand that dealing with exceptions is annoying since they are invisible. Personally, I prefer languages that force an explicit handling of errors, like Rust.

memcachier / memjs

timeout but failed to retry until restarting program #117