Intermittent lost connection to Azure SQL

tysjiang commented 9 years ago

I've been getting this error intermittently with tedious:

Fri Aug 14 2015 05:29:16 GMT+0000 (Coordinated Universal Time): Unaught exception: ConnectionError: Connection lost - read ECONNRESET at Connection.socketError (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:797:26) at Socket. (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:33:15) at Socket.emit (events.js:117:20) at net.js:441:14 at process._tickCallback (node.js:442:13)

The error stalls the entire node server and renders it unresponsive for around 10-15 minutes. Afterwards, the server recovers by itself automatically.

I'm on the latest tedious 1.12.2 and using it with the tedious-connection-pool module 0.3.8 (although the issue is present prior to tedious 1.12.2 and has been happening in all the versions for the past few months)

What is interesting is that it seems like the issue happens much more frequently these days than before.

I'm not sure if the above error message is helpful or not but if it's not, please inform me what other debug or log information I can get you good folks. I've been testing out tedious and besides this particular showstopper issue for me, everything looks great.

As for servers, I'm using node 32bit v0.10.36 (although I've tested this with the latest node 12 and the issue is present there too) connecting to Azure SQL (v12 DB)

palakautomation commented 9 years ago

same here.

Application has thrown an uncaught exception and is terminated:
ConnectionError: Connection lost - read ECONNRESET
    at Connection.socketError (D:\home\site\wwwroot\node_modules\mssql\node_modules\tedious\lib\connection.js:797:26)
    at Socket.<anonymous> (D:\home\site\wwwroot\node_modules\mssql\node_modules\tedious\lib\connection.js:33:15)
    at Socket.emit (events.js:95:17)
    at net.js:440:14
    at process._tickCallback (node.js:419:13)

arthurschreiber commented 9 years ago

I'm not using Azure SQL and thus can not really help much on this topic without a lot more information on this. You could enable network logging and write all data out to a file so I could have a look at the data that is sent over by the server.

Reubend commented 9 years ago

I'm having this same issue. The exact data returned is

{ [ConnectionError: Connection is closed.]
      name: 'ConnectionError',
      message: 'Connection is closed.',
      code: 'ECONNCLOSED' }

I'm using Tedious through the node-mssql module, and so I normally have several Tedious connections open at the same time. I can try to get more info if you'd like, although I don't really know what I'm doing with this type of stuff. I'm not sure how to enable network logging; do you mean from the SQL Server?

dmiddlecamp commented 9 years ago

I seem to be hitting this while testing a nodejs app on "Azure Web Apps"

[ConnectionError: Connection lost - read ECONNRESET]
  name: 'ConnectionError',
  message: 'Connection lost - read ECONNRESET',
  code: 'ESOCKET' } ConnectionError: Connection lost - read ECONNRESET
    at Connection.socketError (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:795:26)
    at Socket.<anonymous> (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:33:15)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at emitErrorNT (net.js:1250:8)
    at doNTCallback2 (node.js:429:9)
    at process._tickCallback (node.js:343:17)

I'm using tedious version ~1.12.3, and mssql version 2.3.1

(doesn't matter what bit / version of node I use, I've tried a ton)

edit: I thought this was preventing me from using the database, turns out I was doing something silly, and I accidentally removed the options: { encrypt: true } setting!

Reubend commented 9 years ago

Yeah, after updating, I'm still experiencing this issue. My setup is almost exactly the same, with an Azure SQL DB connected to an Azure Web App.

I seem to be hitting this while testing a nodejs app on "Azure Web Apps"

|[ConnectionError: Connection lost - read ECONNRESET] name: 'ConnectionError', message: 'Connection lost - read ECONNRESET', code: 'ESOCKET' } ConnectionError: Connection lost - read ECONNRESET at Connection.socketError (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:795:26) at Socket. (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:33:15) at emitOne (events.js:77:13) at Socket.emit (events.js:169:7) at emitErrorNT (net.js:1250:8) at doNTCallback2 (node.js:429:9) at process._tickCallback (node.js:343:17) |

I'm using tedious version ~1.12.3, and mssql version 2.3.1

— Reply to this email directly or view it on GitHub https://github.com/pekim/tedious/issues/300#issuecomment-142976236.

lawrips commented 9 years ago

Here's info I've learned from running in production - sharing here in the hope it's helpful.

There's a load balancer in Azure that has a default timeout of 4mins and will shut off idle connections. For other socket based services (e.g. mongo, imap, etc), I've had to set the keep-alive ping to be shorter than the 4 min mark (its ideal when the library supports this in the connection options as with mongo). So if this is the issue, one option is to implement that in tedious.

Another option worth investigating is this - https://azure.microsoft.com/en-us/blog/new-configurable-idle-timeout-for-azure-load-balancer/ although have not tried this myself (and not sure if it's related to sql).

Another option is to manage retries and reconnects in code. Here's a good article explaining some of the differences between azure sql vs sql and a recommended approach in c# which could be borrowed from:

http://peterkellner.net/2011/01/21/sqlazure-connection-retry-problem-using-best-practices/

arthurschreiber commented 9 years ago

@lawrips That's quite interesting! Adding a keep-alive ping to tedious would be quite simple, as node.js exposes this on all sockets. I'll see if we can add this as on option for the next version of tedious.

lawrips commented 9 years ago

great ! love tedious - think this will be a nice addition

christopheranderson commented 9 years ago

:+1: Let me know if you need any help with this @arthurschreiber. I've definitely seen some issues with this.

ashelley commented 9 years ago

Hello,

I'm also dealing with this issue. In order to facilitate retry logic when a socket error occurs I needed to add a patch to tedious to ensure that a callback gets called when a request is in flight.

I'm still working on testing this patch but without it, the request callback is never called and your application has to implement its own state on the connection to ensure that your request callback gets called when a socket error occurs.

Note that this patch does not yet work with tedious-connection-pool because the connection pool closes the connection (changes the drivers state) before the driver can route to the proper state to call the request's callback. The patch can be reviewed here:

https://github.com/pekim/tedious/issues/324

arthurschreiber commented 9 years ago

@lawrips I just checked the tedious code, and we already enable socket keepAlive, the same way e.g. mongodb drivers enable it. So a missing keepAlive can not be the cause of your lost connection issues.

Reubend commented 8 years ago

This is still occurring for me. I'm going to see if I can avoid the error by using a connection pool that swaps the connection every 30 seconds.

jtmilne commented 8 years ago

Please report back if that helps. I'm also having the same issue.

Reubend commented 8 years ago

Unfortunately, that didn't help. Even with frequent swapping of the connections, I still got the error. This leads me to believe that it has nothing to do with a timeout.

lindo-jmm commented 8 years ago

We're having a pretty hard time with this as well.

christopheranderson commented 8 years ago

Just a PSA, I get this message when I screw up something in my config/connection string. Not saying that's what happening, but thought it was worth mentioning.

lindo-jmm commented 8 years ago

Is it intermittent or do you get it immediately?

From: Christopher Anderson Reply-To: pekim/tedious Date: Tuesday, January 19, 2016 at 2:35 PM To: pekim/tedious Cc: Sean Lindo Subject: Re: [tedious] Intermittent lost connection to Azure SQL (#300)

Just a PSA, I get this message when I screw up something in my config/connection string. Not saying that's what happening, but thought it was worth mentioning.

— Reply to this email directly or view it on GitHubhttps://github.com/pekim/tedious/issues/300#issuecomment-172961338.

christopheranderson commented 8 years ago

Immediately, since it was broken. But lots of people searching for the error message find this issue.

Reubend commented 8 years ago

I don't think that's what's going on for me because the connection initially works and I can make requests through it for a while. This error only appears after some time has passed (in my case).

christopheranderson commented 8 years ago

If you restart the app, does it work again? My first guess would be your running out of available connections to the SQL DB.

ashelley commented 8 years ago

@Reubend I think you already mentioned that you tried this but just wanted to reiterate the fact that you might need to ensure idle timeout on the connection pool is set to less than 4 minutes:

https://github.com/pekim/tedious-connection-pool#new-connectionpoolpoolconfig-connectionconfig

Reubend commented 8 years ago

Yes, if I restart the app, it works fine. I have the idle timeout set to only 30 seconds.

ashelley commented 8 years ago

I haven't yet implemented this myself in node but i wonder if you are hitting azure sql transient errors:

https://azure.microsoft.com/en-us/documentation/articles/sql-database-develop-csharp-retry-windows/

Reubend commented 8 years ago

I haven't written any code to deal with transient errors. That could definitely be the issue. However, I think it would be weird if my test DB has that many transient errors since I'm the only user sending requests.

ashelley commented 8 years ago

Transient errors on azure are typically "random" and not correlated to the number of requests. I've seen them happen as low as one time per day but sometimes they can occur more frequently. If your app still gets these errors if you don't have any connection pooling and create a new database connection on each request I would look at the transient error problem. In other words it might be worth trying to reproduce your problem without reusing any sockets.

Reubend commented 8 years ago

Interesting. I'll write up some code later tonight to see if it's indeed a transient error issue.

sashasochka commented 8 years ago

This is the first time I'm trying to connect node.js to mssql on Azure App Service and haven't had success. I'm always getting ConnectionError: Connection lost - read ECONNRESET at ConnectionError (D:\home\site\wwwroot\node_modules\tedious\lib\errors.js:21:12) at Connection.socketError (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:575:56) at emitOne (events.js:77:13) at Socket.emit (events.js:169:7) at emitErrorNT (net.js:1256:8) at nextTickCallbackWith2Args (node.js:478:9) at process._tickDomainCallback (node.js:433:17) at process.<anonymous> (D:\home\site\wwwroot\node_modules\async-listener\index.js:19:15)

For me this is a persistent issue, it never worked even for a minute. I'm sure there is no problem with configs or firewalls. I'm using loopback.js with it's loopback-connector-mssql module.

lindo-jmm commented 8 years ago

Out of curiosity, do you have the encrypt option set to true?

Sent from Outlook Mobilehttps://aka.ms/qtex0l

On Tue, Jan 19, 2016 at 4:31 PM -0800, "Oleksandr Sochka" notifications@github.com<mailto:notifications@github.com> wrote:

This is the first time I'm trying to connect node.js to mssql on Azure App Service and haven't had success. I'm always getting ConnectionError: Connection lost - read ECONNRESET at ConnectionError (D:\home\site\wwwroot\node_modules\tedious\lib\errors.js:21:12) at Connection.socketError (D:\home\site\wwwroot\node_modules\tedious\lib\connection.js:575:56) at emitOne (events.js:77:13) at Socket.emit (events.js:169:7) at emitErrorNT (net.js:1256:8) at nextTickCallbackWith2Args (node.js:478:9) at process._tickDomainCallback (node.js:433:17) at process. (D:\home\site\wwwroot\node_modules\async-listener\index.js:19:15)

For me this is a persistent issue, it never worked even for a minute. I'm sure there is no problem with configs or firewalls. I'm using loopback.js with it's loopback-connector-mssql module.

Reply to this email directly or view it on GitHubhttps://github.com/pekim/tedious/issues/300#issuecomment-173033421.

sashasochka commented 8 years ago

Yeah, after writing my comment I went through the chain of libs and found out the comment in node-mssql which says you need to set enrypt to true. Which is not possible to do with loopback.js and therfore was not a part of their documentation. I will try a customly patched version now, will report here if that helps.

jtmilne commented 8 years ago

I get this error a couple times a day on all my node instances. I wrote a query wrapper to automatically retry on transient errors and it seems to work fine. Although it still is unsettling to see this happening.

lindo-jmm commented 8 years ago

Would you mind sharing a bit of pseudocode? I reached out on Twitter and apparently using a proxy was causing latency in all cases so MS removed this layer, but now if you hit your instance while its shifting you'll get hit with the error we're discussing (a transient error). All I've been able to gain is that "they're working to improve it". I believe the current accepted advice is to find some way to make your users wait 30s before retrying again =\

Please correct me if I'm misunderstanding anything!

Sent from Outlook Mobilehttps://aka.ms/qtex0l

On Tue, Jan 19, 2016 at 5:04 PM -0800, "Joel" notifications@github.com<mailto:notifications@github.com> wrote:

I get this error a couple times a day on all my node instances. I wrote a query wrapper to automatically retry on transient errors and it seems to work fine. Although it still is unsettling to see this happening.

Reply to this email directly or view it on GitHubhttps://github.com/pekim/tedious/issues/300#issuecomment-173040566.

jtmilne commented 8 years ago

Sure. Here's my singleton to handle db requests. I'm sure it could be greatly improved as Node/JS is not my primary language. http://pastebin.com/bxJHahfm

Reubend commented 8 years ago

Sorry for the delay. Tonight I tried adding some code to deal with transient errors, with good results so far. When the connection closed errors happen, I just keep retrying the request over and over. My test code for this is very primitive; Tedious doesn't seem to have a way to see the error number of a response, so I just assume all errors are transient. I'll post back here if my issues pop back up again.

If it does turn out to be transient errors, I think we should add handling for them to Tedious. I could try writing the code and then send in a pull request.

Reubend commented 8 years ago

Today I saw the errors happen again despite the retry logic I put in last time. My app encountered the error and then retried the request once per second for about 10 minutes without success.

robby commented 8 years ago

I've also been seeing this error with azure. With the debug on, I see this.

debug Connection lost - read ECONNRESET

Which then ends up throwing an uncaught exception killing the process/worker.

gkorland commented 8 years ago

We're getting the following error tens of times a day (we're using tedious-connection-pool)

database connection error: {\"message\":\"Connection lost - read ECONNRESET\",\"code\":\"ESOCKET\"}

seanlindo commented 8 years ago

@gkorland I'm using the same with an Azure SQL Instance, and we get buried in these every day. I've never seen anything like it.

francolaiuppa commented 8 years ago

I'm also getting the ECONNRESET error, output below:

ConnectionError: Connection lost - read ECONNRESET
    at ConnectionError (/usr/src/myapp/node_modules/tedious/lib/errors.js:12:12)
    at Connection.socketError (/usr/src/myapp/node_modules/tedious/lib/connection.js:535:28)
    at emitOne (events.js:82:20)
    at Socket.emit (events.js:169:7)
    at emitErrorNT (net.js:1256:8)
    at nextTickCallbackWith2Args (node.js:455:9)
    at process._tickCallback (node.js:369:17)
error: Forever detected script exited with code: 0
error: Script restart attempt #175

As you can see, the script restarted 175 times in less than 48 hours... (approximately once every 16 minutes).

We also have this problem in production (which doesn't use Azure but an internal MSSQL farm) I'll keep researching to see if I can fix it, so far I thought in forcing a reconnect every 5 minutes but I don't think that will solve it... maybe running a dummy query on timer just to keep the connection alive? Any ideas? Thanks in advance!

francolaiuppa commented 8 years ago

Guys I don't know if this is the right fix/hack or not, but I've been reading around and it seems the Azure SQL Databases are behind a load balancer (which does not surprise me at all). Some load balancers reset the connection rather than closing it (hence why we get the ECONNRESET). Since we had no error handler defined, it threw an Exception which in turn made the whole app crash!

To fix it, I've added

 connection = new Connection(tediousConfig);
 connection.on('error',function(err) {
    // of course here you need to check if it wasn't other kind 
    // of error, I've redacted it for brevity
    console.log(err);
    console.log('The app is still running as this error is recoverable');
 });

I've tested it by running a method that access the database right after the ECONNRESET happened and I was able to see the data without any problems! If anyone here can confirm this fixes their problem, I think we'll be closer to a final solution...

Hope it helps you!

Reubend commented 8 years ago

Franco, wouldn't that have been solved by the connection idle timeout? For instance, if each connection lasts for only 30 seconds, then it seems that the load balancer should not reset a connection that quickly.

seanlindo commented 8 years ago

@Reubend Did you ever share your code you used when positing your comment a few messages above?

Reubend commented 8 years ago

The code I used was very simple, it just used the built in error callback of a request to close the connection and then open it again before retrying the request.

francolaiuppa commented 8 years ago

@Reubend could you paste some code so I understand what are you referring to when you say the connection idle timeout? I'm using Tedious with Sequelize and I'm managing the connection idle timeouts using a pool from within sequelize (let me know if you want the full example).

Reubend commented 8 years ago

@francolaiuppa Sure! I'm using the node-mssql with tedious, so the code will look a little bit different from yours, but the idea is the same.

var databaseConfig = {
    user: 'redacted',
    password: 'redacted',
    server: 'redacted',
    database: 'redacted',
    connectionTimeout: 10000,
    requestTimeout: 10000,

    pool: {
        max: 10,
        min: 0,
        idleTimeoutMillis: 14999
    },

    options: {
        encrypt: true
    }
}

var database = new sql.Connection(databaseConfig, function (err) {
    if (err) {
        console.log("An error occurred while trying to connect to the database:");
        console.log(err);
        process.exit();
    } else {
        console.log("Connected to the database.");
        initializeServer();
    }
});

gkorland commented 8 years ago

@francolaiuppa you're right about your analysis (I confirmed it with the Azure team) The only real solution to make sure their LB won't shutdown the connection after 30min is to "ping" the connection. We're doing it by firing a lightweight query on idle connections every 5min (select 1)

jtmilne commented 8 years ago

@gkorland Shouldn't the idleTimeoutMillis setting on the pool prevent resetting the connection before the LB times out? I wouldn't think that a ping query is necessary as long as you set the idleTimeoutMillis to an appropriate value.

francolaiuppa commented 8 years ago

@Reubend thanks for sharing the code. I've the same config for pool but I was still getting the ECONNRESET. Regarding the connection retry, I've a very similar code, but that's a different issue.

@gkorland Glad to hear you could confirm it with the Azure team, thanks for taking time to do so! I also thought of making the SELECT 1 query on a timer but it seemed hacky... guess we'll have to do that to avoid the ECONNRESET altogether until another solution pops up :) Just wanted to add that even without running the select query on timer, I was still able to use the app without problems by just catching the ECONNRESET error and outputting a console log (as shown on my previous https://github.com/pekim/tedious/issues/300#issuecomment-216603746)

@jtmilne hey there, I've tried changing the values for the timeout before but I'll still get the ECONNRESET sooner or later. In my case is after some 16 minutes. Have you measured how long it takes for you?

seanlindo commented 8 years ago

@francolaiuppa It looks like there's two different solutions in play, can you confirm?

Catch the ECONNRESET error for those of us who are not; Everything with the app is actually fine.
Ping the database at some internal to prevent it from being reset by the LB

francolaiuppa commented 8 years ago

@seanlindo I can confirm the ECONNRESET solution. Regarding the ping database solution (i.e running a SELECT 1 query), it should work too, although I haven't tested it. I can try this approach next week if I get some time at work, but in the meantime, maybe you can run that test and let us know?

If we can confirm that the ping database solution works, I can create a PR to enable this feature ONLY for Azure-like databases, although I'm open to cleaner solutions

Thanks!

seanlindo commented 8 years ago

@francolaiuppa Like you, I'm also using node-mssql

Is there a better way to "recover" the connection when an error occurs? Is that a cleaner way of dealing with this issue as opposed to trying to prevent it by pinging the LB?

I followed it's documentation by attaching a listener and added this bit of code to my factory function (I should be checking the error in a better way, bear with me!):

sql.on('error', function (err) {
    if(err.message == "Connection lost - read ECONNRESET") {
      Logger.logInfo('ECONNRESET-- not crashing the application');
    }
    else if (err.message == "Connection lost - read ETIMEDOUT") {
      Logger.logInfo('ETIMEDOUT-- not crashing the application');
    }
});

Instead of simply logging the error, is there a way to "heal" the connection and return it? I should probably have this conversation in the node-mssql repo, but I've noticed Google searches bring up this thread.

Perhaps I'll inquire there as well.

tediousjs / tedious

Intermittent lost connection to Azure SQL #300