tediousjs / tedious

Node TDS module for connecting to SQL Server databases.
http://tediousjs.github.io/tedious/
MIT License
1.58k stars 439 forks source link

FailureException: recvmsg EINVALStack: ConnectionError: recvmsg EINVAL - EINSTLOOKUP fails in Azure ASE v3 environment #1647

Closed teresahoes-klf closed 3 months ago

teresahoes-klf commented 3 months ago

--cross posted to mssql - they suggested i post here as its tedious specific https://github.com/tediousjs/node-mssql/issues/1688

We recently had to upgrade our Azure ASE environment from v2 to v3. Since then our function apps and web apps that connect to SQL instances have been failing on the lookup to get the instance port. These calls all worked prior to the upgrade and work outside the ASE environment (i.e. running locally on a dev machine). We have been working with MS support but they have not been able to identify any network related issues. The communication between the azure function/webapp and the UDP port on the sql server happens (we have wire sharked), but for some reason the tedious driver doesn't understand what is returned or fails for some other reason. Since we can't debug in Azure we can't find any further detail on the error besides the stack trace noted below.

Note we did try using the sqlnodev8 driver and those calls DO work successfully so this is limited to the tedious driver.

I don't know if there is an updated needed to tedious driver or you have other suggestions on how to get this to work. Switching everything to the sqlnodev8 driver isn't a very good option for us.

Expected behaviour: Instance lookup should work as expected.

Actual behaviour: 2024-08-06T22:27:17.924 [Error] Executed 'Functions.timer-xxx' (Failed, Id=95e49fde-6701-4227-8a19-fd2f40926d25, Duration=12ms) Result: FailureException: recvmsg EINVAL Stack: ConnectionError: recvmsg EINVAL at C:\home\site\wwwroot\node_modules\mssql\lib\tedious\connection-pool.js:85:17 at Connection.onConnect (C:\home\site\wwwroot\node_modules\tedious\lib\connection.js:849:9) at Object.onceWrapper (node:events:629:26)at Connection.emit (node:events:514:28) at Connection.emit (C:\home\site\wwwroot\node_modules\tedious\lib\connection.js:970:18) at C:\home\site\wwwroot\node_modules\tedious\lib\connection.js:1006:16 at process.processTicksAndRejections (node:internal/process/task_queues:77:11) Configuration: Here is a very simple function that we can use to reproduce issue

const SerializeError = require('serialize-error');

const sql = require('mssql');

const sqlConfig = { "user": "xxx", "password": "xxx", "server": "xxxx.company.com", "database": "xxxx", "connectionTimeout": 240000, "requestTimeout": 600000, "options": { "encrypt": true, "abortTransactionOnError": true, "trustServerCertificate": true, "instanceName": "QA" } }

module.exports = async function (context, timer) {

try {
    context.log(`CONFIG: ${JSON.stringify(sqlConfig)}`);

    await sql.connect(sqlConfig)
    const result = await sql.query`select count(*) from mh_photos_s`

    context.log(`Form Count: ${JSON.stringify(result)}`)
} catch (reason) {
    const serializedError = SerializeError.serializeError(reason);
    context.log(`ERROR: ${JSON.stringify(serializedError)}`);
    throw reason;
}

}; Software versions NodeJS: 16x, 18x, 20 x node-mssql: 10.0.2 , 11.01 SQL Server: 2016

`

MichaelSun90 commented 3 months ago

Hi @teresahoes-klf , thanks for the details on the issue. Have you tried enable the debug options like @dhensby mentioned under the mssql side issue? The debug options can provide more details on package contents and maybe more information on the error side.

teresahoes-klf commented 3 months ago

Hi @teresahoes-klf , thanks for the details on the issue. Have you tried enable the debug options like @dhensby mentioned under the mssql side issue? The debug options can provide more details on package contents and maybe more information on the error side.

Hi @MichaelSun90 -- I did try putting in that environment variable to the web app, as well as setting it in the config "options": { "encrypt": true, "abortTransactionOnError": true, "trustServerCertificate": true, "instanceName": "QA", "debug" : { "data": true, "payload": true, "token": true, "packet": true, "log": true } }

but i dont get anything more in the stack trace for the error. where else might the extra detail show up?

MichaelSun90 commented 3 months ago

Not 100% sure how exactly mssql side works with the debug options, but I think it should be similar process. There was a recent change on how debug work, and we will be updating the doc soon. You can check for details here . However, in short you should be able to enable the debug by doing this:

import { debugOptionsFromEnv } from '../test/helpers/debug-options-from-env';

config.options.debug = debugOptionsFromEnv();
if (process.env.TEDIOUS_DEBUG) {
      connection.on('debug', console.log);
 }

there are some examples under the link that I posted for the test files.

arthurschreiber commented 3 months ago

You should be able to load the tedious/lib/instance-lookup module directly and see what is being sent from tedious to the server browser port, and what's replied by the server.

There's also two different ways of asking for sql server instances, one is connecting directly to the ip address running the sql server browser component (this is what's implemented by tedious), and the other one is using a multicast IP address.

There's also different messages being sent to the sql server browser component depending on how the message is being sent. The exact message details are described in MC-SQLR.

It's really hard to guess what's going wrong. Because the SQLR protocol uses UDP, there's no way to understand whether sending a message worked, or whether there's some issue with the connection itself (Firewall rules etc).

As you're mentioning that connections with sqlnodev8 seem to work, they must be doing something differently than we are in tedious.

teresahoes-klf commented 3 months ago

so i did get debugging turned on and here is the output right before the failure 2024-08-07T20:05:12.318Z mssql:base pool(4): created 2024-08-07T20:05:12.358Z mssql:base pool(4): connecting 2024-08-07T20:05:12.375Z mssql:tedi pool(4): connection #1 created 2024-08-07T20:05:12.389Z mssql:tedi connection(1): establishing 2024-08-07T20:05:12.561Z agentkeepalive sock[0#graph.microsoft.com:443:::::::::::::::::::::] create, timeout 60000ms 2024-08-07T20:05:12.945Z agentkeepalive sock[0#graph.microsoft.com:443:::::::::::::::::::::](requests: 1, finished: 1) free ========= ERROR - START ========= "{ code: 'EINSTLOOKUP', originalError: { code: 'EINSTLOOKUP', name: 'Error', message: 'recvmsg EINVAL', stack: 'Error: recvmsg EINVAL\n' + ' at C:\\home\\site\\wwwroot\\node_modules\\mssql\\node_modules\\tedious\\lib\\connection.js:995:32\n' + ' at processTicksAndRejections (node:internal/process/task_queues:78:11)'

======================= which makes me wonder if it has something to do with self signed certs or something in the ssl chain.

dhensby commented 3 months ago

I don't think that's anything to do with the SSL chain, it looks like a very low-level error, there's only one place in the tedious code that throws EINSTLOOKUP error code, and that's when trying to look up the instance (which we had already figured out seemed to be the culprit). Unfortunately tedious is picking up the underlying error and then just taking the message and wrapping it in a new error so the very root of where the error has come from isn't entirely clear, but I'd suspect it's from here.

But that doesn't really help us get any closer :/

teresahoes-klf commented 3 months ago

Just wanted to post an update in case anyone else runs across this problem. A colleague of mine was able to isolate this issue to outside the tedious driver - it seems to be specific to how the underlying environment fails to assign a dynamic port -- see below:

The following code helps isolate the issue. The issue seems to be the call to server.bind passing in a value of 0 for port. According to the NODE docs: “If port is not specified or is 0, the operating system will attempt to bind to a random port”. However, on the Azure Function App it is not being assigned a random port number and is listening for responses on 0.0.0.0:0.

On our Node Dev box the below code successfully works and displays:

 Listening to: 0.0.0.0:51388

 Socket bound to: 172.18.15.13:1434

 UDP message sent to 172.18.15.13:1434

 Message received:     ServerName;xxx;InstanceName;xx1;IsClustered;No;Version;13.0.6300.2;tcp;49776;np;\\xx1\pipe\MSSQL$DTK1\sql\query;;ServerName;xx;InstanceName;DTK2;IsClustered;No;Version;13.0.6300.2;tcp;49766;np;\\xx\pipe\MSSQL$DTK2\sql\query;;ServerName;xx;InstanceName;xx;IsClustered;No;Version;13.0.6300.2;tcp;49768;np;\\xx\pipe\MSSQL$DTK3\sql\query;;ServerName;xx;InstanceName;DTK4;IsClustered;No;Version;13.0.6300.2;tcp;49775;np;\\xx\pipe\MSSQL$DTK4\sql\query;; from 172.18.15.13:1434

On our Azure Function App environment it fails and displays:

 Listening to: 0.0.0.0:0

 Socket bound to: 172.18.15.13:1434

 UDP message sent to 172.18.15.13:1434

 Server error:

 Error: recvmsg EINVAL at UDP.onMessage [as onmessage] (node:dgram:939:31)

HERE IS THE CODE: const dgram = require('dgram');

const util = require('util');

const dns = require('node:dns');

const dnsLookup = util.promisify(dns.lookup);

const sqlServerHostname = 'xxxxxx.xxxxxx.com';

(async () => {

try {

    const localClientPort = 0;

    const sqlServer = await dnsLookup(sqlServerHostname, {family: 4});

    const sqlServerPort = 1434;

    const server = dgram.createSocket('udp4');

    server.on('error', (err) => {

        console.log(`Server error:\n${err.stack}`);

        server.close();

    });

    server.on('message', (msg, rinfo) => {

        console.log(`Message received: ${msg} from ${rinfo.address}:${rinfo.port}`);

    });

    server.on('listening', () => {

        const address = server.address();

        console.log(`Listening to: ${address.address}:${address.port}`);

    });

    server.on('connect', () => {

        const address = server.address();

        console.log(`Connected to: ${address.address}:${address.port}`);

    });

    // server.bind({port: localClientPort, exclusive: true});

    server.bind(localClientPort, function(err, bytes) {

        if (err) {

            console.log(`Bind error:\n${err.stack}`);

            throw err;

        }

        console.log('Socket bound to: ' + sqlServer.address +':'+ sqlServerPort);

    });

    const message = Buffer.from([0x02]);

    server.send(message, 0, message.length, sqlServerPort, sqlServer.address, function(err, bytes) {

        if (err) throw err;

        console.log('UDP message sent to ' + sqlServer.address +':'+ sqlServerPort);

    });

} catch (error) {

    console.log(`ERROR: ${error}`);

}

})()

Therefore, the error isn't specific to the tedious driver, but seems to be an underlying node/Azure environment issue since the above code uses node calls only.

MichaelSun90 commented 3 months ago

@teresahoes-klf Thanks for sharing this. 👍 🙇