tediousjs / tedious

Node TDS module for connecting to SQL Server databases.
http://tediousjs.github.io/tedious/
MIT License

Can't query MSSQL database #394

Closed peteoconnell closed 8 years ago

peteoconnell commented 8 years ago

I've got a very basic test. When I comment out the function that queries the database, I get a connection to the database, but as soon as I try to query it I get the following error:

Connected { [RequestError: Requests can only be made in the LoggedIn state, not the Connecting state] message: 'Requests can only be made in the LoggedIn state, not the Connecting state', code: 'EINVALIDSTATE' }

var Connection = require('tedious').Connection;
var config = {
    userName: 'sa',
    password: 'sa',
    server: 'localhost',
    options: { encrypt: true, database: 'pete' }
};
var connection = new Connection(config);
connection.on('connect', function(err) {
    // If no error, then good to proceed.
    if (err) {
        console.log(err);
        return;
    }
    console.log("Connected");
    executeStatement(); // if I comment out this line I get a 'connected' in the console.
});

var Request = require('tedious').Request;
var TYPES = require('tedious').TYPES;

function executeStatement() {
    var request = new Request("select 42, 'hello world'", function(err, rowCount) {
        if (err) {
            console.log(err);
        } else {
            console.log(rowCount + ' rows');
        }
        connection.close();
    });

    request.on('done', function(rowCount, more) {
        console.log(rowCount + ' rows returned');
    });

    connection.execSql(request);
}

Given that I do get connected, I'd assume this should work fine, since I only run the query after the connect event fires. Any help would be appreciated!
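For what it's worth, the error text itself points at tedious's internal connection state machine: a request is only accepted once the connection has reached the `LoggedIn` state. Here is a toy sketch of that guard (this is not tedious's actual implementation; the class and state names are illustrative, taken only from the error message):

```javascript
// Toy model of the state check behind EINVALIDSTATE.
// NOT tedious's real code; state names come from the error message.
class ToyConnection {
  constructor() {
    this.state = 'Connecting'; // becomes 'LoggedIn' once login succeeds
  }

  login() {
    this.state = 'LoggedIn';
  }

  execSql(sql) {
    if (this.state !== 'LoggedIn') {
      const err = new Error(
        `Requests can only be made in the LoggedIn state, not the ${this.state} state`
      );
      err.code = 'EINVALIDSTATE';
      throw err;
    }
    return 'executed: ' + sql;
  }
}

const conn = new ToyConnection();
try {
  conn.execSql('select 42'); // too early: still Connecting
} catch (e) {
  console.log(e.code); // EINVALIDSTATE
}
conn.login();
console.log(conn.execSql('select 42')); // executed: select 42
```

So anything that fires a request before login completes (or after the connection drops back out of `LoggedIn`) produces exactly this error.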

PeterNierop commented 8 years ago

It seems you already close the connection in the request's completion callback, so the `request.on('done', ...)` handler is not required and may even be invalid.

On a side note, I think it is best to use stored procedures. Tedious is very fast, but you would lose much of that on the server side by forcing SQL Server to compile the same statements again and again. Better to put the query in a procedure, tune it to the max, and call that from Tedious.

bretcope commented 8 years ago

@PeterNierop if you send SQL Server the same query text a thousand times, it's not going to compile it a thousand times. It caches query plans and reuses them. It doesn't even have to parse the statement if it's in cache. That's one reason why using real parameters (as opposed to inlining them into the query) is important. You may still choose to use stored procedures if that's your preference or there's a business need, but there's no performance benefit of having a query in a SP vs just running the query.

PeterNierop commented 8 years ago

I fear we disagree.

A SQL statement arrives as a string. This string needs to be normalized, bound, and checked for syntax errors; it is then compiled. I am open to any evidence that SQL Server nowadays caches strings and somehow miraculously matches them with strings seen before. Without such evidence, however, I go with my 52 years of historical knowledge that this is not the case.

A second reason why raw SQL statements are not a good thing is SQL injection. This problem should not be underestimated. I have used code in a simple address field that successfully brought down a site.

The third reason is network delay. A long string with multiple joins can easily run several thousand characters; a short stored procedure name is always a winner.

You do mention query plans, but I do not think we have the same understanding of what that is. In a common select query there are several where clauses, and the query plan decides which one to evaluate first based on the data distribution. As an example, say you need all males in London with 3 eyes, given a database with the body attributes of every living person. There are 3 or so genders, so filtering on gender first leaves roughly 7 billion / 3 rows and would create a fair bit of tempdb, so SQL Server will not start there. There are hundreds of thousands of cities, so SQL Server will estimate that the first iteration should be on city. We, however, know that 3 eyes is rare and that there would be very few such people in London. And this is the query plan.
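On the injection point above, a toy illustration of the difference: building SQL by string concatenation lets the input rewrite the statement, while a parameterized statement's text never changes. This is purely illustrative string handling, no database involved (`People` and `@city` are made-up names):

```javascript
// Inlining input into the SQL string: hostile input becomes part of the statement.
function inlined(city) {
  return `select * from People where City = '${city}'`;
}

const hostile = "London'; drop table People; --";
console.log(inlined('London'));
// select * from People where City = 'London'
console.log(inlined(hostile));
// select * from People where City = 'London'; drop table People; --'

// With real parameters, the statement text is fixed for every input;
// the value travels separately (in tedious, via request.addParameter and TYPES).
const parameterized = 'select * from People where City = @city';
console.log(parameterized);
```

The fixed text is also what makes plan reuse possible, which ties the security and performance arguments together.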

bretcope commented 8 years ago

@PeterNierop well, in Microsoft's own words, since at least SQL Server 2008:

When any SQL statement is executed in SQL Server, the relational engine first looks through the procedure cache to verify that an existing execution plan for the same SQL statement exists. SQL Server reuses any existing plan it finds, saving the overhead of recompiling the SQL statement. If no existing execution plan exists, SQL Server generates a new execution plan for the query.

It doesn't explicitly specify how the matching is done, other than that it's "an efficient algorithm", but given that changing a piece of the text usually causes a recompile, I wouldn't be surprised if it starts with something as simple as a hashtable lookup. It could be more complicated; I'd have to ask someone who's more intimately familiar.
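The hashtable-lookup idea can be sketched with a toy cache keyed by the exact statement text. This is not SQL Server's actual algorithm, just a model of why inlined literals defeat the cache while a parameterized statement hits the same entry every time:

```javascript
// Toy "plan cache": one entry per distinct statement text.
const planCache = new Map();
let compiles = 0;

function execute(sqlText) {
  if (!planCache.has(sqlText)) {
    compiles++;                      // cache miss: "compile" a plan
    planCache.set(sqlText, { sql: sqlText });
  }
  return planCache.get(sqlText);     // cache hit: reuse the plan
}

// Inlined literals: every distinct value is a distinct statement text.
for (let i = 0; i < 1000; i++) {
  execute(`select * from TestData where Number = ${i}`);
}
console.log(compiles, planCache.size); // 1000 1000

// Parameterized: one statement text, one plan, however many executions.
compiles = 0;
planCache.clear();
for (let i = 0; i < 1000; i++) {
  execute('select * from TestData where Number = @n');
}
console.log(compiles, planCache.size); // 1 1
```

In this model, a stored procedure and a parameterized statement behave identically: both resolve to one cache key.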

If SQL Server had to recompile every statement, Stack Overflow would be screwed. We don't use stored procedures, and we get fantastic performance.

Inlining parameters obviously has security risks... it also means SQL Server probably isn't going to reuse query plans. So it's bad on all fronts. Using real parameters solves both problems.

For us, there's no extra network cost to using statements because almost all of our queries (including parameters) fit in a single packet. Granted, we also use jumbo frames, which helps, but most of our statements are small anyway.

I think I agree with your description of a query/execution plan. It's the "program" that a SQL query compiles to, and the exact program which gets generated can vary depending on the indexes, table statistics, parameters, and compatibility level at compile time. We occasionally have to clear a bad query plan when it was compiled with parameters that are not representative of typical usage. Stored procedures aren't immune to these problems either though.

There are valid reasons to use stored procedures over regular statements. Tight security requirements is a good one. Having really big statements with control logic is another. Simply preferring them is also perfectly valid. There are downsides to using stored procedures as well: for us, the biggest is that they move the SQL away from the code that executes it, making it more of a black box and harder to debug. So it's a choice, there are trade-offs, and what works for us might not work for you.

I don't care which way you, or anyone else, chooses to go. My only point was that SQL doesn't recompile statements every time it sees the same one (hopefully we can agree on that), and, because of that, the performance difference between a statement and a SP is negligible in most cases.

PeterNierop commented 8 years ago

Bret, interesting quote. I will research that; if true, I stand corrected. You speak of fantastic performance. Any indication of what sort of speed you are talking about? I wrote an interface in Node using Tedious that verifies a detected car number plate. It does 5882 calls a second using stored procedures. The previous implementation, which I also wrote using IIS ASP and the latest dotNet rubbish, did 62 calls a second against the same stored procedure. So it is entirely possible your fantastic performance is credit to Tedious and Node rather than to uncompiled statements vs compiled statements. Still, it would be intriguing to know for sure.

That said, I cannot yet see how MS can do that trick without tokenizing and processing the string, removing extra spaces, etc. A simple stored procedure might just wrap a single select or insert statement. I often write update/insert/select statements that first check existence, then update or insert accordingly, and end by selecting or outputting some result; such statements can be several thousand characters long. I cannot see how MS recognizes such a string as identical to a previous one with different parameter values and possibly different logic depending on the existing data. A normal database can quickly contain several hundred long, complex strings like this. Comparing a raw string against each and every one of them and still being faster than just compiling would be quite a trick, so I really want to know whether the quote you gave is correct.

bretcope commented 8 years ago

@PeterNierop for the record, we don't use node.js or Tedious at Stack Overflow. We did have a project which used it for a little while, but we've since converted it to C# .NET like the rest of our code.

Allow me to demonstrate what I was talking about with an example. You can force SQL Server to recompile the statement every time with OPTION(RECOMPILE), so I wrote a Linqpad script to test stored procedures against regular statements with and without RECOMPILE.

Here's the SQL I ran in SSMS prior to running the script:

create procedure SimpleProc
    @value int
as
    select @value as Value;
GO

create table TestData
(
    Id int not null Identity(1,1),
    Boolean bit not null,
    Number int not null,
    FloatingPoint float not null,

    constraint PK_TestData primary key clustered (Id),
    index IX_TestData_Number (Number)
)
GO

-- insert 100000 rows of random sample data
insert into TestData (Boolean, Number, FloatingPoint) values
(
  cast(rand() * 2 as bit),
  cast(rand() * 1000000 as int),
  rand()
)
GO 100000

create procedure GetTestData
    @min int,
    @max int
as
    select * from TestData where Number between @min and @max order by Number;
GO

The script calls each SP 100000 times, and then compares that vs calling the exact SQL of the SP body vs the same SQL, but with OPTION(RECOMPILE). So:

SimpleProc()
vs
select @value as Value;
vs
select @value as Value OPTION(RECOMPILE);

and

GetTestData()
vs
select * from TestData where Number between @min and @max order by Number;
vs
select * from TestData where Number between @min and @max order by Number OPTION(RECOMPILE);

Here are the results on my machine:

## SimpleProc ##
00:00:03.1216239
100000 rows

## select @value ##
00:00:03.3207671
100000 rows

## select @value OPTION(RECOMPILE) ##
00:00:20.5484253
100000 rows

## GetTestData ##
00:00:09.7927993
2009851 rows

## select * from TestData... ##
00:00:10.0120219
2008934 rows

## select * from TestData... OPTION(RECOMPILE) ##
00:01:33.7348062
2009654 rows

As you can see, the SP does slightly edge out a regular statement, by about 200 milliseconds over 100000 iterations (a 0.002 millisecond difference per request, hence my assertion that the difference is negligible). That difference most likely represents the overhead of the plan-matching algorithm.
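The per-request figure falls straight out of the timings above; taking the two numbers as printed, a quick check:

```javascript
// Per-iteration overhead of a regular statement vs the SP, from the timings above.
const iterations = 100000;
const spSeconds = 3.1216239;     // ## SimpleProc ##
const stmtSeconds = 3.3207671;   // ## select @value ##

const diffMs = (stmtSeconds - spSeconds) * 1000;  // total gap in milliseconds
const perRequestMs = diffMs / iterations;         // gap per request

console.log(diffMs.toFixed(1));       // 199.1 ms total
console.log(perRequestMs.toFixed(4)); // 0.0020 ms per request
```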

On the other hand, when RECOMPILE is used, it takes an order of magnitude longer, and it would get substantially worse with more complex queries. That is why I said Stack Overflow would be screwed if SQL Server started recompiling queries every time: we would see a huge performance degradation, and it would likely take us offline.

Again, I'm not trying to assert that you should pick one style over the other. Just pointing out that they have very comparable performance, and that your choice should probably be based on other factors.

For reference, here's the script itself:

void Main()
{
    using (var conn = new SqlConnection("Persist Security Info=False;Integrated Security=true;Initial Catalog=Test;server=localhost"))
    {
        conn.Open();

        int rowsRead;

        var rng = new Random();
        const int iterations = 100000;
        Stopwatch sw;

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("SimpleProc", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.Parameters.Add("@value", SqlDbType.Int).Value = rng.Next(100000);

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## SimpleProc ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("select @value as Value;", conn))
            {
                cmd.Parameters.Add("@value", SqlDbType.Int).Value = rng.Next(100000);

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## select @value ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("select @value as Value OPTION(RECOMPILE);", conn))
            {
                cmd.Parameters.Add("@value", SqlDbType.Int).Value = rng.Next(100000);

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## select @value OPTION(RECOMPILE) ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("GetTestData", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                var min = rng.Next(999800);
                var max = min + 200;
                cmd.Parameters.Add("@min", SqlDbType.Int).Value = min;
                cmd.Parameters.Add("@max", SqlDbType.Int).Value = max;

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## GetTestData ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("select * from TestData where Number between @min and @max order by Number;", conn))
            {
                var min = rng.Next(999800);
                var max = min + 200;
                cmd.Parameters.Add("@min", SqlDbType.Int).Value = min;
                cmd.Parameters.Add("@max", SqlDbType.Int).Value = max;

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## select * from TestData... ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();

        GC.Collect();

        rowsRead = 0;
        sw = new Stopwatch();
        sw.Start();
        for (var i = 0; i < iterations; i++)
        {
            using (var cmd = new SqlCommand("select * from TestData where Number between @min and @max order by Number OPTION(RECOMPILE);", conn))
            {
                var min = rng.Next(999800);
                var max = min + 200;
                cmd.Parameters.Add("@min", SqlDbType.Int).Value = min;
                cmd.Parameters.Add("@max", SqlDbType.Int).Value = max;

                using (var rdr = cmd.ExecuteReader())
                {
                    while (rdr.Read())
                    {
                        rowsRead++;
                    }
                }
            }
        }
        sw.Stop();
        "## select * from TestData... OPTION(RECOMPILE) ##".Dump();
        sw.Elapsed.Dump();
        $"{rowsRead} rows".Dump();
        "".Dump();
    }
}