metabase / metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:
https://metabase.com
Other
38.48k stars 5.1k forks source link

ROWCOUNT limited ON SqlServer #9940

Closed KidYoungChang closed 2 years ago

KidYoungChang commented 5 years ago

System : Win 10,Java 1.8 Data On SQL Server 2008 The results is affected by ROWCOUNT limited when use "Native Query",if the Metabase version >0.31.0.

Test SQL script 
----------------------------------------
DECLARE @DATA AS TABLE(
    IDX INT IDENTITY(1,1),
    V INT
)
DECLARE @STEP INT 
SET @STEP = 1
WHILE @STEP <=3000
BEGIN
    INSERT INTO @DATA(V)
    SELECT 1
    SET @STEP = @STEP + 1
END 

DECLARE @TEMP AS TABLE(
    IDX INT IDENTITY(1,1),
    V INT
)
INSERT INTO @TEMP(V)
SELECT V FROM @DATA

SELECT COUNT(1) FROM @TEMP
---------------------------------------

The data has 3000 rows,maybe more,but the result is limited at 2000.

:arrow_down: Please click the :+1: reaction instead of leaving a +1 or update? comment

flamber commented 5 years ago

Just for reference, this was added to the driver in 0.31.1: https://github.com/metabase/metabase/blob/b913f5f70d8103d80255ccfcc2e09ae502c08220/src/metabase/driver/sqlserver.clj#L159-L169 Discussed in the forum last month: https://discourse.metabase.com/t/when-update-to-0-32-1-with-data-view-problem-on-sql-server/5811

mazameli commented 5 years ago

@camsaul It looks like this was changed in 0.31.1 in order to fix something, but seems to be causing unexpected results for users. Bug?

notrom commented 5 years ago

Metabase Version: v0.33.0-preview2 (also main branch v0.32.4) Metabase DB: Postgres Database: SQL Server 2014

I've struck this a few times recently. Really frustrating when you get a relatively complex query running "fast enough" in SQL using some indexed temp tables etc., then you move it into Metabase and it misses most of your data.

No real issues with limiting the rowcount for the end result set, although it would be good to be able to see/set this, but it would be great if we were free to do intermediate steps with no limit.

notrom commented 5 years ago

I just read the linked discourse thread. Based on that I found that if you add an explicit SET ROWCOUNT 1000000 statement at the top of the script, or before the large dataset is selected, then you can work around it.

This is totally sufficient for me now that I know, I just wasn't aware that it was doing an explicit SET ROWCOUNT 2000 on my session.

Edit: I think the work around of being explicit about the SET ROWCOUNT X in the query means that you can't then use that question as the base of further queries in Metabase using the "Saved Questions" section. Not a deal breaker but not ideal.

dfx413 commented 5 years ago

Hi @flamber, just want to clarify if you don't mind. I have looked into the code (I don't use Clojure so I might be wrong) and if I read it correctly the referenced part works like this: if there is ordering applied in the subquery it adds "TOP [absolute-max-results]" to the select statement so in my case it looks like SELECT TOP 1048576 [whatever] FROM [whatever]. AFAIK this does not have to do anything with SET ROWCOUNT because id does not affect the count of rows returned in subquery, just the count of rows returned from the top-level query. So I don't really see a problem here. Am I wrong?

flamber commented 5 years ago

@dfx413 I'm not a Clojure programmer.

dfx413 commented 5 years ago

Could someone else comment on this, please? I am really confused by this right now. Maybe @sbelak would know more? Thank you very much.

MarcGJA commented 5 years ago

Ran into what sounds like the same issue but with SQL Server TVPs that act rather like temp tables: Table Valued Parameters (TVP) SQL Server expressions restricted by Metabase row limit

Working around it with views & CTEs so far but it really threw me when time series data started being cut-off the further back in time the data reached!

Metabase: v0.33.2 Host: Localhost:3000 Databases: H2 internal default, Server SQL added Browser: Firefox 52.9.0 OS: Microsoft Windows [Version 6.0.6002] Java: 1.8.0_45

jeff303 commented 2 years ago

As of Metabase 39 (via #14883), the original query in this issue doesn't even work. It fails with Select statement did not produce a ResultSet for native query.

It seems that in order for this query to work at all, only a PreparedStatement can be used (not a Statement, which is what we started using for native queries without parameters under #14883). I tried updating the SQL Server JDBC driver version to the latest, but that still doesn't fix this.

If we want this sort of thing to work, I will have to mark statements as not being supported by the driver, then fix this row count issue.

In any event, this appears to be a known issue with the MS SQL Server JDBC driver. See microsoft/mssql-jdbc#176 and microsoft/mssql-jdbc#1107

In order to work around this, we will need to avoid calling .setMaxRows here, and always ensure we emit TOP in any query generated from MBQL, and just hope for the best with native queries? Or possibly only set this value with MBQL queries? We have to be careful that we don't open the door to allowing millions of rows to be returned, destroying the memory of the Metabase instance in the process.

@camsaul , any thoughts here?

flamber commented 2 years ago

@jeff303 Worth a read too: #16182 and #4050 (later comments) I remember constantly hitting this, so I build my own driver, which also added full support for older MSSQL.

camsaul commented 2 years ago

Returning millions of rows shouldn't destroy the Metabase server since we stream the rows directly to the REST API JSON response as they come in... but it would probably destroy your browser and put a lot of load on the database.

Based on how it seems like they're not going to fix https://github.com/microsoft/mssql-jdbc/issues/176 upstream I think we should stop doing .setMaxRows for SQL Server and instead just always wrap the whole query in a

SELECT TOP <max rows> * FROM ( ... )

before we execute it or something like that

flamber commented 2 years ago

Doing a subquery like that would break CTE native questions like this (also see #18685)

WITH test_cte AS (
  SELECT 1 as val
)
SELECT * FROM test_cte

Another note, SET ROWCOUNT 0 is currently being used by several to get around #12729

There's no perfect way out of this without breaking something for some people.

jeff303 commented 2 years ago

I don't think we can leave it totally unconstrained, and it also sounds like a subquery isn't a good idea.

What about simply turning these into settings? That way admins could adjust the value as needed (perhaps above the default), or maybe even set to 0 to leave unconstrained (along with a big warning flag)?

jeff303 commented 2 years ago

Solved in 0.43 via #19267. SQL Server will have a new connection property called ROWCOUNT Override. If left defaulted (i.e. not filled in), then the previous behavior applies (i.e. the 2000 limit). If this is changed to something (including 0, which means "unconstrained" in SQL Server's ROWCOUNT session variable), then that value will take effect in queries executed against that database.