microsoft / DacFx

SQL Server database schema validation, deployment, and upgrade runtime. Enables declarative database development and database portability across SQL Server versions and environments.
https://aka.ms/sqlpackage-ref
MIT License
296 stars 16 forks source link

Synapse Serverless Pool View defined using SELECT * on OPENROWSET query with columns defined causes SQL71501 Error #365

Open jasonhorner opened 8 months ago

jasonhorner commented 8 months ago

Steps to Reproduce:

  1. Consider this valid TSQL create for a serverless pool database CREATE VIEW dbo.foo AS SELECT src.* FROM OPENROWSET( BULK 'raw/scm6/dbo/foo/*.parquet', DATA_SOURCE = 'dls', FORMAT='PARQUET' ) with (databaseName varchar(128), ID int) as src

  2. Create a View on top of this CREATE VIEW dbo.bar AS SELECT databaseName, ID FROM dbo.foo; this will throw: Build error SQL71501: on both columns

  3. adjusting the view to explicitly call the columns seems to work fine: CREATE VIEW dbo.foo AS SELECT src.databaseName ,src.ID FROM OPENROWSET( BULK 'raw/scm6/dbo/foo/*.parquet', DATA_SOURCE = 'dls', FORMAT='PARQUET' ) with (databaseName varchar(128), ID int) as src

I get that select * is potentially an anti-pattern but the sql engine / parser seems to be able to sort this out and the code works fine on my serverless pool. Also notable is the fact that the columns are defined in the with statement I believe I have seen intellisense even suggest the column names when typing the fixed version of the view.

Extracting the code into a database project in either ADS or VS2022 fails to build due to SQL71501. I have thousands of such errors due to the same issue. I'm not entirely sure this is a DacFx Error but hoping to get some clarification.

Happy to provide a database project file on github with a sample if that helps speed triage / fix.   Did this occur in prior versions? If not - which version(s) did it work in? unsure

(DacFx/SqlPackage/SSMS/Azure Data Studio)

dzsquared commented 8 months ago

Thanks for flagging this, we'll take a look at how it could be handled. The expansion of columns out of a parquet file is something the SQL engine is uniquely able to handle because it can access the file, but DacFx build does not.

Would you prefer that this be a build warning along the lines of "SQL project build includes unresolved columns from an external file" such that it doesn't block your build but gives you a heads up that the columns aren't validated?

jasonhorner commented 8 months ago

Got it thanks for the clarification and fast response. Yes that would be super helpful... The main issue is that I thought perhaps something was broken in SSDT support for serverless pools as the error's seemed spurious (I've seen similar errors when there was a column casing difference for example). it wasn't until I took a few steps back I was able to work out what was going on. Mainly I just needed to get the build to succeed as I'm kind of blocked with the build errors.. when I excluded the files from the build then schema compare would detect any changes so it was a bit of a catch 22..

ideally a longer term solution would be to try and parse the "with" clause to validate the column metadata. but that doesn't help when it isn't defined. as it is an optional component for parquet files.

asos-martinsmith commented 6 months ago

I don't see use of * as an anti pattern here, It is a valid DRY use case.

It is not referencing some external object that can have its schema change as * here just means "project all the columns defined in the WITH" - and the only way that list can change is by doing an ALTER on the VIEW which will refresh the view metadata anyway.

Thanks for flagging this, we'll take a look at how it could be handled. The expansion of columns out of a parquet file is something the SQL engine is uniquely able to handle because it can access the file, but DacFx build does not.

Would you prefer that this be a build warning along the lines of "SQL project build includes unresolved columns from an external file" such that it doesn't block your build but gives you a heads up that the columns aren't validated?

I understand this argument when there is no WITH clause supplied but when the OPENROWSET has a WITH clause I would expect it to be able to see the column definitions defined there.

The same logic is also needed more generally for OPENJSON as in the case here

SELECT *
FROM OPENJSON('{"a": 1}')
WITH (a INT)

I would expect it to be able to see that * maps to a single column a and so if used in a table expression it will project a column called a.

And relatedly that the following would be an error as no such column defined

SELECT b
FROM OPENJSON('{"a": 1}')
WITH (a INT)

Maybe this has in fact already been implemented for OPENJSON (I haven't checked) in which case it could be reused here?

jasonhorner commented 6 months ago

@asos-martinsmith makes some really good points here.