Closed csubhodeep closed 3 months ago
@csubhodeep Have you tried using the Ibis mssql
backend for this? Maybe it just worksβ’?
@djouallah Is this something that might interest you?
@cpcloud yes I have tried the mssql
backend and saw the error messages that i have mentioned in the discussion. It "kind of works".. but requires a bit of hacking.
JFYI.. I think for most of the methods like list_tables
, tables
, list_databases
, list_schemas
, create_table
could be supported quickly because, i think, these are just suffering from a mere case-sensitivity issue, from the error message that I had shared in the thread we get the following
ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Invalid object name '<NAME OF WAREHOUSE>.information_schema.schemata'. (208) (SQLExecDirectW)")
In our data warehouse if I just execute the following
select *
from INFORMATION_SCHEMA.COLUMNS;
I get a nice table without any error. However, if I do the following (as in the error message)
select *
from information_schema.columns;
I get the exact same error on Fabric
@cpcloud @gforsyth Do we know by when could we solve this one?
Also none of the hack works if I try to create a table from an in-memory table made using using either ibis.memtable
or pandas.DataFrame
constructors.
It is now blocking us, hence, it would be nice to know if anyone could suggest any workarounds till this is officially supported.
UPDATE
As per the their official docs
Currently, Latin1_General_100_BIN2_UTF8 is the default and only supported collation for both tables and metadata.
That means Fabric is currently case-sensitive presumably till the end of this quarter (if their development goes according to their plan). As per their roadmap they would support more collations later.
Now, from the perspective of ibis
, at least from the error messages, if I understood them correctly, accessing the information_schema
is crucial for some of the methods to work correctly (such as create_table
, table
, list_tables
etc.)
Therefore, to circumvent the above, I wrote a small Python snippet that creates - a) a new schema with the name that ibis
expects i.e. lower-cased information_schema
, and b) views that the original schema i.e. (upper cased) INFORMATION_SCHEMA
has but all of their names and the column names in them in lower case.
query_to_get_all_org_views_and_columns = """
SELECT
o.name AS view_name,
c.name AS column_name
FROM sys.all_columns c
INNER JOIN (
SELECT *
FROM sys.all_objects o
WHERE schema_id = 3 -- schema_id = 3 is the schema_id for the information_schema
) o
ON c.object_id = o.object_id
"""
df_is = pd.read_sql_query(
query_to_get_all_org_views_and_columns,
connection # this is the same `pyodbc` connection created previously using custom authentication
)
cursor = connection.cursor()
for view_name, gdf in df_is.groupby("view_name"):
select_columns = ", ".join([f"{col} AS {col.lower()}" for col in gdf["column_name"]])
create_view_query = f"CREATE VIEW information_schema.{view_name.lower()} AS SELECT {select_columns} FROM INFORMATION_SCHEMA.{view_name};"
cursor.execute(create_view_query)
cursor.commit()
With the above I achieved the following
And with that all API methods work normally. π
Ofcourse, this is not at all a solution and rather a hack but I just wanted to share this in case someone else is looking for a temporary workaround.
Hi @csubhodeep! Thanks for digging into this further, and glad you were able to find a workaround. We just discussed this amongst the team just now, and we honestly don't have a much better solution at this time that wouldn't be equally hacky or have potential unintended side effects. Not sure if the support for case-insensitive collation will help here once released, as it sounds like it would affect EVERYTHING in the database (which could easily break existing logic).
At this point, the workaround might be the best path forward. Would also love to see if other people come across this issue and/or find your workaround helpful.
Full disclosure I work in the Fabric DW team at Microsoft, @csubhodeep is correct, this is due to the collation. You would see the same error if you tried this on a SQL Server in the same collation (Latin1_General_100_BIN2_UTF8). [TLRDR: the system tables/views are case sensitive too]
The good news, we are adding support for a case insensitive collation in Fabric DW, in Q3 this year. link
Hey all -- I'm going to take a look at doing this.
We built this backend against mcr.microsoft.com/mssql/server:2022-latest
and it defaults (or all MSSQL servers default?) to SQL_Latin1_General_CP1_CI_AS
.
Given that current users aren't having issues with that collation, if we switch our testing collation to be case-sensitive, it should (?) make things work with Fabric and other case-sensitive collations without breaking anyone using a case-insensitive collation.
Famous last words? I'll report back.
@gforsyth with my limited understanding of DB internals (and Fabric), I would like to ask a stupid question, does that mean ibis.mssql
backend would do multiple rounds of running all (DDL) queries one with different cases (like, for instance, if upper case fails try with lower case)?
@csubhodeep -- no, that would be an unacceptable trade-off.
What I mean is that if we update the mssql
backend to test against a case-sensitive collation (and if it works...), then we'll correctly refer to, say, INFORMATION_SCHEMA.SCHEMATA
and that will work on Fabric.
For users who were using a case-insensitive collation, it won't matter because SCHEMATA == schemata
Ah ok I see, thanks for the explanations π
@csubhodeep -- if you install from main
, Ibis should work on Fabric now, since we are now testing against the same collation (Latin1_General_100_BIN2_UTF8
)
@csubhodeep -- if you install from
main
, Ibis should work on Fabric now, since we are now testing against the same collation (Latin1_General_100_BIN2_UTF8
)
ok thanks a lot I would try it out at the earliest.
Currently I am facing some issues while logging into our Fabric workspace using my corporate account. I would try and give a feedback as soon as it is resolved. Meanwhile, @MarkPryceMaherMSFT it would be really nice if you (or someone from your team) could test this out. Thanks ! π
Hello @gforsyth , I have tested this and most of the API methods listed in this page works.
Although kind of unrelated to this thread, I would suggest, for the read_*
methods that have not been implemented yet, you could either mark them (with colors or symbols) or simply remove them till they are actually implemented.
Thanks a lot again for pushing this feature through. π
Is your feature request related to a problem?
NO.
What is the motivation behind your request?
xref #9454
Describe the solution you'd like
I would like to be able to interact with a typical Microsoft Fabric Warehouse, which has quite similar interfaces as any other Datawarehousing technologies that
ibis
already supports. It would be nice if we could continue using theibis.mssql
backend for it.What version of ibis are you running?
9.1.0
What backend(s) are you using, if any?
DuckDB & SQLServer
Code of Conduct