jwills / buenavista

A Postgres Proxy Server in Python
Apache License 2.0
201 stars 20 forks source link

Cannot connect to PowerBi #12

Open dunxverse opened 1 year ago

dunxverse commented 1 year ago

When I try connnect to Power BI, I haved to provide username, password, then I gave it admin:admin, but Power Bi refused with this message: Unable to connect. We encounted an error while trying to connect. Detailed: "PostgresSQL: No password has been provided but the backend requires one (in SASL/SCRAM-SHA-256)"

jwills commented 1 year ago

Yeah, I haven’t implemented any of the Postgres auth protocols yet; it’s very much on the TODO list tho!

jwills commented 1 year ago

okay yay I finally got around to adding auth (it's MD5 instead of the SHA256 stuff but hoping that PowerBI will forgive me)

https://github.com/jwills/buenavista/pull/17

actuary87 commented 1 year ago

The project is great. I managed to add default user and password of "postgres" by adding the following line:

auth = {"postgres": "postgres"}

and modified the following line in buenavista.examples.duckdb_postgres.py file: from: server = create(db, address) to: server = create(db, address, auth)

then ran "pip install ." in the modified source code. Great, PowerBI accepts to connect but the 2nd problem I am facing is that PowerBI forces me to supply a Database name which it's expected to be blank. How can I solve this?

Moreover, I was only able to connect using psql in cmd, even if I supply a database name (which I think the code ignores). If I supply a wrong user/password (other than the default one I modified above), the connection is refused.

Neither PowerBI (Postgresql connector) or PG Admin were working.

It would be really nice if you can help.

If PowerBI works, then hopefully DirectQuery will work on duckdb files through the proxy.

jwills commented 1 year ago

@actuary87 hey, happy to. First, there is a database on the server side (it's either "memory" if you're using it without a DuckDB file, or the name of that file (e.g. if your database file is at the path "/path/to/jaffle_shop.duckdb", the database will be called "jaffle_shop")

Second, I've been focusing my BI tooling efforts more on the Presto/Trino protocol (which lives in the http module and has an example runner here: https://github.com/jwills/buenavista/blob/main/buenavista/examples/duckdb_http.py ) b/c supporting the full pg_catalog is really difficult to do whereas Presto's metadata layer is comparably simple; it's not perfect yet either, but I can make it work with DBeaver, Apache Superset, the various presto CLI tools, and I'm working on getting DataGrip to work-- does PowerBI support Presto/Trino as a connection target?

~I'm on a Mac and it looks like PowerBI supports Macs now, which is good? I'm happy to take it out for a spin and see what it takes to get it working ala DataGrip.~ nope nm

I'm also curious how you came across this project-- I've talked about it here and there, but I'm not promoting it widely as it is (as you can tell) still very much under construction.

actuary87 commented 1 year ago

@actuary87 hey, happy to. First, there is a database on the server side (it's either "memory" if you're using it without a DuckDB file, or the name of that file (e.g. if your database file is at the path "/path/to/jaffle_shop.duckdb", the database will be called "jaffle_shop")

Second, I've been focusing my BI tooling efforts more on the Presto/Trino protocol (which lives in the http module and has an example runner here: https://github.com/jwills/buenavista/blob/main/buenavista/examples/duckdb_http.py ) b/c supporting the full pg_catalog is really difficult to do whereas Presto's metadata layer is comparably simple; it's not perfect yet either, but I can make it work with DBeaver, Apache Superset, the various presto CLI tools, and I'm working on getting DataGrip to work-- does PowerBI support Presto/Trino as a connection target?

~I'm on a Mac and it looks like PowerBI supports Macs now, which is good? I'm happy to take it out for a spin and see what it takes to get it working ala DataGrip.~ nope nm

I'm also curious how you came across this project-- I've talked about it here and there, but I'm not promoting it widely as it is (as you can tell) still very much under construction.

I don't know what PowerBI supports. I am not very technical.

I use DuckDB and I wanted to use it with PowerBI but not through ODBC. I want PowerBI to think it's Postgresql because then I can use DirectQuery which supports Postgresql but not ODBC. Analytical queries run much faster on DuckDB, that's why.

I've been searching for man in middle solutions which intercepts queries and redirect them to DB solutions other than Postgresql.

I encountered few projects before yours. But I think yours is exactly what I wanted.

I found it via Google.

actuary87 commented 11 months ago

Hello,

Your project is great. I actually managed to connect with Power BI after slight code modification.

I'm facing a situation where the data comes in just fine if you use any Power BI widget (like a table/matrix would show text, numbers and dates correctly). But the Power Query does display an error for numbers and dates.

I tried comparing the packets between a real Postgres server vs Yours (Postgres proxy). What I found out is that the numbers for example are sent by the real server as a 32-bit big endian integer (probably signed) whereas your implementation sends as text. So in Wireshark I see the number that real servers sends must be read in hex (because data is shown as binary represented in hex) while your implementation is showing in Wireshark as plain text. If for example a row holds the number 1,000 the real server shows as 00 00 00 003 E8 but buenavista shows it as 1000 in plain sight. I hope I could've explained the observation in a clear way.

I tried modifying the code inside send_data_rows function in buenavista/postgres.py but all attempts failed (Power Query displays an error that the format of a field is wrong).

I really would appreciate if you can comment whether my observation is correct and appreciate if you can also support me in resolving the issue. We can meet on chatting platform if that's fine.

Don't get me wrong, psql and other clients do recognize the column type in case of an integer or a date. It is just that Power Query does not recognize them as it should. (also the same when trying to connect with new tables from the database).

Thanks

jwills commented 11 months ago

@actuary87 for someone who describes themselves as "not very technical," that is a fantastic analysis of what is going on-- thank you so much!

What you said about the text vs. binary protocol makes sense; I am being extremely lazy in send_row_description and in send_data_rows by assuming that the desired format of the returned value is always text (which, as you observed, works fine with e.g. psql and many other clients.)

Now you have me curious as to how easy this would be to change; let me throw up a branch where I make some of the changes that would be necessary and see if I can make it work with e.g. psql.

jwills commented 11 months ago

okay I've read the relevant doc section and I think I see what needs doing

jwills commented 11 months ago

@actuary87 so I think something like this will work, tho I likely don't have all of the binary format details for e.g. datetimes correct yet: https://github.com/jwills/buenavista/pull/23

actuary87 commented 11 months ago

Thanks for your prompt response. I forked your project and I re made my code changes in a little bit more cleaner way than I initially did. I propose you visit the fork, read the README.md and try to run the Proxy and Power BI. You can separately view the content of the small duckdb file I provided. I hope you like it and I am more than happy to have your feedback.

You can also open any of the tables in Power Query and you see the errors I mentioned.

actuary87 commented 8 months ago

Hello my friend.

I deleted my old dev branch and pushed a fresh one. I merged the new one with your main (from my main) and made small changes.

I tested these DuckDB data types:

Working: BIGINT BIT BOOLEAN DATE DOUBLE DECIMAL INTEGER REAL (works but not accurate; getting some rounding difference!) SMALLINT TINYINT UUID VARCHAR

Not working (need help): TIME TIMESTAMP TIMESTAMP WITH TIME ZONE

Not tested: BLOB INTERVAL

Not supported by Postgresql: HUGEINT UBIGINT UINTEGER USMALLINT UTINYINT

You can download my dev and test yourself in Power BI if you are interested.

And thanks for all the effort you put so far.

jwills commented 8 months ago

@actuary87 ah, that’s excellent! I’m actually in Asia right now on vacation, but will take a look when I get back to my hotel later!

actuary87 commented 8 months ago

no worries. enjoy you vacation.

jwills commented 7 months ago

@actuary87 hello! I am back in the US and want to get this timestamp stuff working this week

actuary87 commented 7 months ago

What I don't understand is the timestamp value seems correct (I even verified it manually) but Power BI does not display it correctly. It either displayed a blank or 01/01/2000 00:00

jwills commented 7 months ago

So interesting; I cannot for the life of me figure out what is making it unhappy with the timestamp values