ploomber / jupysql

Better SQL in Jupyter. 📊
https://jupysql.ploomber.io
Apache License 2.0
728 stars 77 forks source link

Issue on page /integrations/duckdb.html #713

Open dmcmurchy opened 1 year ago

dmcmurchy commented 1 year ago

Would be helpful to include instructions for connecting to a persistent duckdb database. The create _engine option works but has some caveats with respect to the path. Under Windows if the path has a backslash, hyphen or space in the path to the database file it'll need to be unicode escaped on the creation. Closing the connection afterwards at least with jupysql 0.7.9 isn't possible unless the connection is aliased.

Example 1

Where the file is not in the same directory as the notebook

from sqlalchemy.engine import create_engine

engine = create_engine(r"duckdb:///C:\Users\dwayn\Documents\DataCamp - Career Track - Data Analyst in SQL\9 - Exploratory Data Analysis in SQL\Course Datasets\stackoverflow.duckdb")

%sql engine --alias one

%sql -l

%sql --close one

Example 2

Where the notebook and database file exist in the same directory %sql duckdb:///stackoverflow.duckdb

%sql -l

%sql --close duckdb:///stackoverflow.duckdb

edublancas commented 1 year ago

hey @Zirescu thanks for your feedback!

The create _engine option works but has some caveats with respect to the path.

can you expand on the caveats?

Under Windows if the path has a backslash, hyphen or space in the path to the database file it'll need to be unicode escaped on the creation.

is this the caveat that you're mentioning? I'm unsure if you're referring to the create_engine approach or the %sql duckdb:///path/to.db approach

dmcmurchy commented 1 year ago

The caveat is with respect to example 1 where the Windows path contains Unicode characters (backslashes) so the user will need to either prefix the string with 'r' so that it gets treated as a raw string or escape the backslashes. Other implementations (magic_duckdb) didn't require the escaping.

A working connection string using 'r' prefix: engine = create_engine(r"duckdb:///C:\Users\dwayn\Documents\DataCamp - Career Track - Data Analyst in SQL\9 - Exploratory Data Analysis in SQL\Course Datasets\stackoverflow.duckdb")

A working connection string with escaped backslashes: engine = create_engine("duckdb:///C:\\Users\\dwayn\\Documents\\DataCamp - Career Track - Data Analyst in SQL\\9 - Exploratory Data Analysis in SQL\\Course Datasets\\stackoverflow.duckdb")

Sample error message if the string isn't escaped: SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 12-13: truncated \UXXXXXXXX escape

The standard duckdb.connect method also requires the raw string or escaping (undocumented).

A working magic_duckdb connection: %dql -cn "C:\Users\dwayn\Documents\DataCamp - Career Track - Data Analyst in SQL\9 - Exploratory Data Analysis in SQL\Course Datasets\stackoverflow.duckdb"

The %sql approach didn't work with a path in any of my testing under Windows. It only worked when the database file was in the same directory as the notebook.

dmcmurchy commented 1 year ago

Here is the command and error message I get when trying the %sql connection option (using the raw string also fails). Command:

%sql duckdb:///"C:\Users\dwayn\Documents\DataCamp - Career Track - Data Analyst in SQL\9 - Exploratory Data Analysis in SQL\Course Datasets\stackoverflow.duckdb"

Error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[25], line 1
----> 1 get_ipython().run_line_magic('sql', 'duckdb:///"C:\\Users\\dwayn\\Documents\\DataCamp - Career Track - Data Analyst in SQL\\9 - Exploratory Data Analysis in SQL\\Course Datasets\\stackoverflow.duckdb"')

File [~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\IPython\core\interactiveshell.py:2417](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/dwayn/Documents/JupyterNotebooks/~/AppData/Local/Packages/PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0/LocalCache/local-packages/Python311/site-packages/IPython/core/interactiveshell.py:2417), in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2415     kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2416 with self.builtin_trap:
-> 2417     result = fn(*args, **kwargs)
   2419 # The code below prevents the output from being displayed
   2420 # when using magics with decodator @output_can_be_silenced
   2421 # when the last Python token in the expression is a ';'.
   2422 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File [~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\sql\magic.py:318](https://file+.vscode-resource.vscode-cdn.net/c%3A/Users/dwayn/Documents/JupyterNotebooks/~/AppData/Local/Packages/PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0/LocalCache/local-packages/Python311/site-packages/sql/magic.py:318), in SqlMagic.execute(self, line, cell, local_ns)
    210 @no_var_expand
    211 @needs_local_scope
    212 @line_magic("sql")
   (...)
    290 )
    291 def execute(self, line="", cell="", local_ns=None):
    292     """
    293     Runs SQL statement against a database, specified by
    294     SQLAlchemy connect string.
   (...)
...
--> 546     result = conn.session.execute(statement)
    547     _commit(conn=conn, config=config, manual_commit=manual_commit)
    549     if result and config.feedback:

AttributeError: 'NoneType' object has no attribute 'execute'
edublancas commented 1 year ago

Note: this might be related to https://github.com/ploomber/jupysql/issues/425