ploomber / jupysql

Better SQL in Jupyter. 📊
https://jupysql.ploomber.io
Apache License 2.0

Running SQL using SparkConnect should not print full stack trace #1011

Open b1ackout opened 6 months ago

b1ackout commented 6 months ago

What happens?

There are a lot of complaints that the stack trace is really long and doesn't help identify the error. The solution would be to print only the error message that Spark SQL provides.

For example, running this query:

%%sql

ALTER TABLE my_table CHANGE COLUMN my_column STRING;

The output should just be:

Renaming column is not supported in Hive-style ALTER COLUMN, please run RENAME COLUMN instead.(line 1, pos 0)

== SQL ==
ALTER TABLE my_table CHANGE COLUMN my_column STRING;
^^^

To Reproduce

After connecting to a spark cluster, run the code below:

%%sql

ALTER TABLE my_table CHANGE COLUMN my_column STRING;

Currently, this outputs the full stack trace below:

---------------------------------------------------------------------------
ParseException                            Traceback (most recent call last)
Cell In[1], line 1
----> 1 get_ipython().run_cell_magic('sql', '', '\nALTER TABLE my_table CHANGE COLUMN my_column STRING;\n')

File /opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py:2517, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2515 with self.builtin_trap:
   2516     args = (magic_arg_s, cell)
-> 2517     result = fn(*args, **kwargs)
   2519 # The code below prevents the output from being displayed
   2520 # when using magics with decorator @output_can_be_silenced
   2521 # when the last Python token in the expression is a ';'.
   2522 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File /opt/conda/lib/python3.10/site-packages/sql/magic.py:369, in SqlMagic.execute(self, line, cell, local_ns)
    261 @no_var_expand
    262 @needs_local_scope
    263 @line_magic("sql")
   (...)
    341 )
    342 def execute(self, line="", cell="", local_ns=None):
    343     """
    344     Runs SQL statement against a database, specified by
    345     SQLAlchemy connect string.
   (...)
    367 
    368     """
--> 369     return self._execute(
    370         line=line, cell=cell, local_ns=local_ns, is_interactive_mode=False
    371     )

File /opt/conda/lib/python3.10/site-packages/ploomber_core/telemetry/telemetry.py:696, in Telemetry.log_call.<locals>._log_call.<locals>.wrapper(*args, **kwargs)
    694     injected_args = list(args)
    695     injected_args.insert(1, _payload)
--> 696     result = func(*injected_args, **kwargs)
    697 else:
    698     result = func(_payload, *args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/ploomber_core/exceptions.py:128, in modify_exceptions.<locals>.wrapper(*args, **kwargs)
    125 @wraps(fn)
    126 def wrapper(*args, **kwargs):
    127     try:
--> 128         return fn(*args, **kwargs)
    129     except (ValueError, TypeError) as e:
    130         _add_community_link(e)

File /opt/conda/lib/python3.10/site-packages/sql/magic.py:628, in SqlMagic._execute(self, payload, line, cell, local_ns, is_interactive_mode)
    625     handle_exception(e, command.sql, self.short_errors)
    626 except Exception as e:
    627     # Handle non SQLAlchemy errors
--> 628     handle_exception(e, command.sql, self.short_errors)

File /opt/conda/lib/python3.10/site-packages/sql/error_handler.py:115, in handle_exception(error, query, short_error)
    113         _display_error_msg_with_trace(error, detailed_message)
    114 else:
--> 115     raise error

File /opt/conda/lib/python3.10/site-packages/sql/magic.py:582, in SqlMagic._execute(self, payload, line, cell, local_ns, is_interactive_mode)
    579     parameters = user_ns
    581 try:
--> 582     result = run_statements(conn, command.sql, self, parameters=parameters)
    584     if (
    585         result is not None
    586         and not isinstance(result, str)
   (...)
    589         # Instead of returning values, set variables directly in the
    590         # users namespace. Variable names given by column names
    592         if self.autopandas or self.autopolars:

File /opt/conda/lib/python3.10/site-packages/sql/run/run.py:54, in run_statements(conn, sql, config, parameters)
     50     result = handle_postgres_special(conn, statement)
     52 # regular query
     53 else:
---> 54     result = conn.raw_execute(statement, parameters=parameters)
     55     if is_spark(conn.dialect) and config.lazy_execution:
     56         return result.dataframe

File /opt/conda/lib/python3.10/site-packages/sql/connection/connection.py:1114, in SparkConnectConnection.raw_execute(self, query, parameters)
   1112 def raw_execute(self, query, parameters=None):
   1113     """Run the query without any pre-processing"""
-> 1114     return handle_spark_dataframe(self._connection.sql(query))

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/session.py:550, in SparkSession.sql(self, sqlQuery, args)
    548 def sql(self, sqlQuery: str, args: Optional[Union[Dict[str, Any], List]] = None) -> "DataFrame":
    549     cmd = SQL(sqlQuery, args)
--> 550     data, properties = self.client.execute_command(cmd.command(self._client))
    551     if "sql_command_result" in properties:
    552         return DataFrame.withPlan(CachedRelation(properties["sql_command_result"]), self)

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py:982, in SparkConnectClient.execute_command(self, command)
    980     req.user_context.user_id = self._user_id
    981 req.plan.command.CopyFrom(command)
--> 982 data, _, _, _, properties = self._execute_and_fetch(req)
    983 if data is not None:
    984     return (data.to_pandas(), properties)

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py:1282, in SparkConnectClient._execute_and_fetch(self, req, self_destruct)
   1279 schema: Optional[StructType] = None
   1280 properties: Dict[str, Any] = {}
-> 1282 for response in self._execute_and_fetch_as_iterator(req):
   1283     if isinstance(response, StructType):
   1284         schema = response

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py:1263, in SparkConnectClient._execute_and_fetch_as_iterator(self, req)
   1261                     yield from handle_response(b)
   1262 except Exception as error:
-> 1263     self._handle_error(error)

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py:1502, in SparkConnectClient._handle_error(self, error)
   1489 """
   1490 Handle errors that occur during RPC calls.
   1491 
   (...)
   1499 Throws the appropriate internal Python exception.
   1500 """
   1501 if isinstance(error, grpc.RpcError):
-> 1502     self._handle_rpc_error(error)
   1503 elif isinstance(error, ValueError):
   1504     if "Cannot invoke RPC" in str(error) and "closed" in str(error):

File /opt/conda/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py:1538, in SparkConnectClient._handle_rpc_error(self, rpc_error)
   1536             info = error_details_pb2.ErrorInfo()
   1537             d.Unpack(info)
-> 1538             raise convert_exception(info, status.message) from None
   1540     raise SparkConnectGrpcException(status.message) from None
   1541 else:

ParseException: 
Renaming column is not supported in Hive-style ALTER COLUMN, please run RENAME COLUMN instead.(line 1, pos 0)

== SQL ==
ALTER TABLE my_table CHANGE COLUMN my_column STRING;
^^^

OS:

Linux

JupySQL Version:

0.10.10

Full Name:

Athanasios Keramas

Affiliation:

-

b1ackout commented 6 months ago

https://github.com/ploomber/jupysql/blob/043344400de468f5d340104c65ec78be5a217c7f/src/sql/connection/connection.py#L1114

import pyspark

try:
    return handle_spark_dataframe(self._connection.sql(query))
except pyspark.sql.utils.AnalysisException as e:
    # Spark SQL error: print only Spark's message instead of the full trace
    print(e)
except Exception as e:
    # anything else: keep the current behavior and re-raise
    print(e)
    raise

Would this be a solution?
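
A possible refinement of the snippet above (just a sketch, not code from the repo): rather than printing inside raw_execute, re-raise a trimmed exception so the existing handle_exception call in magic.py (visible in the traceback) still runs, while the deep pyspark/gRPC frames are dropped. The catch-all base class pyspark.errors.PySparkException (PySpark >= 3.4) and the RuntimeError wrapper are assumptions, chosen only to illustrate the idea:

from pyspark.errors import PySparkException  # assumed base class of Spark Connect errors

def raw_execute(self, query, parameters=None):
    """Run the query without any pre-processing"""
    try:
        return handle_spark_dataframe(self._connection.sql(query))
    except PySparkException as e:
        # keep only Spark's own message; "from None" suppresses the long
        # Spark Connect / gRPC exception chain in the displayed traceback
        raise RuntimeError(str(e)) from None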

edublancas commented 6 months ago

Did you try the short_errors option?

I remember the Spark compatibility came from an external contributor, so I'm unsure whether the short_errors option will work; if it doesn't, feel free to open a PR.
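
For reference, short_errors is a SqlMagic config trait (it appears as self.short_errors in the traceback above), so it can be toggled from a notebook cell. A minimal sketch using IPython's standard %config mechanism:

%config SqlMagic                       # show the available SqlMagic options
%config SqlMagic.short_errors = True   # shorten errors where supported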

b1ackout commented 5 months ago

> Did you try the short_errors option?
>
> I remember the Spark compatibility came from an external contributor, so I'm unsure whether the short_errors option will work; if it doesn't, feel free to open a PR.

@edublancas I tested it and it doesn't work on Spark Connect, so I'm going to open a PR.

b1ackout commented 5 months ago

Can anyone review the PR?