mlverse / pysparklyr

Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect
https://spark.posit.co/deployment/databricks-connect.html

databricks_dbr_error not coping with unexpected error format (reticulate? pyspark?) #123

Closed Obihoernchen80 closed 2 months ago

Obihoernchen80 commented 3 months ago

When connecting to Databricks from R, I get the following unhelpful error:

First:

! Changing host URL to: [.....]
✔ Cluster: [.....]  | DBR: '14.3' [254ms]
✔ Python environment: 'r-sparklyr-databricks-14.3' [53ms]
✔ Connected to: '[.....]' [80ms]

Finally:

Error in if (grepl("UNAVAILABLE", status_error)) { : 
  argument is of length zero

Debugging into databricks_dbr_error() shows that the actual error is the following, which databricks_dbr_error() does not process properly: https://github.com/mlverse/pysparklyr/blob/b3b91eabd5e045eaff358f0a9d0349fa5e47bd3e/R/databricks-utils.R#L190

"Error in py_get_attr(x, name) : \n pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkSQLException) [INVALID_HANDLE.SESSION_CLOSED] The handle 8936cdaa-4503-4413-864b-be979c4fc193 is invalid. Session was closed. SQLSTATE: HY000\n\033[90mRun \033]8;;rstudio:run:reticulate::py_last_error()\a reticulate::py_last_error() \033]8;;\a for details.\033[39m\n"

See also: https://github.com/sparklyr/sparklyr/issues/3449

Potential fix: if it is not feasible to handle every error format from every circumstance, it would at least be helpful to return the original error message rather than erroring out while parsing it, as sketched below.
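For illustration, a hedged sketch of such a guard. The function name parse_dbr_error() and the regex are placeholders, not pysparklyr's actual implementation; the idea is simply to fall back to the untouched error text whenever the expected status token cannot be extracted.

parse_dbr_error <- function(error) {
  # Try to pull out a recognizable status token (placeholder pattern).
  status_error <- regmatches(error, regexpr("StatusCode\\.[A-Z_]+", error))

  if (length(status_error) == 0) {
    # Unrecognized format: surface the original message instead of
    # failing inside the error handler itself.
    stop(error, call. = FALSE)
  }

  if (grepl("UNAVAILABLE", status_error[[1]])) {
    stop("Cluster is unavailable.\n", error, call. = FALSE)
  }

  # Default: still report the original message.
  stop(error, call. = FALSE)
}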

Obihoernchen80 commented 2 months ago

@edgararuiz Thank you very much!