Using databricks-connect to initialize Spark sessions isn't optimal for every use case. This is a problem for users who do not wish to use databricks-connect or who prefer spark-connect.
Context
This change is important because databricks-connect can sometimes cause issues, such as those described in this community discussion. These issues can disrupt workflows and create unnecessary complications.
Providing an option to choose whether to use databricks-connect or a regular Spark session would allow users to avoid these issues and use their preferred method for Spark session initialization.
A possible implementation is to modify the _get_spark() function so that users can specify their preference for databricks-connect, spark-connect, or a regular Spark session through configuration settings.
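A minimal sketch of what a configuration-driven _get_spark() could look like. The setting names spark_session_mode and spark_remote, the SparkSessionMode enum, the resolve_mode helper, and the default of a regular Spark session are all hypothetical and chosen only for illustration; they are not part of any existing API. The session libraries are imported lazily so that only the chosen backend needs to be installed.

```python
from enum import Enum
from typing import Any, Mapping, Optional


class SparkSessionMode(str, Enum):
    """Hypothetical setting values; not part of any existing API."""

    DATABRICKS_CONNECT = "databricks-connect"
    SPARK_CONNECT = "spark-connect"
    SPARK = "spark"


def resolve_mode(config: Mapping[str, Any]) -> SparkSessionMode:
    """Read the (assumed) 'spark_session_mode' setting, defaulting to plain Spark."""
    raw = config.get("spark_session_mode", SparkSessionMode.SPARK.value)
    try:
        return SparkSessionMode(raw)
    except ValueError:
        valid = ", ".join(m.value for m in SparkSessionMode)
        raise ValueError(
            f"Unknown spark_session_mode {raw!r}; expected one of: {valid}"
        ) from None


def _get_spark(config: Optional[Mapping[str, Any]] = None):
    """Return a session for the configured mode, importing backends lazily."""
    config = config or {}
    mode = resolve_mode(config)

    if mode is SparkSessionMode.DATABRICKS_CONNECT:
        # Only imported when explicitly requested.
        from databricks.connect import DatabricksSession

        return DatabricksSession.builder.getOrCreate()

    from pyspark.sql import SparkSession

    if mode is SparkSessionMode.SPARK_CONNECT:
        # 'spark_remote' is an assumed setting name; the default URL is a placeholder.
        remote = config.get("spark_remote", "sc://localhost:15002")
        return SparkSession.builder.remote(remote).getOrCreate()

    return SparkSession.builder.getOrCreate()
```

With this shape, a user who sets spark_session_mode to "spark" never triggers the databricks-connect import, even if the package is installed in the environment.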
Alternatively, provide separate functions for initializing a Databricks session and a regular Spark session, and let users explicitly choose which function to call.
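The separate-functions alternative might look roughly like the sketch below. The function names get_databricks_session and get_spark_session and the optional remote parameter are hypothetical, used here only to show the shape of the idea. Each function imports its backend lazily, so calling one does not require the other's package.

```python
from typing import Optional


def get_databricks_session():
    """Explicitly create a session through databricks-connect."""
    from databricks.connect import DatabricksSession

    return DatabricksSession.builder.getOrCreate()


def get_spark_session(remote: Optional[str] = None):
    """Create a regular Spark session, or a spark-connect one if 'remote' is given.

    'remote' is an assumed parameter, e.g. "sc://localhost:15002".
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    if remote is not None:
        builder = builder.remote(remote)
    return builder.getOrCreate()
```

The trade-off compared with a single configurable function is that the choice moves from configuration into the calling code, which is more explicit but requires each caller to pick a function.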
For reference, the _get_spark() function is used in some datasets.