Dear sir or madam,

We've encountered two issues that seem to share the same root cause: the way Databricks/Spark sets up its connections.
We had two incidents:
1. Last year (before the library was deprecated) we used notebooks that relied on the ADAL library. At the beginning of each notebook we declared the connection and fetched an Azure Active Directory token for a service principal. This worked in isolation, but when the notebooks ran in a sequence we would get a "Token expired" error while executing spark.read or spark.write operations. It turned out the token was not retrieved in the context of the individual notebook but in the driver/global context, which caused our jobs/workflows to fail. We therefore reverted to password authentication. (See the first sketch below for the per-notebook pattern we had in mind.)
2. We ran into a JDBC "connection reset" issue while executing updates (non-queries) with the executeUpdate method on a DriverManager connection. The behaviour was odd: we opened the connection at the start of the Spark notebook, yet a minute later an update in that same notebook would fail with a "connection reset" error, again breaking our jobs/workflows. As a workaround we now check, before every executeUpdate, whether the connection is still open and open a new one if it is not (see the second sketch below).
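To make the first incident concrete, here is a minimal sketch of the pattern we think we should be using instead: acquiring the token inside each notebook, right before the read, rather than once in the driver/global context. The MSAL4J calls (as the successor of ADAL), the placeholder secret values and the accessToken JDBC option are assumptions on our side, not code we currently run; it also assumes the msal4j library is attached to the cluster.

```scala
import java.util.Collections
import com.microsoft.aad.msal4j.{ClientCredentialFactory, ClientCredentialParameters, ConfidentialClientApplication}

// Placeholder service principal details; in practice these would come from a secret scope.
val tenantId     = "<tenant-id>"
val clientId     = "<service-principal-client-id>"
val clientSecret = "<service-principal-secret>"

// Acquire a fresh Azure AD token for Azure SQL inside this notebook,
// instead of once in the driver/global context at the start of the job.
def fetchSqlAccessToken(): String = {
  val app = ConfidentialClientApplication
    .builder(clientId, ClientCredentialFactory.createFromSecret(clientSecret))
    .authority(s"https://login.microsoftonline.com/$tenantId/")
    .build()
  val params = ClientCredentialParameters
    .builder(Collections.singleton("https://database.windows.net/.default"))
    .build()
  app.acquireToken(params).get().accessToken()
}

// Pass the token as a connection property so every read uses a token
// obtained in this notebook's run, not a stale one from a shared context.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
  .option("dbtable", "dbo.some_table")
  .option("accessToken", fetchSqlAccessToken())
  .load()
```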
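And this is a simplified version of the workaround from the second incident: before each executeUpdate we probe the connection and reopen it if the driver reports it is no longer usable. The isValid timeout, URL and table names here are illustrative.

```scala
import java.sql.{Connection, DriverManager}

// Placeholder connection details.
val jdbcUrl  = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>"
val user     = "<sql-user>"
val password = "<sql-password>"

// Return the existing connection if the driver still considers it usable,
// otherwise open a new one. isValid(5) gives the driver up to 5 seconds
// to verify the connection with a lightweight round trip.
def ensureOpen(conn: Connection): Connection = {
  if (conn != null && !conn.isClosed && conn.isValid(5)) conn
  else DriverManager.getConnection(jdbcUrl, user, password)
}

// Refresh the handle right before every update instead of trusting
// the connection that was opened at the top of the notebook.
var connection: Connection = null
connection = ensureOpen(connection)
val stmt = connection.createStatement()
try {
  stmt.executeUpdate("UPDATE dbo.some_table SET processed = 1 WHERE processed = 0")
} finally {
  stmt.close()
}
```

Even with this check in place we would prefer a connection whose lifetime is tied to the notebook that opened it, which is why we ask the question below.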
Connections to SQL do not seem to be scoped to the notebook context, but defined at some other level. Do you recognize this behaviour, and do you have guidelines on how best to work with this?