vaticle / typedb-driver-python

TypeDB Driver for Python
https://typedb.com
Apache License 2.0
67 stars 25 forks source link

Remove calls to `grpc.channel.close()`, which could cause segfaults #266

Closed alexjpwalker closed 1 year ago

alexjpwalker commented 1 year ago

What is the goal of this PR?

We no longer call close on any of our gRPC Channels. This fixes possible segfaults caused by resources being deallocated while they are still in use.

What are the changes implemented in this PR?

Users in a wide variety of scenarios had reported intermittent crashes, often accompanied by warnings saying "1 metadata element(s) leaked". The logs would look similar to the following:

7786| E1031 17:59:35.477338904 207 metadata.cc:253] WARNING: 1 metadata elements were leaked
7787| E1031 17:59:35.477409424 207 metadata.cc:260] mdelem ':authority' = 'typedb-cluster-1.typedb-cluster:1729'
7788| [mutex.cc : 435] RAW: Lock blocking 0x563cbf829fc0 @
7789| [mutex.cc : 1908] RAW: Check (v & (kMuWait | kMuWrWait)) != kMuWrWait failed: Lock: Mutex corrupt: waiting writer with no waiters: 0x563cbf815670

The same issue has also been reported in https://github.com/googleads/google-ads-python/issues/384, and a fix was suggested in:

From this issue we infer that Channel.close is not behaving nicely in gRPC Python, and can cause resources to be deallocated while they are still in use. As gRPC itself uses native C libraries, this results in segfaults and crashes. We determine that the best course of action is to not close the Channel ourselves.

We've simply deleted the 3 places in our code that called close on a gRPC Channel. It has passed our CI tests and fixed user-reported issues, and it is what the gRPC maintainers themselves appear to recommend in the linked grpc issue.

typedb-bot commented 1 year ago

PR Review Checklist

Do not edit the content of this comment. The PR reviewer should simply update this comment by ticking each review item below, as they get completed.


Code

Architecture