vesoft-inc / nebula-java

Client API and data importer of Nebula Graph in Java
Apache License 2.0

fix okhttpclient response body leak bug #600

Closed Nicole00 closed 3 weeks ago

Nicole00 commented 3 weeks ago

close https://github.com/vesoft-inc/nebula-java/issues/599

reproduce:

If a response is never read, or is only partially read, its connection stays marked as in use. In Java, however, the response (or, more precisely, the responseBody) is garbage collected by the JVM once nothing references it. After 5 minutes, the automatic cleanup of the okhttp3 connection pool finds that the connection is still marked as in use, while the weak reference it holds to the newCall is null. The pool therefore treats the connection as leaked, prints the log, and recycles the connection.


# resolution
Rewrite the flush and read methods in http2Client. During flush, the entire data stream of the response is drained into a global buffer, and the response is then closed. During read, data is served in batches from the global buffer.
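
The drain-then-close idea described above can be sketched with plain `java.io` streams. This is only an illustration under stated assumptions, not the actual http2Client code: the class name `BufferedBodyTransport`, its method signatures, and the 4096-byte chunk size are hypothetical, and an `InputStream` stands in for the okhttp response body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the transport; not the real http2Client.
class BufferedBodyTransport {
    // Holds the fully drained body of the most recent response.
    private byte[] buffer = new byte[0];
    private int position = 0;

    // Drain the whole body into the buffer, then close the stream so the
    // underlying connection is no longer held by an unread response.
    void flush(InputStream responseBody) throws IOException {
        try (InputStream in = responseBody) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            buffer = out.toByteArray();
            position = 0;
        }
    }

    // Copy up to len bytes from the buffer into dest.
    // Returns the number of bytes copied, or -1 once the buffer is exhausted.
    int read(byte[] dest, int off, int len) {
        if (position >= buffer.length) {
            return -1;
        }
        int n = Math.min(len, buffer.length - position);
        System.arraycopy(buffer, position, dest, off, n);
        position += n;
        return n;
    }
}

class BufferedBodyDemo {
    public static void main(String[] args) throws IOException {
        BufferedBodyTransport transport = new BufferedBodyTransport();
        transport.flush(new ByteArrayInputStream("hello world".getBytes(StandardCharsets.UTF_8)));
        byte[] batch = new byte[4];
        int n;
        // The body stream is already closed here; batches come from the buffer.
        while ((n = transport.read(batch, 0, batch.length)) != -1) {
            System.out.println(new String(batch, 0, n, StandardCharsets.UTF_8));
        }
    }
}
```

The trade-off of this approach is memory: the whole response is held in the buffer at once, in exchange for the stream being closed before any caller starts reading.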

# tests
* Ran 20 concurrent executions and let the process wait 5 minutes before exiting automatically; the leak log did not appear.
* The okhttpclient defined in the client is a singleton and maintains the okhttp connection pool internally. To confirm that calling http2Client.close during execution does not affect sessions that already exist, the following test was performed (result: existing sessions are not affected):
  * Temporarily modify the code logic: when an auth failure occurs while creating a session, always call connection.close, without closing the sessionPool.
  * Set the pool max to 20 and min to 10, so that 10 connections are created during initialization.
  * Run a single-threaded query and sleep for 10 seconds.
  * Manually delete the user while the query is running.
  * Run 20 concurrent workers with 20 operations each: 10 "user not exist" errors occurred, and all other executions succeeded.
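
A load pattern like the one in these tests (20 concurrent workers, 20 operations each) could be driven by a small harness such as the sketch below. It is a hedged illustration only: `ConcurrentQueryHarness` and its `run` method are hypothetical names, and the `query` argument stands in for whatever session call the real test executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical test harness; not part of nebula-java.
class ConcurrentQueryHarness {
    // Starts `workers` threads, each invoking `query` `perWorker` times.
    // Returns how many invocations threw an exception.
    static int run(int workers, int perWorker, Callable<?> query) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger failures = new AtomicInteger();
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                tasks.add(() -> {
                    for (int i = 0; i < perWorker; i++) {
                        try {
                            query.call();
                        } catch (Exception e) {
                            // Count the failure and keep going, so one bad
                            // query does not stop the remaining executions.
                            failures.incrementAndGet();
                        }
                    }
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every worker has finished
        } finally {
            pool.shutdown();
        }
        return failures.get();
    }
}
```

A caller would invoke `ConcurrentQueryHarness.run(20, 20, query)` and, to check for the leak log, keep the process alive for 5 minutes afterwards (for example with `Thread.sleep`) before exiting.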

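# reproduction scenario
* Run 20 concurrent workers, each executing 20 queries.
* After execution completes, wait 5 minutes before the process exits. The log "a connection to http:// was leaked. Did you forget to close a response body" then appears.

The log takes 5 minutes to appear because the okhttp client contains a connection pool whose cleanup task runs every 5 minutes by default. During cleanup, the pool finds a connection that is still marked as in use but whose referrer is null. It concludes that the connection has leaked, prints the log, and removes the connection.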

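okhttp3 uses a connectionPool to reuse connections. When a request is sent through the pool, each connection records a weak reference to the newCall in the connection object (that is, it records that the connection is currently referenced by that newCall). In okhttp3, this weak reference is removed from the connection object only once the response for the request has been read completely.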

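If a response is never read, or is not read to the end, the connection stays marked as in use. In Java, however, a response that is no longer referenced is garbage collected automatically. After 5 minutes, the automatic cleanup of the okhttp3 connection pool finds that the connection is marked as in use while its weak reference to the newCall is null. It treats the connection as leaked, prints the log, and recycles the connection.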
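
The detection logic described above can be modeled with `java.lang.ref.WeakReference`. The sketch below is a simplified model, not okhttp3 source code: `ModelConnection`, `acquire`, `release`, and `pruneLeaked` are hypothetical names chosen for illustration.

```java
import java.lang.ref.Reference;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Simplified model of a pooled connection; not the okhttp3 implementation.
class ModelConnection {
    // Weak references to the calls currently using this connection.
    final List<Reference<Object>> calls = new ArrayList<>();

    // A call starts using the connection: record a weak reference to it.
    Reference<Object> acquire(Object call) {
        Reference<Object> ref = new WeakReference<>(call);
        calls.add(ref);
        return ref;
    }

    // The response was fully read or closed: drop the recorded reference.
    void release(Reference<Object> ref) {
        calls.remove(ref);
    }

    // Cleanup pass: a reference that is still recorded but whose call has
    // been garbage collected means nobody can ever release it, so it is
    // counted as a leak and removed.
    int pruneLeaked() {
        int leaked = 0;
        for (Iterator<Reference<Object>> it = calls.iterator(); it.hasNext();) {
            if (it.next().get() == null) {
                it.remove();
                leaked++;
            }
        }
        return leaked;
    }
}
```

In this model, a caller that reads the response to the end calls `release`, so the cleanup pass finds nothing. A caller that drops the response without reading it leaves a reference behind, and the cleanup pass reports it once the garbage collector has cleared it.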



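# resolution
Rework how flush and read operate in http2Client: during flush, drain the entire data stream of the response into a global buffer; during read, serve data in batches from the global buffer.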

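# tests
* Run 20 concurrent executions and let the process wait 5 minutes before exiting automatically; no leak log appears.
* The okhttpclient defined in the client is a singleton and maintains the okhttp connection pool internally. To confirm that calling http2Client.close during execution does not affect the execution of existing sessions, the following test was performed (result: existing sessions are not affected):
  * Temporarily modify the code logic: when an auth failure occurs while creating a session, always call connection.close, without closing the sessionPool.
  * Set the pool max to 20 and min to 10, so that 10 connections are created at initialization.
  * Run a single-threaded query and sleep for 10 seconds.
  * Manually delete the user in the meantime.
  * Run 20 concurrent workers with 20 operations each: 10 "user not exist" errors occur, and the other executions succeed.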