Open torvalds-dev opened 1 year ago
torvalds.dev is analyzing the ticket
The issue [HUDI-7148] describes a potential thread-safety problem in the heartbeat client in Apache Hudi. The problem arises when the heartbeat client is updating the heartbeat time for an instant (t1) while, at the same time, the write client completes the commit of t1 and stops its heartbeat. The try-catch block that updates the heartbeat then catches an exception: File does not exist.
The proposed solution is to add an additional check when handling this exception. The handler should check again whether the heartbeat of instant t1 has been stopped by the write client. If it has been stopped, the exception should not be thrown and the update of this heartbeat should simply be skipped. If it has not been stopped, the exception should still be thrown.
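The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual Hudi code: the class HeartbeatSketch, its methods, and the use of local files with java.nio are all assumptions made for the example. The one design point worth noting is that stop records the instant as stopped before deleting the file, so an update that fails because the file is gone can always see the stop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the heartbeat client; none of these names are Hudi APIs.
class HeartbeatSketch {
    private final Path heartbeatDir;
    // Instants whose heartbeat the write client has already stopped.
    private final Set<String> stoppedInstants = ConcurrentHashMap.newKeySet();

    HeartbeatSketch(Path heartbeatDir) {
        this.heartbeatDir = heartbeatDir;
    }

    // Convenience factory that keeps checked exceptions out of the example.
    static HeartbeatSketch inTempDir() {
        try {
            return new HeartbeatSketch(Files.createTempDirectory("heartbeat-sketch"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path heartbeatFile(String instant) {
        return heartbeatDir.resolve(instant);
    }

    void start(String instant) {
        try {
            stoppedInstants.remove(instant);
            Files.createFile(heartbeatFile(instant));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Called by the write client once the commit of the instant completes.
    void stop(String instant) {
        // Record the stop BEFORE deleting the file, so a failing update can see it.
        stoppedInstants.add(instant);
        try {
            Files.deleteIfExists(heartbeatFile(instant));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the heartbeat was refreshed, false if the update was skipped.
    boolean updateHeartbeat(String instant) {
        try {
            Files.setLastModifiedTime(heartbeatFile(instant),
                    FileTime.fromMillis(System.currentTimeMillis()));
            return true;
        } catch (IOException e) {
            // Second check: a missing file is expected if the heartbeat was stopped.
            if (stoppedInstants.contains(instant)) {
                return false;
            }
            // The heartbeat was not stopped, so the failure is a real error.
            throw new IllegalStateException(
                    "Unable to update heartbeat for instant " + instant, e);
        }
    }

    public static void main(String[] args) {
        HeartbeatSketch client = inTempDir();
        client.start("t1");
        System.out.println(client.updateHeartbeat("t1")); // prints true
        client.stop("t1");
        System.out.println(client.updateHeartbeat("t1")); // prints false
    }
}
```

In this sketch a skipped update is reported through the return value. Whether the real client should log, return silently, or signal the skip in some other way is left open by the ticket.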
The files relevant to this issue are:
hudi-client/hudi-client-common/src/test/java/org/apache/hudi/client/heartbeat/TestHoodieHeartbeatClient.java: This is the test file for the HoodieHeartbeatClient. It contains various test cases to check the functionality of the heartbeat client.
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java: This is the main implementation of the HoodieHeartbeatClient. The proposed changes should be made in this file.
The actions that should be taken are:
Update the HoodieHeartbeatClient.java file to include the additional check when handling the exception. This can be done in the updateHeartbeat method, where the heartbeat file is updated.
After updating the HoodieHeartbeatClient.java file, the corresponding test cases in TestHoodieHeartbeatClient.java should also be updated, or new test cases should be added, to ensure the new changes work as expected.
After making the changes, run the tests to ensure everything works as expected.
Once the tests pass, the changes can be committed and a pull request can be created to merge the changes into the main codebase.
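A test for this kind of change would typically race an updater against a stop call and assert that no exception escapes. The sketch below illustrates the idea with a small in-memory model. It is not based on TestHoodieHeartbeatClient.java or any Hudi API: the class HeartbeatRaceTestSketch and its methods are hypothetical, and the heartbeat files are simulated with a concurrent set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical in-memory model used only to illustrate the test idea; not Hudi code.
class HeartbeatRaceTestSketch {
    static final Set<String> files = ConcurrentHashMap.newKeySet();   // simulated heartbeat files
    static final Set<String> stopped = ConcurrentHashMap.newKeySet(); // instants stopped by the writer

    static void start(String instant) {
        stopped.remove(instant);
        files.add(instant);
    }

    static void stop(String instant) {
        stopped.add(instant);   // record the stop before removing the file
        files.remove(instant);
    }

    static void update(String instant) {
        if (!files.contains(instant)) {      // simulated "File does not exist"
            if (stopped.contains(instant)) {
                return;                      // expected race with stop: skip the update
            }
            throw new IllegalStateException("heartbeat file missing for " + instant);
        }
        // A real client would refresh the heartbeat timestamp here.
    }

    // Races an updater thread against stop() and returns the updater's failure, or null.
    static Throwable raceOnce(String instant) {
        start(instant);
        AtomicReference<Throwable> failure = new AtomicReference<>();
        CountDownLatch updaterRunning = new CountDownLatch(1);
        Thread updater = new Thread(() -> {
            updaterRunning.countDown();
            try {
                for (int i = 0; i < 10_000; i++) {
                    update(instant);
                }
            } catch (Throwable t) {
                failure.set(t);
            }
        });
        updater.start();
        try {
            updaterRunning.await();
            stop(instant);      // the write client finishes while updates are in flight
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failure.get();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 50; i++) {
            if (raceOnce("t" + i) != null) {
                throw new AssertionError("update failed after a legitimate stop");
            }
        }
        System.out.println("no update failed while racing with stop");
    }
}
```

A complementary negative case is also worth covering: when the file disappears without the heartbeat having been stopped, the update should still fail.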
A potential problem: if the heartbeat client is updating the heartbeat time for the instant t1 and the write client then completes the commit of t1 and stops its heartbeat, the try-catch block that updates the heartbeat will catch an exception: File does not exist. When handling this exception, we should check again whether the heartbeat of instant t1 has been stopped by the write client. If it has been stopped, we should not throw this exception, but skip the update of this heartbeat directly. But if it has not been stopped, we should throw the exception.