Closed WorkingChen closed 2 months ago
addDeleteSegmentsSession adds a DeleteSegmentsSession, which deletes the files one by one as part of the duty cycle. However, that method also renames the files to mark them as deleted before the session starts. I guess the rename operation might indeed take a very long time in some cases.
Consider the case of deleting a cluster's Raft log data. If the client's business volume is very large, the Raft log also occupies a significant amount of disk space, and a purgeSegments operation on the Raft log recording has to delete many segment files at once. Operations that modify data, such as rename and delete, are time-consuming. Renaming thousands of files in a single method call can take tens of seconds, and during that time the archive conductor cannot handle any other work.
Reading metadata for a large number of files (for example with File.exists()) is acceptable. Write operations on a large number of files (such as rename and delete) are a different matter: they can be time-consuming because modifying data requires acquiring locks, and on mechanical drives the cost is even higher. My suggestion is therefore to be cautious with I/O write operations, especially when a single method performs a large number of them.
io.aeron.archive.ArchiveConductor#deleteSegments
When there is a large number of files to delete, this method can take more than 10 seconds to run, which can cause the Aeron objects inside the archive to time out and lead to the archive shutting down.
The duration of extensive I/O operations is unpredictable. Should we consider moving this part of the work into the SessionWork and processing only a portion at a time?
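To make the suggestion concrete, here is a minimal sketch of what "processing only a portion at a time" could look like. It is not Aeron's actual DeleteSegmentsSession: the class name BatchedSegmentDeleter, the maxFilesPerCycle parameter, and the ".del" marker suffix are all assumptions made for illustration. The idea is that both the rename and the delete happen inside doWork(), capped at a fixed number of files per duty cycle, so no single call blocks the conductor for long.

```java
import java.io.File;
import java.util.ArrayDeque;
import java.util.List;

/**
 * Hypothetical session that renames and deletes segment files in bounded batches.
 * Illustration only; this is not Aeron's DeleteSegmentsSession.
 */
final class BatchedSegmentDeleter
{
    // Assumed marker suffix for files that are about to be deleted.
    private static final String DELETE_SUFFIX = ".del";

    private final ArrayDeque<File> pending = new ArrayDeque<>();
    private final int maxFilesPerCycle;

    BatchedSegmentDeleter(final List<File> segmentFiles, final int maxFilesPerCycle)
    {
        if (maxFilesPerCycle < 1)
        {
            throw new IllegalArgumentException("maxFilesPerCycle must be >= 1");
        }

        // Only queue the files here; no file system writes happen up front.
        this.pending.addAll(segmentFiles);
        this.maxFilesPerCycle = maxFilesPerCycle;
    }

    /**
     * Called once per duty cycle. Processes at most maxFilesPerCycle files
     * and returns the number of files handled in this call.
     */
    int doWork()
    {
        int workCount = 0;

        while (workCount < maxFilesPerCycle && !pending.isEmpty())
        {
            final File file = pending.pollFirst();
            final File marked = new File(file.getParentFile(), file.getName() + DELETE_SUFFIX);

            // Rename first so the file disappears under its original name,
            // then delete. If the rename fails, fall back to the original file.
            final File toDelete = file.renameTo(marked) ? marked : file;
            if (!toDelete.delete() && toDelete.exists())
            {
                System.err.println("unable to delete segment file: " + toDelete);
            }

            workCount++;
        }

        return workCount;
    }

    boolean isDone()
    {
        return pending.isEmpty();
    }
}
```

The trade-off is that files which have not been processed yet stay visible under their original names until their batch runs. If other code relies on all files being renamed before the purge request completes, that would need to be handled separately.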