y-scope / clp

Compressed Log Processor (CLP) is a free log management tool capable of compressing logs and searching the compressed logs without decompression.
https://yscope.com
Apache License 2.0

clp-core: Refactor the BufferedFileReader to wrap around a ReaderInterface. #523

Open haiqi96 opened 1 month ago

haiqi96 commented 1 month ago

Description

The BufferedFileReader was designed with the assumption that it always reads from the local file system. However, supporting compression from a different input source (S3, for example) would break this assumption. In this PR, we refactor the BufferedFileReader so that it can take in any arbitrary reader interface as its data source. A more detailed list of changes:

  1. Refactor BufferedFileReader to follow the RAII model.
  2. Remove the internal FileDescriptor and C-style read/write from BufferedFileReader.
  3. Update all instantiations of BufferedFileReader to use a FileDescriptorReader (instead of a FileReader) to avoid double buffering.
  4. Rewrite some BufferedFileReader APIs using modern C++ features.
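
For readers unfamiliar with the pattern, the wrapping idea can be sketched as follows. This is an illustrative sketch only, not the code in this PR: `SketchReaderInterface`, `StringReader`, and `SketchBufferedReader` are hypothetical names, and CLP's actual ReaderInterface and BufferedFileReader have richer APIs and error handling. The sketch assumes the wrapped reader may already be at a non-zero position when it is handed over, and that the buffered reader simply continues from there.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal interface (an assumption for illustration only).
class SketchReaderInterface {
public:
    virtual ~SketchReaderInterface() = default;
    // Reads up to `num_bytes` into `buf`; returns bytes read (0 at end of data).
    virtual size_t read(char* buf, size_t num_bytes) = 0;
    // Returns the current position in the underlying source.
    virtual size_t get_pos() const = 0;
};

// In-memory source standing in for a file-descriptor or S3-backed reader.
class StringReader : public SketchReaderInterface {
public:
    explicit StringReader(std::string data, size_t start_pos = 0)
            : m_data(std::move(data)),
              m_pos(std::min(start_pos, m_data.size())) {}

    size_t read(char* buf, size_t num_bytes) override {
        size_t const n = std::min(num_bytes, m_data.size() - m_pos);
        std::memcpy(buf, m_data.data() + m_pos, n);
        m_pos += n;
        return n;
    }

    size_t get_pos() const override { return m_pos; }

private:
    std::string m_data;
    size_t m_pos;
};

// Buffers reads from any SketchReaderInterface. The buffer is acquired in the
// constructor and released by the destructor (RAII); there is no open/close.
// The wrapped reader is borrowed by reference and must outlive this object.
class SketchBufferedReader {
public:
    SketchBufferedReader(SketchReaderInterface& source, size_t buffer_size)
            : m_source(source),
              m_buffer(buffer_size),
              m_pos(source.get_pos()) {
        assert(buffer_size > 0);
    }

    size_t read(char* out, size_t num_bytes) {
        size_t total = 0;
        while (total < num_bytes) {
            if (m_buf_begin == m_buf_end) {
                // Buffer exhausted: refill it from the wrapped reader.
                m_buf_end = m_source.read(m_buffer.data(), m_buffer.size());
                m_buf_begin = 0;
                if (0 == m_buf_end) {
                    break;  // The underlying source has no more data.
                }
            }
            size_t const n = std::min(num_bytes - total, m_buf_end - m_buf_begin);
            std::memcpy(out + total, m_buffer.data() + m_buf_begin, n);
            m_buf_begin += n;
            total += n;
        }
        m_pos += total;
        return total;
    }

    // Logical position, starting from the wrapped reader's initial position.
    size_t get_pos() const { return m_pos; }

private:
    SketchReaderInterface& m_source;
    std::vector<char> m_buffer;
    size_t m_buf_begin{0};
    size_t m_buf_end{0};
    size_t m_pos;
};
```

In this sketch, constructing a `StringReader` over some data with a non-zero start position and passing it to `SketchBufferedReader` yields a buffered reader whose `get_pos()` starts at that position and whose `read` returns only the remaining bytes. Because the buffered reader owns the only buffer, pairing it with an unbuffered source avoids buffering the same data twice.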

Note: this PR makes some modifications to the unit test, but we decided not to update the unit test to match the latest coding standard. The unit test is very poorly written, and it would make more sense to refactor it in a separate PR.

Validation performed

  1. Made sure the existing unit test passes.
  2. Added a new unit test to cover the case where the BufferedFileReader takes in a reader interface with a non-zero position.
  3. Compressed and decompressed a single file, and verified that the decompressed file matches the original.