y-scope / clp

Compressed Log Processor (CLP) is a free log management tool capable of compressing logs and searching the compressed logs without decompression.
https://yscope.com
Apache License 2.0

clp-s: Add the write path for single-file archive #563

Open wraymo opened 3 weeks ago

wraymo commented 3 weeks ago

Description

This PR adds support for writing clp-s single-file archives in accordance with the SFA spec. This is accomplished by first compressing a multi-file archive as normal, then combining its files into a single archive file.
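The "compress first, then combine" approach can be illustrated with a minimal sketch. It is not the PR's implementation: `SectionInfo` and `combine_into_single_file` are hypothetical names, and the sketch simply concatenates files and records their offsets, without the header and metadata layout that the SFA spec defines.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record (not a clp-s type): where one input file landed in the
// combined output.
struct SectionInfo {
    std::string name;  // file name of the original multi-file component
    uint64_t offset;   // byte offset of the section within the output file
    uint64_t size;     // section length in bytes
};

// Appends every file in `parts` to `output_path`, in order, and returns the
// offset and size of each one so that a metadata section could reference it.
std::vector<SectionInfo> combine_into_single_file(
        std::vector<std::filesystem::path> const& parts,
        std::filesystem::path const& output_path
) {
    std::ofstream out(output_path, std::ios::binary | std::ios::trunc);
    if (!out) {
        throw std::runtime_error("failed to open " + output_path.string());
    }

    std::vector<SectionInfo> sections;
    uint64_t offset = 0;
    for (auto const& part : parts) {
        std::ifstream in(part, std::ios::binary);
        if (!in) {
            throw std::runtime_error("failed to open " + part.string());
        }
        uint64_t const size = std::filesystem::file_size(part);
        // Streaming an empty buffer would set failbit on `out`, so skip it.
        if (size > 0) {
            out << in.rdbuf();
            if (!out) {
                throw std::runtime_error("failed to copy " + part.string());
            }
        }
        sections.push_back({part.filename().string(), offset, size});
        offset += size;
    }
    return sections;
}
```

Calling `combine_into_single_file({dir / "a", dir / "b"}, dir / "out")` would return one entry per input file. Recording offsets while copying is what lets a reader later seek to a section without scanning the whole file.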

Validation performed


coderabbitai[bot] commented 3 weeks ago

Walkthrough

The changes in this pull request focus on enhancing the functionality of the ArchiveWriter and related classes, particularly in handling single-file archives. Key updates include the introduction of new member variables and methods, modifications to existing method signatures, and improvements to command line argument parsing. The TimestampDictionaryWriter class has also been restructured to streamline its operations. Additionally, a new file defining structures for single-file archives has been added, along with updates to related classes and methods to support these enhancements.

Changes

| File | Change Summary |
| --- | --- |
| `components/core/src/clp_s/ArchiveWriter.cpp` | Added member variable m_single_file_archive, modified close method to differentiate between single and multi-file archives, added write_timestamp_dict method, updated store_tables return type to std::pair<size_t, size_t>. |
| `components/core/src/clp_s/ArchiveWriter.hpp` | Added bool single_file_archive to ArchiveWriterOption, updated store_tables return type, added methods for single-file archive handling. |
| `components/core/src/clp_s/CommandLineArguments.cpp` | Introduced single-file-archive option in command line argument parsing. |
| `components/core/src/clp_s/CommandLineArguments.hpp` | Added member variable m_single_file_archive and getter method get_single_file_archive(). |
| `components/core/src/clp_s/JsonParser.cpp` | Added single_file_archive to m_archive_options structure in constructor. |
| `components/core/src/clp_s/JsonParser.hpp` | Added bool single_file_archive to JsonParserOption struct. |
| `components/core/src/clp_s/SingleFileArchiveDefs.hpp` | Introduced definitions and structures for managing single-file archives, including ArchiveHeader, ArchiveCompressionType, and related structures. |
| `components/core/src/clp_s/TimestampDictionaryWriter.cpp` | Replaced write_and_flush_to_disk with write, added clear and size_in_bytes methods, removed open and close methods. |
| `components/core/src/clp_s/TimestampDictionaryWriter.hpp` | Updated constructor and method signatures, removed file management methods, added write, clear, and size_in_bytes methods. |
| `components/core/src/clp_s/archive_constants.hpp` | Added constant cArchiveFile for archive file path. |
| `components/core/src/clp_s/clp-s.cpp` | Modified compress function to include single_file_archive parameter. |
| `components/core/src/clp_s/TimestampEntry.hpp` | Added method size_in_bytes() to calculate size of TimestampEntry object. |

📜 Recent review details

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**

📥 Commits

Reviewing files that changed from the base of the PR and between eaa09825f9c2c94f94e241c58d6cfed8572b8568 and b46d4c1db42080fbd3b1bb47638c07f4dd964d15.

📒 Files selected for processing (2)

* `components/core/src/clp_s/JsonParser.cpp` (1 hunks)
* `components/core/src/clp_s/clp-s.cpp` (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

* `components/core/src/clp_s/JsonParser.cpp`
* `components/core/src/clp_s/clp-s.cpp`

gibber9809 commented 3 weeks ago

Nice work! Seems mostly good for a draft implementation.

The main things we should change quickly are putting the archive header + metadata section into the regular multi-file archive, and formally picking a magic number + changing it to 4 bytes.

There are other things we need to clean up/think about before actually merging this, but the above changes should be enough to build off of for prototyping.

Also need to go through and fix all of the fields that are a different size than what the spec specifies.
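
For illustration, here is a minimal sketch of how a 4-byte magic number and spec-mandated field sizes could be pinned down at compile time. Everything in it is an assumption: the struct name, the field names and widths, and the magic bytes are placeholders, not the values from the SFA spec or from `SingleFileArchiveDefs.hpp`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Placeholder magic bytes; the real value still has to be picked formally.
constexpr std::array<uint8_t, 4> cSketchMagicNumber{0xDE, 0xAD, 0xBE, 0xEF};

// Hypothetical header layout. Packing removes compiler-inserted padding so the
// in-memory layout matches the on-disk layout byte for byte.
#pragma pack(push, 1)
struct SketchArchiveHeader {
    uint8_t magic_number[4];
    uint32_t version;
    uint64_t uncompressed_size;
    uint64_t compressed_size;
    uint16_t compression_type;
    uint16_t padding;
};
#pragma pack(pop)

// Compile-time checks: a field whose size drifts from the intended layout
// breaks the build instead of silently producing an incompatible archive.
static_assert(sizeof(SketchArchiveHeader::magic_number) == 4, "magic number must be 4 bytes");
static_assert(offsetof(SketchArchiveHeader, version) == 4, "unexpected offset for version");
static_assert(sizeof(SketchArchiveHeader) == 28, "unexpected header size");

// Returns true if the buffer starts with the placeholder magic number.
inline bool has_sketch_magic(void const* buf, std::size_t len) {
    return len >= cSketchMagicNumber.size()
           && 0 == std::memcmp(buf, cSketchMagicNumber.data(), cSketchMagicNumber.size());
}
```

Fixed-width integer types plus `static_assert` make any mismatch between the struct and the intended layout visible at compile time, which is one way to catch fields that are a different size than what the spec specifies.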