skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0
2 stars 0 forks source link

refactor(rs/core/corpora_cli/context): modularize context #51

Closed skyl closed 4 days ago

skyl commented 4 days ago

PR Type

enhancement


Description


Changes walkthrough ๐Ÿ“

Relevant files
Enhancement
init.rs
Refactor initialization to use collector for tarball creation

rs/core/corpora_cli/src/commands/init.rs
  • Removed direct Git operations and replaced with collector usage.
  • Simplified tarball creation using ctx.collector.collect_tarball().
  • Updated write_corpus_id to use ctx.corpora_config.root_path.
  • +4/-60   
    client_secret.rs
    Add client secret-based authentication strategy                   

    rs/core/corpora_cli/src/context/auth/client_secret.rs
  • Introduced ClientSecretAuth for OAuth authentication.
  • Implemented AuthStrategy for client secret-based authentication.
  • +66/-0   
    mod.rs
    Introduce authentication strategy trait and error handling

    rs/core/corpora_cli/src/context/auth/mod.rs
  • Defined AuthStrategy trait for authentication.
  • Implemented error handling for authentication strategies.
  • Added function to select and retrieve bearer token.
  • +43/-0   
    git.rs
    Implement GitCollector for file collection and tarball creation

    rs/core/corpora_cli/src/context/collector/git.rs
  • Implemented GitCollector for collecting files in a Git repository.
  • Added methods to filter text files and create tarballs.
  • +102/-0 
    mod.rs
    Define Collector trait and initialize collectors                 

    rs/core/corpora_cli/src/context/collector/mod.rs
  • Defined Collector trait for file collection.
  • Added function to get appropriate collector based on configuration.
  • +48/-0   
    config.rs
    Add CorporaConfig struct and configuration loading             

    rs/core/corpora_cli/src/context/config.rs
  • Added CorporaConfig struct with additional fields.
  • Implemented function to load configuration from .corpora.yaml.
  • +53/-0   
    mod.rs
    Modularize context with authentication and collector integration

    rs/core/corpora_cli/src/context/mod.rs
  • Modularized context with authentication, collector, and configuration.
  • Integrated context initialization with new modules.
  • +43/-0   

    ๐Ÿ’ก PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    github-actions[bot] commented 4 days ago

    PR Reviewer Guide ๐Ÿ”

    Here are some key observations to aid the review process:

    โฑ๏ธ Estimated effort to review: 4 ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ตโšช
    ๐Ÿงช No relevant tests
    ๐Ÿ”’ Security concerns

    Sensitive information exposure:
    The `ClientSecretAuth` struct stores sensitive information like `client_id` and `client_secret` as plain strings. Consider using a more secure method to handle these credentials, such as environment variables or a secure vault.
    โšก Recommended focus areas for review

    Error Handling
    The error handling for tarball creation and corpus ID writing could be improved by providing more context or using a logging mechanism instead of just printing to stderr. Error Details
    The error handling in the `authenticate` function could be enhanced by providing more detailed error messages or logging for failed HTTP requests and JSON parsing. Command Execution
    The use of `Command::new("git")` for executing Git commands may fail silently if the Git binary is not available or if there are permission issues. Consider adding checks or more informative error messages.
    github-actions[bot] commented 4 days ago

    PR Code Suggestions โœจ

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Security
    Avoid exposing sensitive information in error messages for security reasons ___ **Ensure that sensitive information such as client_secret is not logged or exposed in
    error messages to prevent potential security risks.** [rs/core/corpora_cli/src/context/auth/client_secret.rs [41]](https://github.com/skyl/corpora/pull/51/files#diff-f2789f898b08f1cc8a61bd47f13571be7f5e00d22da51d93c3d5749e0ce1b68aR41-R41) ```diff -.map_err(|e| AuthError::AuthenticationFailed(format!("HTTP request failed: {}", e)))?; +.map_err(|e| AuthError::AuthenticationFailed("HTTP request failed".to_string()))?; ```
    Suggestion importance[1-10]: 9 Why: The suggestion to avoid exposing sensitive information in error messages is crucial for security. It prevents potential leaks of sensitive data, such as client secrets, which could be exploited if logged or exposed.
    9
    Enhancement
    Enhance error messages for Git command failures to aid in debugging ___ **Consider using a more descriptive error message when the Git command fails to help
    with debugging.** [rs/core/corpora_cli/src/context/collector/git.rs [68]](https://github.com/skyl/corpora/pull/51/files#diff-027ebbb115dca33871bfbdaa101bc9b70ba5734f7b77a9cb7022c02a83281a11R68-R68) ```diff -return Err(format!("Git command failed with status {}", output.status).into()); +return Err(format!("Git command 'ls-files' failed with status {}: {:?}", output.status, output.stderr).into()); ```
    Suggestion importance[1-10]: 8 Why: Improving the error message for Git command failures by including the command and stderr output can significantly aid in debugging. This enhancement provides more context, making it easier to diagnose and resolve issues.
    8
    Improve error handling for tarball creation failures by providing more context or retrying ___ **Consider handling the case where ctx.collector.collect_tarball() returns an error
    more gracefully, possibly by providing more context or attempting a retry.** [rs/core/corpora_cli/src/commands/init.rs [12-15]](https://github.com/skyl/corpora/pull/51/files#diff-f464032a0fce74f27bdd1392052016915cccb793e821d99be68c5d830829d0d4R12-R15) ```diff let tarball_path = match ctx.collector.collect_tarball() { Ok(path) => path, Err(err) => { eprintln!("Failed to create tarball: {:?}", err); + // Additional error handling or retry logic could go here return; } ```
    Suggestion importance[1-10]: 7 Why: The suggestion to enhance error handling by providing more context or retrying can improve the robustness of the application. It addresses a potential issue where the current error handling is minimal, which could lead to a lack of clarity or missed opportunities for recovery.
    7
    Improve error handling for configuration loading to provide clearer guidance to users ___ **Add error handling for the load_config() function to provide a more user-friendly
    message if the configuration fails to load.** [rs/core/corpora_cli/src/context/mod.rs [20]](https://github.com/skyl/corpora/pull/51/files#diff-41b9c211f7b0be404e9d2653410776b6f936a15815d2a59a0b8d190a8eb5b916R20-R20) ```diff -let corpora_config = load_config().expect("Failed to load Corpora configuration"); +let corpora_config = load_config().unwrap_or_else(|| panic!("Failed to load Corpora configuration. Please ensure the .corpora.yaml file is present and correctly formatted.")); ```
    Suggestion importance[1-10]: 7 Why: The suggestion to improve error handling for configuration loading by providing a more user-friendly message can enhance user experience. It offers clearer guidance on what might be wrong, helping users to quickly identify and fix configuration issues.
    7