skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0

feat(rust): establish rust codebase for fully static binaries #44

Closed: skyl closed this 6 days ago

skyl commented 6 days ago

PR Type

enhancement, tests, configuration changes


Description


Changes walkthrough 📝

Relevant files
Enhancement
5 files
genall.sh
Add script for generating Rust client from OpenAPI spec

rs/genall.sh
  • Added script to generate Rust client library from OpenAPI spec.
  • Included steps for fetching OpenAPI spec, generating code, and cleaning up.
  • Integrated Rust formatting and testing commands.
  +40/-0
configuration.rs
Implement configuration struct for Rust API client

rs/core/corpora_client/src/apis/configuration.rs
  • Added configuration struct for API client settings.
  • Implemented default configuration with localhost base path.
  +48/-0
corpus_api.rs
Add corpus management functions to Rust API client

rs/core/corpora_client/src/apis/corpus_api.rs
  • Added functions for managing corpora via API.
  • Implemented error handling for API responses.
  +344/-0
corpus_response_schema.rs
Define corpus response schema for Rust client

rs/core/corpora_client/src/models/corpus_response_schema.rs
  • Defined schema for corpus response.
  • Included fields for ID, name, and timestamps.
  +48/-0
lib.rs
Initialize main library file for Rust client

rs/core/corpora_client/src/lib.rs
  • Introduced main library file for Rust client.
  • Included module declarations for APIs and models.
  +11/-0
Configuration changes
4 files
ci-rust.yml
Add GitHub Actions workflow for Rust CI

.github/workflows/ci-rust.yml
  • Introduced a new GitHub Actions workflow for Rust CI.
  • Configured steps for building, formatting, linting, and testing Rust code.
  +48/-0
setup.sh
Update devcontainer setup with Rust and shell configurations

.devcontainer/setup.sh
  • Added Rust to PATH in development container setup.
  • Updated zsh history and alias configurations.
  +5/-1
docker-compose.yaml
Update Docker Compose services and configurations

docker-compose.yaml
  • Modified service configurations for Docker Compose.
  • Added new celery and interactive services.
  +22/-19
devcontainer.json
Update DevContainer configuration for interactive service

.devcontainer/devcontainer.json
  • Updated VS Code extensions and service settings.
  • Changed service from app to interactive.
  +5/-2
Tests
1 file
test_corpus.py
Enhance test with file operation verification

py/packages/corpora_cli/commands/test_corpus.py
  • Added mock for file opening in test.
  • Verified file write operation in test assertions.
  +8/-3
Documentation
1 file
rust-setup.md
Document Rust workspace structure and client setup

md/notes/rust-setup.md
  • Added documentation for Rust workspace structure.
  • Explained client setup and regeneration process.
  +71/-0
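The `corpus_api.rs` entry in the walkthrough above says the new functions implement error handling for API responses. A rough, std-only sketch of that kind of status-based handling follows. The `ResponseContent` field names (`status`, `content`, `entity`) come from the code suggestions in the review; `status` is a plain `u16` instead of `reqwest::StatusCode`, and `check_response` and `CreateCorpusError` are hypothetical names, not the generated client's API.

```rust
/// Status and body of a non-success response. `status` is a plain u16 here
/// (instead of reqwest::StatusCode) so the sketch needs no external crates.
#[derive(Debug)]
pub struct ResponseContent<T> {
    pub status: u16,
    pub content: String,
    pub entity: Option<T>,
}

/// Client error type, generic over a per-endpoint error entity.
#[derive(Debug)]
pub enum Error<T> {
    Io(std::io::Error),
    ResponseError(ResponseContent<T>),
}

impl<T> From<std::io::Error> for Error<T> {
    fn from(e: std::io::Error) -> Self {
        Error::Io(e)
    }
}

/// Hypothetical error entity for a "create corpus" call.
#[derive(Debug, PartialEq)]
pub enum CreateCorpusError {
    UnknownValue(String),
}

/// Return the body for 2xx statuses; otherwise wrap status and body in a
/// typed ResponseError so callers can inspect both.
pub fn check_response(status: u16, body: String) -> Result<String, Error<CreateCorpusError>> {
    if (200u16..300).contains(&status) {
        Ok(body)
    } else {
        // A generated client would try to deserialize the body into the
        // entity type; this sketch just keeps the raw text.
        let entity = Some(CreateCorpusError::UnknownValue(body.clone()));
        Err(Error::ResponseError(ResponseContent { status, content: body, entity }))
    }
}
```

Keeping the raw `content` next to the optional typed `entity` means a caller can still log the server's reply even when it does not match the expected error schema.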

💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

github-actions[bot] commented 6 days ago

PR Reviewer Guide 🔍

(Review updated until commit https://github.com/skyl/corpora/commit/91b54853dbcb10b2774bdc060ee3231261b80a6d)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 Security concerns

Sensitive information exposure:
The `SECRET_KEY` in `settings.py` is hardcoded and should be secured using environment variables or a secrets manager.

⚡ Recommended focus areas for review

Code Smell
The commented-out code and TODO comments in `views.py` suggest incomplete or temporary changes. Consider removing or addressing these before merging.

Code Smell
The `base_url` variable is imported but not used. Consider removing the import if it's unnecessary.

Possible Bug
The default `base_path` is set to "http://localhost". Ensure this is the intended default or consider making it configurable.
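One way to keep the `http://localhost` default while making it configurable is an optional override. This is a hedged sketch, not the generated code: the struct is pared down to `base_path`, and `with_override`, `from_env`, and the `CORPORA_BASE_URL` environment variable are hypothetical names chosen for illustration.

```rust
use std::env;

/// Simplified stand-in for the generated Configuration struct; the real one
/// may carry more fields (HTTP client, auth tokens).
#[derive(Debug, Clone)]
pub struct Configuration {
    pub base_path: String,
}

impl Default for Configuration {
    fn default() -> Self {
        // Default described in the PR: the API is assumed to be on localhost.
        Configuration { base_path: "http://localhost".to_owned() }
    }
}

impl Configuration {
    /// Use `override_value` when it is non-empty (trailing '/' removed),
    /// otherwise keep the default. Taking the value as a parameter keeps the
    /// logic testable without touching the process environment.
    pub fn with_override(override_value: Option<&str>) -> Self {
        match override_value.map(str::trim) {
            Some(v) if !v.is_empty() => Configuration {
                base_path: v.trim_end_matches('/').to_owned(),
            },
            _ => Configuration::default(),
        }
    }

    /// Read the override from a hypothetical CORPORA_BASE_URL variable.
    pub fn from_env() -> Self {
        let value = env::var("CORPORA_BASE_URL").ok();
        Configuration::with_override(value.as_deref())
    }
}
```

For example, `Configuration::with_override(Some("http://app:8877/"))` yields a `base_path` of `http://app:8877` (the host used as the CLI default in `auth.py`), while `None` or a blank value falls back to `http://localhost`.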
github-actions[bot] commented 6 days ago

PR Code Suggestions ✨

Latest suggestions up to 91b5485
Explore these optional code suggestions:

Category / Suggestion / Score
Possible bug
Ensure the history directory exists before appending to the history file to prevent errors
Ensure that the directory /home/vscode/.corpora.zsh_history exists before appending to the .zsh_history file to prevent potential errors.
[.devcontainer/setup.sh [4]](https://github.com/skyl/corpora/pull/44/files#diff-aaf9e7764a12876cbe70d492c719e21e8590390a38d0f037193a765df858fdbfR4-R4)
```diff
-echo 'autoload -Uz add-zsh-hook; append_history() { fc -W }; add-zsh-hook precmd append_history; export HISTFILE=/home/vscode/.corpora.zsh_history/.zsh_history' >> ~/.zshrc
+mkdir -p /home/vscode/.corpora.zsh_history && echo 'autoload -Uz add-zsh-hook; append_history() { fc -W }; add-zsh-hook precmd append_history; export HISTFILE=/home/vscode/.corpora.zsh_history/.zsh_history' >> ~/.zshrc
```
Suggestion importance[1-10]: 9
Why: This suggestion is highly relevant as it prevents potential runtime errors by ensuring the directory exists before writing to a file. It enhances the robustness of the script.
Possible issue
Avoid variable shadowing by renaming the imported base_url to prevent conflicts
Ensure that the base_url variable is not shadowing the imported base_url from the openai module to avoid potential conflicts.
[py/packages/corpora_cli/auth.py [3]](https://github.com/skyl/corpora/pull/44/files#diff-4d6ab61ebfe8649e2a5f322f7de82b2ef4efcda2685d0297d4e30b2cc8cd62ebR3-R3)
```diff
-from openai import base_url
+from openai import base_url as openai_base_url
 ...
 base_url = config.get("base_url", "http://app:8877")
```
Suggestion importance[1-10]: 8
Why: The suggestion addresses a potential issue of variable shadowing, which can lead to bugs or unexpected behavior. Renaming the imported `base_url` is a good practice to prevent conflicts and improve code clarity.
Enhance error handling for network request execution failures
Add error handling for the execute method call to manage potential network or request execution failures more effectively.
[rs/core/corpora_client/src/apis/plan_api.rs [48]](https://github.com/skyl/corpora/pull/44/files#diff-316ea7a81c26606b9c34c8f45eb24f9191c990353b675df780e1f610299f03c1R48-R48)
```diff
-let local_var_resp = local_var_client.execute(local_var_req)?;
+let local_var_resp = local_var_client.execute(local_var_req).map_err(Error::from)?;
```
Suggestion importance[1-10]: 7
Why: The suggestion improves error handling by mapping potential errors from the `execute` method call, which is crucial for managing network or request execution failures effectively. This change increases the reliability of the code.
Improve error handling when converting the response body to a string
Ensure that the text method call on the response is properly handled to avoid potential runtime errors if the response body cannot be converted to a string.
[rs/core/corpora_client/src/apis/workon_api.rs [51]](https://github.com/skyl/corpora/pull/44/files#diff-9b28c8c6fe3bf21371bbb9a1d6208645e15c9047372bcc63088dbb0945057615R51-R51)
```diff
-let local_var_content = local_var_resp.text()?;
+let local_var_content = local_var_resp.text().map_err(Error::from)?;
```
Suggestion importance[1-10]: 7
Why: The suggestion enhances error handling by mapping potential errors when converting the response body to a string, which helps prevent runtime errors. This change improves the robustness and reliability of the code.
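Both suggestions above insert `.map_err(Error::from)` before `?`. In Rust, `?` already passes the error through `From::from` into the function's error type, so if the original `execute(...)?` and `text()?` lines compile, a suitable `From` impl exists and the explicit `map_err` does not change behavior. The sketch below uses std's `ParseIntError` in place of `reqwest::Error` (an assumption made so it runs without crates); the function names are illustrative.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
pub enum Error {
    Parse(ParseIntError),
}

impl From<ParseIntError> for Error {
    fn from(e: ParseIntError) -> Self {
        Error::Parse(e)
    }
}

/// `?` converts ParseIntError into Error via From::from automatically.
pub fn with_question_mark(s: &str) -> Result<i32, Error> {
    let n = s.parse::<i32>()?;
    Ok(n)
}

/// An explicit map_err(Error::from) performs the same conversion first;
/// the following `?` then uses the identity From<Error> for Error.
pub fn with_map_err(s: &str) -> Result<i32, Error> {
    let n = s.parse::<i32>().map_err(Error::from)?;
    Ok(n)
}
```

Both functions return the same value for valid and invalid input, which is why the change is mostly a matter of style rather than added error handling.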
Best practice
Replace the unimplemented! macro with a more informative error handling mechanism
Consider handling the unimplemented! macro more gracefully by returning a specific error or logging a message, as this will provide more context if the function is called unexpectedly.
[rs/core/corpora_client/src/apis/mod.rs [92]](https://github.com/skyl/corpora/pull/44/files#diff-825889a501854be7279fed5dbe279b3a6b7b20307d9f96664ffaf28da31a6460R92-R92)
```diff
-unimplemented!("Only objects are supported with style=deepObject")
+Err(Error::ResponseError(ResponseContent {
+    status: reqwest::StatusCode::NOT_IMPLEMENTED,
+    content: String::from("Only objects are supported with style=deepObject"),
+    entity: None,
+}))
```
Suggestion importance[1-10]: 8
Why: The suggestion replaces the `unimplemented!` macro with a more informative error handling mechanism, which improves the robustness of the code by providing a specific error response. This change enhances the maintainability and debuggability of the code.
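Returning an error instead of panicking also means the surrounding function must return a `Result`, so its signature and callers change too. A hedged alternative to a synthetic 501 `ResponseError` is a dedicated client-side error, since no HTTP response exists at that point. In this sketch `Value` is a hypothetical stand-in for `serde_json::Value`, `ParamError` is an illustrative error type, and nested objects are not recursed into.

```rust
/// Hypothetical stand-in for a JSON value (the real code would use serde_json::Value).
#[derive(Debug)]
pub enum Value {
    Object(Vec<(String, String)>),
    Str(String),
}

#[derive(Debug, PartialEq)]
pub enum ParamError {
    UnsupportedStyle(&'static str),
}

/// Flatten an object into `prefix[key]=value` pairs (style=deepObject) and
/// return an error, rather than panicking, for non-object values.
pub fn parse_deep_object(
    prefix: &str,
    value: &Value,
) -> Result<Vec<(String, String)>, ParamError> {
    match value {
        Value::Object(fields) => Ok(fields
            .iter()
            .map(|(k, v)| (format!("{}[{}]", prefix, k), v.clone()))
            .collect()),
        _ => Err(ParamError::UnsupportedStyle(
            "Only objects are supported with style=deepObject",
        )),
    }
}
```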
Security
Use a secure default value for base_path to enhance security
Consider using a more secure default value for base_path in the Configuration struct to avoid potential security issues.
[rs/core/corpora_client/src/apis/configuration.rs [39]](https://github.com/skyl/corpora/pull/44/files#diff-817a9d0e9368f1847dc5d15fecefddda75ff5308c7cce98729554933648c799fR39-R39)
```diff
-base_path: "http://localhost".to_owned(),
+base_path: "https://localhost".to_owned(),
```
Suggestion importance[1-10]: 7
Why: The suggestion improves security by recommending the use of HTTPS instead of HTTP as a default value. This change enhances the security posture of the application by defaulting to a secure protocol.
Performance
Inline the URL encoding to reduce function call overhead
Optimize the urlencode function usage by directly using the url::form_urlencoded::byte_serialize method inline, reducing function call overhead.
[rs/core/corpora_client/src/apis/split_api.rs [49]](https://github.com/skyl/corpora/pull/44/files#diff-b2d082d9b15eff04b087d4114e4bc4e5262c3fe7993702ab620517faec191227R49-R49)
```diff
-split_id = crate::apis::urlencode(split_id)
+split_id = ::url::form_urlencoded::byte_serialize(split_id.as_bytes()).collect::<String>()
```
Suggestion importance[1-10]: 5
Why: The suggestion inlines the URL encoding process, which may slightly improve performance by reducing function call overhead. However, the performance gain is likely minimal, so the impact is moderate.
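A small helper such as `crate::apis::urlencode` is a natural candidate for inlining by the optimizer in release builds, so hand-inlining the call is unlikely to matter measurably; an `#[inline]` hint keeps the single readable call site. The std-only sketch below reimplements the form-urlencoded byte rules (ASCII alphanumerics and `*-._` unchanged, space to `+`, everything else `%XX`) so it runs without the `url` crate; it illustrates the encoding, it is not the generated helper.

```rust
/// Percent-encode a string using application/x-www-form-urlencoded byte rules:
/// ASCII alphanumerics and `*`, `-`, `.`, `_` pass through, a space becomes
/// `+`, and every other byte becomes `%XX` with uppercase hex digits.
#[inline]
pub fn urlencode(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for &b in s.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'*' | b'-' | b'.' | b'_' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}
```

Non-ASCII input is encoded byte by byte from its UTF-8 form, so `é` becomes `%C3%A9`.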

Previous suggestions

Suggestions up to commit a3f154f
Category / Suggestion / Score
Possible issue
Add error handling for request execution to manage potential failures
Consider adding error handling for the req.execute(self.configuration.borrow()) calls to manage potential request failures or network issues.
[rs/core/corpora_client/src/apis/split_api.rs [71]](https://github.com/skyl/corpora/pull/44/files#diff-b2d082d9b15eff04b087d4114e4bc4e5262c3fe7993702ab620517faec191227R71-R71)
```diff
-req.execute(self.configuration.borrow())
+req.execute(self.configuration.borrow()).map_err(|e| Error::from(e))
```
Suggestion importance[1-10]: 8
Why: Adding error handling for request execution is crucial for managing potential failures, improving the robustness and reliability of the API client.
Avoid shadowing the imported base_url by using a different variable name for the local assignment
Ensure that base_url is not unintentionally shadowed by the local assignment in the constructor, which might lead to unexpected behavior.
[py/packages/corpora_cli/auth.py [18-19]](https://github.com/skyl/corpora/pull/44/files#diff-4d6ab61ebfe8649e2a5f322f7de82b2ef4efcda2685d0297d4e30b2cc8cd62ebR18-R19)
```diff
-base_url = config.get("base_url", "http://app:8877")
-self.token_url = f"{base_url}/o/token/"
+self.base_url = config.get("base_url", "http://app:8877")
+self.token_url = f"{self.base_url}/o/token/"
```
Suggestion importance[1-10]: 7
Why: The suggestion prevents potential issues caused by variable shadowing, which could lead to unexpected behavior. It improves code clarity and correctness by ensuring that the `base_url` variable is not confused with the imported module.
Validate the limit field to prevent excessive resource usage
Validate the limit field to ensure it is within a reasonable range to prevent excessive resource usage.
[rs/core/corpora_client/src/models/split_vector_search_schema.rs [21]](https://github.com/skyl/corpora/pull/44/files#diff-f89f6e2d0bd585c79b04d082e7163a14ce16585fe75d89907e7e74dcbaa887acR21-R21)
```diff
-pub limit: Option,
+pub limit: Option, // Ensure limit is within a reasonable range
```
Suggestion importance[1-10]: 5
Why: Adding validation for the `limit` field can prevent excessive resource usage, enhancing the application's stability, though the suggestion lacks specific implementation details.
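A trailing comment on the field does not enforce a range; the check has to happen in code, either before the request is built or on the server. A minimal sketch, where the `i32` type, the default of 10, and the cap of 100 are all hypothetical values chosen for illustration:

```rust
/// Hypothetical bounds; the real API's limits are not stated in the PR.
const DEFAULT_LIMIT: i32 = 10;
const MAX_LIMIT: i32 = 100;

/// Validate an optional `limit`: missing -> default, non-positive -> error,
/// too large -> clamped to MAX_LIMIT.
pub fn validate_limit(limit: Option<i32>) -> Result<i32, String> {
    match limit {
        None => Ok(DEFAULT_LIMIT),
        Some(n) if n <= 0 => Err(format!("limit must be positive, got {}", n)),
        Some(n) => Ok(n.min(MAX_LIMIT)),
    }
}
```

Clamping silently is one choice; rejecting oversized values with an error is equally reasonable if callers should learn that their request was too large.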
Possible bug
Safely handle serialization errors in with_body_param to prevent runtime panics
Handle potential errors from serde_json::to_string in with_body_param to prevent panics if serialization fails.
[rs/core/corpora_client/src/apis/request.rs [72]](https://github.com/skyl/corpora/pull/44/files#diff-c5ea394eab9224f57c56dab54b866721fb600a3d50380e2c0f2312156ea949f0R72-R72)
```diff
-self.serialized_body = Some(serde_json::to_string(&param).unwrap());
+self.serialized_body = serde_json::to_string(&param).ok();
```
Suggestion importance[1-10]: 8
Why: This suggestion addresses a potential bug by handling serialization errors, preventing runtime panics. It enhances the robustness and reliability of the code, making it a valuable improvement.
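The suggested `.ok()` avoids the panic but silently drops the serializer's error and leaves the request without a body. Propagating the error keeps the failure visible to the caller. In this sketch `Request` is a pared-down, hypothetical builder, and the `serialize` closure stands in for `serde_json::to_string` so the example compiles without crates.

```rust
/// Pared-down request builder; `serialized_body` mirrors the field in the review.
#[derive(Debug, Default)]
pub struct Request {
    pub serialized_body: Option<String>,
}

impl Request {
    /// Store the serialized body, or return the serializer's error instead of
    /// panicking (`unwrap`) or discarding it (`.ok()`).
    pub fn with_body_param<T, E>(
        mut self,
        param: &T,
        serialize: impl Fn(&T) -> Result<String, E>,
    ) -> Result<Self, E> {
        self.serialized_body = Some(serialize(param)?);
        Ok(self)
    }
}
```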
Best practice
Validate date-time fields to ensure they are in a consistent format
Ensure that the created_at and updated_at fields are validated to be in a consistent and expected date-time format.
[rs/core/corpora_client/src/models/corpus_response_schema.rs [28-29]](https://github.com/skyl/corpora/pull/44/files#diff-727aff8a5af46e746300305de2921fb073d1de50a37bef26110a8d7eb5bcc7f9R28-R29)
```diff
-pub created_at: String,
-pub updated_at: String,
+pub created_at: chrono::NaiveDateTime,
+pub updated_at: chrono::NaiveDateTime,
```
Suggestion importance[1-10]: 7
Why: Using a structured date-time type like `chrono::NaiveDateTime` ensures consistency and correctness in handling date-time fields, which is a best practice for data integrity.
Ensure the tree command is installed before creating an alias for it in the shell configuration
Add a check to ensure that the tree command is installed before creating an alias for it, to prevent errors if the command is not available.
[.devcontainer/setup.sh [6]](https://github.com/skyl/corpora/pull/44/files#diff-aaf9e7764a12876cbe70d492c719e21e8590390a38d0f037193a765df858fdbfR6-R6)
```diff
-echo "alias tree=\"tree -I '\\.venv|node_modules|build|target|dist|test-corpora|__pycache__|\\.git|\\.pytest_cache' -a\"" >> ~/.zshrc
+if command -v tree &> /dev/null; then
+  echo "alias tree=\"tree -I '\\.venv|node_modules|build|target|dist|test-corpora|__pycache__|\\.git|\\.pytest_cache' -a\"" >> ~/.zshrc
+fi
```
Suggestion importance[1-10]: 5
Why: This suggestion adds a check to ensure the `tree` command is available before creating an alias, which is a good practice to prevent errors. However, it is not critical, as the absence of the command would not cause significant issues.
Performance
Use a more efficient data type for large file content
Consider using a more efficient data type for the content field if the file content is large, such as a Vec<u8> for binary data.
[rs/core/corpora_client/src/models/file_response_schema.rs [22]](https://github.com/skyl/corpora/pull/44/files#diff-bb57ed8e9e2eb3d0a5f321defa1c9c35c6c441524fbf583d97324f417139a5a0R22-R22)
```diff
-pub content: String,
+pub content: Vec<u8>,
```
Suggestion importance[1-10]: 6
Why: Changing the `content` field to `Vec<u8>` can improve performance when handling large binary data, though it requires careful consideration of the data's nature and usage.
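The practical difference between the two types is validity rather than raw speed: a `String` is a `Vec<u8>` that is guaranteed to hold UTF-8, so arbitrary binary bytes cannot be stored in it, while a `Vec<u8>` accepts any bytes. A small std-only sketch (the `as_text` helper is illustrative):

```rust
use std::string::FromUtf8Error;

/// Convert raw bytes to text; this fails when the bytes are not valid UTF-8,
/// which is exactly the content a String field cannot represent.
pub fn as_text(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}
```

One caveat worth checking before switching: with serde's default derives, a `Vec<u8>` field is (de)serialized as a JSON array of numbers rather than a string, so the server's schema for `content` would have to change as well.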
Enhancement
Use a more descriptive variable name to enhance code readability
Consider using a more descriptive variable name instead of rc to improve code readability and maintainability.
[rs/core/corpora_client/src/apis/configuration.rs [20]](https://github.com/skyl/corpora/pull/44/files#diff-817a9d0e9368f1847dc5d15fecefddda75ff5308c7cce98729554933648c799fR20-R20)
```diff
-let rc = Arc::new(configuration);
+let config_arc = Arc::new(configuration);
```
Suggestion importance[1-10]: 4
Why: The suggestion improves code readability by using a more descriptive variable name. While it enhances maintainability, the impact is minor as it does not affect functionality or correctness.
skyl commented 6 days ago

/describe

skyl commented 6 days ago

/review

github-actions[bot] commented 6 days ago

Persistent review updated to latest commit https://github.com/skyl/corpora/commit/91b54853dbcb10b2774bdc060ee3231261b80a6d

github-actions[bot] commented 6 days ago

PR Description updated to latest commit (https://github.com/skyl/corpora/commit/91b54853dbcb10b2774bdc060ee3231261b80a6d)

github-actions[bot] commented 6 days ago

Persistent review updated to latest commit https://github.com/skyl/corpora/commit/91b54853dbcb10b2774bdc060ee3231261b80a6d