skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0
2 stars 0 forks source link

feat(chat): hacking on some RAG chat, playing with TUI #61

Closed skyl closed 1 day ago

skyl commented 1 day ago

PR Type

enhancement, tests


Description


Changes walkthrough πŸ“

Relevant files
Enhancement
9 files
models.py
Comment out synthetic embedding text code                               

py/packages/corpora/models.py
  • Commented out code related to synthetic embedding text.
  • Retained existing embedding functionality.
  • +3/-0     
    corpus.py
    Update system message and import in corpus router               

    py/packages/corpora/routers/corpus.py
  • Imported CHAT_SYSTEM_MESSAGE.
  • Updated system message text in chat function.
  • +2/-4     
    sync.py
    Comment out summary task generation                                           

    py/packages/corpora/tasks/sync.py - Commented out the call to `generate_summary_task`.
    +1/-1     
    llm_interface.py
    Add synthetic embedding text method                                           

    py/packages/corpora_ai/llm_interface.py
  • Added method get_synthetic_embedding_text.
  • Imported SYNTHETIC_EMBEDDING_SYSTEM_MESSAGE.
  • +22/-1   
    prompts.py
    Add new system messages for chat and embedding                     

    py/packages/corpora_ai/prompts.py
  • Added SYNTHETIC_EMBEDDING_SYSTEM_MESSAGE.
  • Added CHAT_SYSTEM_MESSAGE.
  • +15/-0   
    chat-tui.rs
    Implement chat TUI with event handling                                     

    rs/core/corpora_cli/src/commands/chat-tui.rs
  • Implemented a TUI for chat using ratatui.
  • Added event handling for user input.
  • +166/-0 
    init.rs
    Calculate and display tarball size in MB                                 

    rs/core/corpora_cli/src/commands/init.rs - Added calculation for tarball size in MB.
    +2/-2     
    git.rs
    Add exclude globs and debug prints in GitCollector             

    rs/core/corpora_cli/src/context/collector/git.rs
  • Added support for exclude globs in GitCollector.
  • Added debug print statements for path collection.
  • +31/-6   
    config.rs
    Support exclude globs and add debug prints in config         

    rs/core/corpora_cli/src/context/config.rs
  • Added support for exclude globs in configuration.
  • Added debug print statements for configuration loading.
  • +6/-5     
    Tests
    1 files
    test_sync.py
    Comment out summary task assertion in tests                           

    py/packages/corpora/tasks/test_sync.py - Commented out assertion for summary task call.
    +2/-1     
    Documentation
    2 files
    provider_loader.py
    Add comments for model specification in provider loader   

    py/packages/corpora_ai/provider_loader.py - Added comments for model specification.
    +3/-0     
    llm_client.py
    Add comments for runtime model specification in OpenAI client

    py/packages/corpora_ai_openai/llm_client.py - Added comments for runtime model specification.
    +6/-2     
    Miscellaneous
    1 files
    chat.rs
    Comment out debug print statements in chat command             

    rs/core/corpora_cli/src/commands/chat.rs - Commented out debug print statements.
    +4/-0     
    Configuration changes
    1 files
    docker-compose.yaml
    Add Azure endpoint to environment variables                           

    docker-compose.yaml - Added `OPENAI_AZURE_ENDPOINT` to environment variables.
    +2/-0     
    Dependencies
    1 files
    Cargo.toml
    Add new dependencies for TUI and glob handling                     

    rs/core/corpora_cli/Cargo.toml
  • Added new dependencies: crossterm, globset, ratatui, tui-textarea.
  • +4/-0     

    πŸ’‘ PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    github-actions[bot] commented 1 day ago

    PR Reviewer Guide πŸ”

    (Review updated until commit https://github.com/skyl/corpora/commit/0ed1c2958f7e004af97a79a8d38fd15929b70d68)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 4 πŸ”΅πŸ”΅πŸ”΅πŸ”΅βšͺ
    πŸ§ͺ No relevant tests
    πŸ”’ No security concerns identified
    ⚑ Recommended focus areas for review

    Code Smell
    Commented-out code related to synthetic embedding is present. Consider removing it if it's not needed or adding a comment explaining why it's kept. Code Smell
    The function `create_markdown_skin` has commented-out code that seems to be related to styling. Consider removing or uncommenting it if needed. Logging
    Debug print statements are present in the `collect_paths` and `collect_tarball` methods. Consider using a logging framework for better control over log levels and outputs. Logging
    Debug print statements are present in the `load_config` function. Consider using a logging framework for better control over log levels and outputs.
    github-actions[bot] commented 1 day ago

    PR Code Suggestions ✨

    Latest suggestions up to 0ed1c29 Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add error handling for file reading to prevent application crashes due to missing or unreadable files ___ **Handle potential errors when reading files for voice, purpose, and structure to
    prevent the application from panicking if the files are missing or unreadable.** [rs/core/corpora_cli/src/commands/chat-tui.rs [133-138]](https://github.com/skyl/corpora/pull/61/files#diff-474e15662a9be1654edd640ae339c55a0157a4e04788419219bb68f9a9f7a218R133-R138) ```diff -let voice = std::fs::read_to_string(root_path.join(".corpora/VOICE.md")).unwrap_or_default(); -let purpose = std::fs::read_to_string(root_path.join(".corpora/PURPOSE.md")).unwrap_or_default(); -let structure = std::fs::read_to_string(root_path.join(".corpora/STRUCTURE.md")).unwrap_or_default(); +let voice = std::fs::read_to_string(root_path.join(".corpora/VOICE.md")).unwrap_or_else(|_| String::from("Default voice")); +let purpose = std::fs::read_to_string(root_path.join(".corpora/PURPOSE.md")).unwrap_or_else(|_| String::from("Default purpose")); +let structure = std::fs::read_to_string(root_path.join(".corpora/STRUCTURE.md")).unwrap_or_else(|_| String::from("Default structure")); ```
    Suggestion importance[1-10]: 7 Why: This suggestion improves the robustness of the application by adding error handling for file reading, which can prevent crashes due to missing or unreadable files. It enhances the application's reliability, making it a valuable improvement.
    7
    Handle errors in strip_prefix to prevent panics when paths do not match the expected prefix ___ **Ensure that the strip_prefix method handles potential errors to avoid panics when
    paths do not match the expected prefix.** [rs/core/corpora_cli/src/context/collector/git.rs [68-72]](https://github.com/skyl/corpora/pull/61/files#diff-027ebbb115dca33871bfbdaa101bc9b70ba5734f7b77a9cb7022c02a83281a11R68-R72) ```diff .filter(|path| { - Self::is_text_file(path) && !self.exclude_globs.is_match(path.strip_prefix(&self.root_path).unwrap_or(path)) + Self::is_text_file(path) && !self.exclude_globs.is_match(path.strip_prefix(&self.root_path).unwrap_or_else(|_| path.clone())) }) ```
    Suggestion importance[1-10]: 7 Why: The suggestion to handle errors in `strip_prefix` is important for preventing potential panics, which can improve the stability and reliability of the code. It addresses a possible issue, making it a beneficial enhancement.
    7
    General
    Remove or properly integrate commented-out code for generating synthetic embeddings to avoid confusion ___ **Ensure that the commented-out code for generating synthetic embeddings is either
    removed or properly integrated, as leaving it commented can lead to confusion and
    maintenance issues.** [py/packages/corpora/models.py [60-63]](https://github.com/skyl/corpora/pull/61/files#diff-c9de5374aa987d761f770eb99739e686761246600fc6cc0155902d79c9aa3ea5R60-R63) ```diff -# better_text = llm.get_synthetic_embedding_text(text) -# print(f"better_text: {better_text}") -# vector = llm.get_embedding(better_text) vector = llm.get_embedding(text) ```
    Suggestion importance[1-10]: 5 Why: The suggestion to remove or properly integrate commented-out code is valid as it can help reduce confusion and improve code maintainability. However, it does not address a critical issue, so the impact is moderate.
    5

    Previous suggestions

    Suggestions up to commit f9ce11b
    CategorySuggestion                                                                                                                                    Score
    General
    Improve error handling by logging errors instead of using unwrap_or_else ___ **Handle potential errors from process_request by logging or managing them instead of
    using unwrap_or_else, which may hide underlying issues.** [rs/core/corpora_cli/src/commands/chat-tui.rs [91]](https://github.com/skyl/corpora/pull/61/files#diff-474e15662a9be1654edd640ae339c55a0157a4e04788419219bb68f9a9f7a218R91-R91) ```diff -let ai_response = response.unwrap_or_else(|e| format!("Error: {}", e)); +let ai_response = match response { + Ok(res) => res, + Err(e) => { + log::error!("Failed to process request: {}", e); + format!("Error: {}", e) + } +}; ```
    Suggestion importance[1-10]: 9 Why: Logging errors provides better visibility into issues and aids in debugging, making it a more robust approach than simply formatting errors, which can obscure underlying problems.
    9
    Remove debugging print statements to clean up console output ___ **Remove or replace the print statement used for debugging purposes to avoid
    unnecessary console output in production.** [py/packages/corpora/models.py [61]](https://github.com/skyl/corpora/pull/61/files#diff-c9de5374aa987d761f770eb99739e686761246600fc6cc0155902d79c9aa3ea5R61-R61) ```diff -print(f"better_text: {better_text}") +# print(f"better_text: {better_text}") ```
    Suggestion importance[1-10]: 7 Why: Removing or commenting out debugging print statements is a good practice to prevent unnecessary console output in production, improving code cleanliness and performance.
    7
    Possible issue
    Ensure terminal state is restored even if an error occurs during execution ___ **Ensure that the terminal state is properly restored even if an error occurs by using
    a defer or similar mechanism to handle cleanup.** [rs/core/corpora_cli/src/commands/chat-tui.rs [110-112]](https://github.com/skyl/corpora/pull/61/files#diff-474e15662a9be1654edd640ae339c55a0157a4e04788419219bb68f9a9f7a218R110-R112) ```diff -disable_raw_mode()?; -execute!(terminal.backend_mut(), terminal::LeaveAlternateScreen)?; -terminal.show_cursor()?; +// Use a defer mechanism to ensure cleanup +defer! { + disable_raw_mode().unwrap(); + execute!(terminal.backend_mut(), terminal::LeaveAlternateScreen).unwrap(); + terminal.show_cursor().unwrap(); +} ```
    Suggestion importance[1-10]: 8 Why: Ensuring terminal state restoration even in the event of an error is crucial for maintaining a consistent user experience and preventing terminal corruption, making this a valuable improvement.
    8
    github-actions[bot] commented 1 day ago

    Persistent review updated to latest commit https://github.com/skyl/corpora/commit/0ed1c2958f7e004af97a79a8d38fd15929b70d68