skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0
2 stars 0 forks source link

feat(api): chat the corpus + refactor schema #57

Closed skyl closed 3 days ago

skyl commented 3 days ago

PR Type

enhancement, tests


Description


Changes walkthrough ๐Ÿ“

Relevant files
Enhancement
corpus.py
Add chat endpoint to corpus router for interactive communication

py/packages/corpora/routers/corpus.py
  • Added a new endpoint for chatting with the corpus.
  • Introduced new imports for chat schema and LLM provider.
  • Defined a system message for chat context.
  • +64/-2   
    chat.py
    Define schemas for chat functionality in corpora                 

    py/packages/corpora/schema/chat.py
  • Introduced MessageSchema, CorpusChatSchema, and CorpusFileChatSchema.
  • Added a function to get additional context for chat.
  • +50/-0   
    core.py
    Introduce core schemas for corpus and file operations       

    py/packages/corpora/schema/core.py
  • Added core schemas for corpus and file responses.
  • Defined schemas for split vector search.
  • +51/-1   
    corpus_api.py
    Implement chat method in corpus API client                             

    py/packages/corpora_client/api/corpus_api.py
  • Added chat method to interact with the corpus.
  • Implemented serialization and deserialization for chat requests.
  • +253/-0 
    corpus_chat_schema.py
    Rename and update chat schema model for corpus                     

    py/packages/corpora_client/models/corpus_chat_schema.py
  • Renamed IssueRequestSchema to CorpusChatSchema.
  • Updated methods for JSON and dict conversions.
  • +4/-4     
    corpus_api.rs
    Add chat function to corpus API in Rust client                     

    rs/core/corpora_client/src/apis/corpus_api.rs
  • Added a new function for chatting with the corpus.
  • Defined error handling for chat method.
  • +51/-0   
    Tests
    test_corpus_api.py
    Add test case stub for chat method in corpus API                 

    py/packages/corpora_client/test/test_corpus_api.py - Added a test case stub for the chat method.
    +7/-0     
    Configuration changes
    genall.sh
    Add script for generating files in Python and Rust             

    genall.sh
  • Added a script to generate all necessary files for Python and Rust.
  • +10/-0   

    ๐Ÿ’ก PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    github-actions[bot] commented 3 days ago

    PR Reviewer Guide ๐Ÿ”

    Here are some key observations to aid the review process:

    โฑ๏ธ Estimated effort to review: 4 ๐Ÿ”ต๐Ÿ”ต๐Ÿ”ต๐Ÿ”ตโšช
    ๐Ÿงช PR contains tests
    ๐Ÿ”’ No security concerns identified
    โšก Recommended focus areas for review

    Logging
    The `print` statement on line 149 should be replaced with a proper logging mechanism to ensure consistent logging practices and to avoid printing directly to standard output in a production environment. Token Limit Handling
    The comment on lines 140-144 suggests a need to handle token count limits and conversation summarization. This should be addressed to prevent potential issues with large payloads.
    github-actions[bot] commented 3 days ago

    PR Code Suggestions โœจ

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    Possible issue
    Add error handling for the split_context retrieval to prevent unexpected failures ___ **Ensure that the split_context variable is properly handled in case the
    get_relevant_splits_context method returns an unexpected result or raises an
    exception.** [py/packages/corpora/routers/corpus.py [145-147]](https://github.com/skyl/corpora/pull/57/files#diff-3ac4611171c3ba2b1e471a17f85e949a5df6e835391d39367db05286216d1762R145-R147) ```diff -split_context = await sync_to_async(corpus.get_relevant_splits_context)( - "\n".join(message.text for message in payload.messages[-2:]) -) +try: + split_context = await sync_to_async(corpus.get_relevant_splits_context)( + "\n".join(message.text for message in payload.messages[-2:]) + ) +except SomeException as e: + handle_exception(e) + split_context = "default context" ```
    Suggestion importance[1-10]: 8 Why: The suggestion to add error handling for the `split_context` retrieval is valuable as it addresses potential issues with unexpected results or exceptions from the `get_relevant_splits_context` method. This enhances the robustness and reliability of the code by preventing unexpected failures during execution.
    8
    Confirm parameter compatibility for method calls to ensure correct functionality ___ **Verify that corpus_chat_schema is the correct parameter for the get_issue method, as
    it was previously issue_request_schema, to ensure compatibility and correctness.** [py/packages/corpora_client/docs/PlanApi.md [46-50]](https://github.com/skyl/corpora/pull/57/files#diff-f788e69787402d9ae6eece8438d7d789f62385315f649488ad2e63bdf23350bdR46-R50) ```diff -corpus_chat_schema = corpora_client.CorpusChatSchema() # CorpusChatSchema | +issue_request_schema = corpora_client.IssueRequestSchema() # IssueRequestSchema | ... -api_response = api_instance.get_issue(corpus_chat_schema) +api_response = api_instance.get_issue(issue_request_schema) ```
    Suggestion importance[1-10]: 8 Why: The suggestion highlights a critical check to ensure that the correct parameter is used for the `get_issue` method. This is important for maintaining compatibility and correctness, especially since the parameter type was changed in the PR.
    8
    Initialize objects with necessary data before method calls to prevent runtime errors ___ **Ensure that the corpus_chat_schema object is properly initialized with necessary
    data before calling the chat method to avoid potential runtime errors.** [py/packages/corpora_client/README.md [79-83]](https://github.com/skyl/corpora/pull/57/files#diff-9ed923da2950d71841f1f37b2d40f4a36f97643423c22c32f568f3f9860558fbR79-R83) ```diff -corpus_chat_schema = corpora_client.CorpusChatSchema() # CorpusChatSchema | +corpus_chat_schema = corpora_client.CorpusChatSchema(data) # CorpusChatSchema | ... api_response = api_instance.chat(corpus_chat_schema) ```
    Suggestion importance[1-10]: 7 Why: The suggestion addresses a potential runtime error by ensuring that the `corpus_chat_schema` object is initialized with necessary data before being used in the `chat` method. This is a valid and useful improvement to prevent errors during execution.
    7