skyl / corpora

Corpora is a self-building corpus that can help build other arbitrary corpora
GNU Affero General Public License v3.0

feat(ai): start corpora_ai abstract interface and corpora_ai_openai implementation #14

Closed: skyl closed this 2 weeks ago

skyl commented 2 weeks ago

PR Type

Enhancement, Tests, Documentation


Description


Changes walkthrough 📝

Relevant files
Enhancement
3 files

llm_interface.py
Define abstract interface for LLM providers

py/packages/corpora_ai/llm_interface.py
  • Introduced ChatCompletionTextMessage dataclass for message representation.
  • Defined LLMBaseInterface abstract class for LLM providers.
  • Added abstract methods for text completion and embedding generation.
  +45/-0
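
As a rough illustration of the shape such an interface could take, here is a minimal sketch. The field names `role` and `text`, the method signatures, and the `EchoProvider` class are assumptions made for illustration; the summary above only names `ChatCompletionTextMessage` and `LLMBaseInterface`, and the method names are borrowed from the review comments on `OpenAIClient`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class ChatCompletionTextMessage:
    # Field names are assumed; the PR summary does not list them.
    role: str
    text: str


class LLMBaseInterface(ABC):
    """Abstract base that every LLM provider would implement."""

    @abstractmethod
    def get_text_completion(self, messages: List[ChatCompletionTextMessage]) -> str:
        """Return the model's reply to a list of chat messages."""

    @abstractmethod
    def generate_embedding(self, text: str) -> List[float]:
        """Return an embedding vector for the given text."""


class EchoProvider(LLMBaseInterface):
    """Hypothetical offline provider, used only to exercise the interface."""

    def get_text_completion(self, messages: List[ChatCompletionTextMessage]) -> str:
        # Echo the text of the last message back to the caller.
        return messages[-1].text

    def generate_embedding(self, text: str) -> List[float]:
        # A one-dimensional toy "embedding": the length of the text.
        return [float(len(text))]
```

Because both methods are abstract, instantiating `LLMBaseInterface` directly raises `TypeError`, which pushes each provider package to supply both operations.
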
provider_loader.py
Implement dynamic LLM provider loading mechanism

py/packages/corpora_ai/provider_loader.py
  • Implemented dynamic loading of LLM providers.
  • Included OpenAI client with environment variable checks.
  • Added error handling for missing API keys or unsupported providers.
  +34/-0
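
The loading behavior described above might look roughly like the following sketch. Only `load_llm_provider`, `OPENAI_API_KEY`, and the `ValueError` behavior come from this PR; the `LLM_PROVIDER` variable name, the optional `env` parameter, and `StubOpenAIClient` are assumptions introduced so the example runs without the OpenAI package or network access.

```python
import os
from typing import Mapping, Optional


class StubOpenAIClient:
    """Hypothetical stand-in for the real OpenAI client (keeps the sketch offline)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def load_llm_provider(env: Optional[Mapping[str, str]] = None) -> StubOpenAIClient:
    """Pick a provider from environment-style settings.

    The `env` parameter is an assumption that makes the function easy to test;
    by default the process environment is used.
    """
    env = os.environ if env is None else env
    # LLM_PROVIDER is an assumed variable name, defaulting to "openai".
    provider = env.get("LLM_PROVIDER", "openai")
    if provider == "openai":
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            raise ValueError("OPENAI_API_KEY environment variable is not set.")
        return StubOpenAIClient(api_key=api_key)
    raise ValueError(f"No valid LLM provider found: {provider!r}")
```
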
llm_client.py
Implement OpenAI client for LLM interactions

py/packages/corpora_ai_openai/llm_client.py
  • Implemented OpenAIClient class for OpenAI API interaction.
  • Provided methods for text completion and embedding generation.
  • Included error handling for empty inputs.
  +32/-0
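
A client along these lines could be sketched as follows. This is not the PR's code: here the constructor accepts an already-built client object (an assumption that allows testing with a fake), whereas the PR constructs `OpenAI(api_key=api_key)` internally. The default model names and the plain-dict message format are also assumptions; the `chat.completions.create` and `embeddings.create` calls follow the OpenAI Python library's v1 interface.

```python
from typing import Any, Dict, List


class OpenAIClient:
    """Sketch of a thin wrapper around an OpenAI-style client object."""

    def __init__(
        self,
        client: Any,
        completion_model: str = "gpt-4o",  # assumed default
        embedding_model: str = "text-embedding-3-small",  # assumed default
    ) -> None:
        self.client = client
        self.completion_model = completion_model
        self.embedding_model = embedding_model

    def get_text_completion(self, messages: List[Dict[str, str]]) -> str:
        # Reject empty input before spending an API call on it.
        if not messages:
            raise ValueError("Input messages must not be empty.")
        response = self.client.chat.completions.create(
            model=self.completion_model, messages=messages
        )
        return response.choices[0].message.content

    def generate_embedding(self, text: str) -> List[float]:
        if not text:
            raise ValueError("Input text must not be empty.")
        response = self.client.embeddings.create(
            model=self.embedding_model, input=text
        )
        return response.data[0].embedding
```

Passing the client in, rather than building it in the constructor, is a design choice made here so that tests can substitute a mock without patching imports.
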
Tests
2 files

test_provider_loader.py
Add unit tests for LLM provider loader

py/packages/corpora_ai/test_provider_loader.py
  • Added unit tests for load_llm_provider function.
  • Tested scenarios for successful loading, missing API keys, and invalid providers.
  +65/-0
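
The three scenarios listed above can be exercised by patching the process environment, as in this sketch. The `load_llm_provider` defined here is a minimal hypothetical stand-in (it returns a dict rather than a client), and `LLM_PROVIDER` is an assumed variable name; the real tests in the PR may be structured differently.

```python
import os
import unittest
from unittest.mock import patch


def load_llm_provider():
    """Minimal hypothetical stand-in for the function under test."""
    if os.environ.get("LLM_PROVIDER", "openai") != "openai":
        raise ValueError("No valid LLM provider found.")
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("OPENAI_API_KEY environment variable is not set.")
    return {"provider": "openai", "api_key": api_key}


class TestLoadLLMProvider(unittest.TestCase):
    # clear=True removes every other variable for the duration of the test,
    # so results do not depend on the developer's real environment.
    @patch.dict(os.environ, {"OPENAI_API_KEY": "test-key"}, clear=True)
    def test_success(self):
        self.assertEqual(load_llm_provider()["api_key"], "test-key")

    @patch.dict(os.environ, {}, clear=True)
    def test_missing_api_key(self):
        with self.assertRaises(ValueError):
            load_llm_provider()

    @patch.dict(os.environ, {"LLM_PROVIDER": "unknown", "OPENAI_API_KEY": "k"}, clear=True)
    def test_invalid_provider(self):
        with self.assertRaises(ValueError):
            load_llm_provider()
```
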
test_llm_client.py
Add unit tests for OpenAI client

py/packages/corpora_ai_openai/test_llm_client.py
  • Added unit tests for OpenAIClient methods.
  • Tested text completion and embedding generation.
  • Included tests for error handling on empty inputs.
  +75/-0
Documentation
3 files

README.md
Document corpora_ai abstraction and usage

py/packages/corpora_ai/README.md
  • Documented corpora_ai abstraction layer and usage.
  • Explained provider loading and API usage for text completion and embedding.
  +41/-0
README.md
Document OpenAI implementation and usage

py/packages/corpora_ai_openai/README.md
  • Documented corpora_ai_openai features and usage.
  • Provided instructions for initializing and using OpenAI client.
  +44/-0
about-structure.md
Update directory structure documentation

md/prompts/corpora/about-structure.md
  • Updated directory structure documentation.
  +17/-127
Dependencies
1 file

requirements.txt
Add OpenAI package dependency

py/packages/corpora_ai_openai/requirements.txt
  • Added OpenAI Python package dependency.
  +1/-0


github-actions[bot] commented 2 weeks ago

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Error Handling
The `load_llm_provider` function raises a `ValueError` if the `OPENAI_API_KEY` is not set or if no valid LLM provider is found. Consider whether this is the best way to handle these errors or if a more user-friendly error message or logging might be beneficial.

Input Validation
The `OpenAIClient` class raises a `ValueError` for empty input in `get_text_completion` and `generate_embedding` methods. Ensure that this is the desired behavior and consider if additional input validation is necessary.

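
One form the additional input validation could take is a small helper that also rejects non-string and whitespace-only input. The `validate_text` function and its messages are hypothetical and not part of this PR; this is only a sketch of the idea raised above.

```python
def validate_text(text: object, name: str = "text") -> str:
    """Return the stripped text, or raise if it is unusable as model input.

    Hypothetical helper: goes one step beyond an emptiness check by
    rejecting non-strings and strings that contain only whitespace.
    """
    if not isinstance(text, str):
        raise TypeError(f"{name} must be a string, got {type(text).__name__}.")
    stripped = text.strip()
    if not stripped:
        raise ValueError(f"{name} must not be empty or whitespace-only.")
    return stripped
```

Including the parameter name in the message is one way to make the resulting `ValueError` friendlier to the caller without changing the exception type.
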
github-actions[bot] commented 2 weeks ago

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Best practice
Suggestion: Implement error handling for API calls to manage exceptions effectively
Score: 8

**Add error handling for the OpenAI API calls to manage potential exceptions such as network issues or invalid API responses.**

[py/packages/corpora_ai_openai/llm_client.py [23-25]](https://github.com/skyl/corpora/pull/14/files#diff-de8da8414122015059375c328610dcc9a2d9550504ca03bfaa97ec5eed468407R23-R25)

```diff
-response = self.client.chat.completions.create(
-    model=self.completion_model, messages=message_dicts
-)
+try:
+    response = self.client.chat.completions.create(
+        model=self.completion_model, messages=message_dicts
+    )
+except Exception as e:
+    raise RuntimeError("Failed to get text completion from OpenAI API") from e
```

Suggestion importance[1-10]: 8

Why: Adding error handling for API calls is crucial for robustness, as it prevents the application from crashing due to network issues or invalid responses. This suggestion significantly enhances the reliability of the code.

Category: Possible issue
Suggestion: Add validation for the API key to prevent initialization with an empty value
Score: 7

**Validate the api_key parameter in the OpenAIClient constructor to ensure it is not empty or invalid.**

[py/packages/corpora_ai_openai/llm_client.py [14]](https://github.com/skyl/corpora/pull/14/files#diff-de8da8414122015059375c328610dcc9a2d9550504ca03bfaa97ec5eed468407R14-R14)

```diff
+if not api_key:
+    raise ValueError("API key must not be empty.")
 self.client = OpenAI(api_key=api_key)
```

Suggestion importance[1-10]: 7

Why: Validating the API key ensures that the client is not initialized with an invalid or empty key, which is essential for preventing runtime errors and ensuring proper API usage.

Category: Possible bug
Suggestion: Verify the length of the embedding vector to ensure it matches expected dimensions
Score: 6

**Ensure that the generate_embedding method checks the length of the returned embedding vector to confirm it meets expected dimensions.**

[py/packages/corpora_ai_openai/llm_client.py [31]](https://github.com/skyl/corpora/pull/14/files#diff-de8da8414122015059375c328610dcc9a2d9550504ca03bfaa97ec5eed468407R31-R31)

```diff
-return response.data[0].embedding
+embedding = response.data[0].embedding
+if len(embedding) != expected_length:
+    raise ValueError("Unexpected embedding vector length.")
+return embedding
```

Suggestion importance[1-10]: 6

Why: Checking the length of the embedding vector can help catch unexpected API behavior or changes in the model's output, thus maintaining the integrity of the data processing pipeline. However, the suggestion lacks context on what the expected length should be, which limits its immediate applicability.
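
The last suggestion leaves `expected_length` undefined. One possible way to supply it, sketched here under the assumption that the caller knows the dimension of the configured embedding model, is an optional parameter that skips the check when unset. The `check_embedding_length` helper is hypothetical and not part of this PR.

```python
from typing import List, Optional, Sequence


def check_embedding_length(
    embedding: Sequence[float], expected_length: Optional[int] = None
) -> List[float]:
    """Return the embedding as a list, validating its length when one is given.

    Hypothetical helper: `expected_length` would come from configuration,
    since the right value depends on the embedding model in use.
    """
    if expected_length is not None and len(embedding) != expected_length:
        raise ValueError(
            f"Unexpected embedding vector length: got {len(embedding)}, "
            f"expected {expected_length}."
        )
    return list(embedding)
```

Reporting both the actual and expected lengths in the message makes a model or configuration mismatch easier to diagnose than a bare "unexpected length" error.
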