Problem:
Currently, the corpora_ai module only has an implementation for the OpenAI backend (i.e., corpora_ai_openai). Without a second provider, it's difficult to test whether the corpora_ai interface is truly provider-agnostic and not too tightly coupled to OpenAI-specific implementation details.
Goal:
Integrate a second Large Language Model (LLM) provider to demonstrate and validate the abstraction layer in corpora_ai. Possible alternatives include:
Anthropic/Claude: A hosted alternative, worth considering if it meets the operational requirements and is accessible.
Self-Hosted Solution: Host a model from Hugging Face locally, adding variety in LLM providers without depending on an external API.
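To make the goal concrete, here is a minimal sketch of what a provider-agnostic design looks like. The interface name `LLMBaseInterface` and the method `get_completion` are assumptions for illustration; the real abstraction lives in corpora_ai and may differ.

```python
from abc import ABC, abstractmethod


class LLMBaseInterface(ABC):
    """Hypothetical base interface; the real one is defined in corpora_ai."""

    @abstractmethod
    def get_completion(self, messages: list[dict]) -> str:
        """Return a text completion for the given chat messages."""


class LocalHFProvider(LLMBaseInterface):
    """Stub for a self-hosted, Hugging Face-backed provider (hypothetical)."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def get_completion(self, messages: list[dict]) -> str:
        # A real implementation would run inference against the local model;
        # here we just echo the last message to keep the sketch self-contained.
        return f"[{self.model_name}] echo: {messages[-1]['content']}"
```

A second provider implementing the same base class is exactly what exposes any OpenAI-shaped assumptions (e.g., message format, token accounting) baked into the interface.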
Requirements:
Develop a module, analogous to corpora_ai_openai, for the new provider.
Ensure the new provider follows the existing standard API defined by corpora_ai.
Modify provider_loader.py to support dynamic loading of the new provider based on an environment variable.
Create unit tests for the new provider, similar to test_openai_provider_success in test_provider_loader.py.
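The loader and test requirements above could be sketched as follows. This is a simplified, hypothetical version: the registry contents, the `LLM_PROVIDER` variable name, and the `corpora_ai_local` module are assumptions, and the real provider_loader.py presumably returns provider instances rather than module names.

```python
import os
from unittest import mock


def load_llm_provider() -> str:
    """Pick a provider module based on the LLM_PROVIDER environment variable.

    Hypothetical sketch of the dynamic-loading idea in provider_loader.py;
    it returns the module name here instead of an instantiated provider.
    """
    name = os.environ.get("LLM_PROVIDER", "openai")
    registry = {
        "openai": "corpora_ai_openai",
        "local": "corpora_ai_local",  # hypothetical new provider module
    }
    if name not in registry:
        raise ValueError(f"Unknown LLM provider: {name}")
    return registry[name]


def test_local_provider_success():
    # Unit test in the spirit of test_openai_provider_success:
    # patch the environment so the loader selects the new provider.
    with mock.patch.dict(os.environ, {"LLM_PROVIDER": "local"}):
        assert load_llm_provider() == "corpora_ai_local"
```

Driving selection through an environment variable keeps provider choice out of application code, which is what makes the plug-and-play goal testable.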
Benefits:
Ensure the corpora_ai abstraction layer is sufficiently generic to allow plug-and-play functionality for different LLM providers.
Identify and correct any assumptions or bias toward the OpenAI implementation in the current interface.
Further Considerations:
Evaluate the ease of integration and documentation of potential providers.
Consider performance implications and evaluate any specific API limitations or rate limits.
Explore licensing or community support for the chosen provider to ensure long-term viability.
By completing this task, we will have a better-designed interface that is extensible and adaptable for multiple LLM providers, enhancing the versatility of corpora_ai. This will facilitate easier integration of new technologies as they become available, without extensive refactoring.
Avoid reliance solely on OpenAI's API to ensure a versatile infrastructure that anticipates scalability and integration across diverse platforms.