Open pdhoolia opened 6 days ago
To implement a localization strategy based on semantic vector search, we need to create a new class that extends LocalizationStrategy and uses LangChain's abstractions for embeddings, documents, and vector stores. We'll also need to integrate Milvus for storing and querying the vector representations of code files.
Here's a step-by-step outline with code snippets for the key changes:
Extend LocalizationStrategy with Semantic Vector Search:
Create a new class SemanticVectorSearchLocalization in localization_strategy.py:
import os
from typing import Dict, List

from langchain_core.vectorstores import VectorStore
from langchain_core.embeddings import Embeddings
from langchain_core.documents import Document
from langchain_milvus import Milvus


class SemanticVectorSearchLocalization(LocalizationStrategy):
    def __init__(self, project_path: str, embedding_model: Embeddings, milvus_uri: str):
        self.project_path = project_path
        self.embedding_model = embedding_model
        # langchain_milvus expects `embedding_function` and `connection_args`.
        self.vector_store = Milvus(
            embedding_function=embedding_model,
            connection_args={"uri": milvus_uri},
        )
        self._load_or_create_vector_store()

    def _load_or_create_vector_store(self):
        # Assumption: langchain_milvus opens an existing collection in the
        # constructor and creates a missing one on first insert, so no explicit
        # load()/create() call is made; this only ensures the metadata folder exists.
        os.makedirs(os.path.dirname(self._get_vector_store_path()), exist_ok=True)

    def _get_vector_store_path(self):
        return os.path.join(self.project_path, 'metadata', 'vector_store')

    def localize(self, issue: Dict[str, str], top_n: int) -> List[str]:
        # Embed the issue text, then search the store by vector.
        query_embedding = self.embedding_model.embed_query(issue['description'])
        results = self.vector_store.similarity_search_by_vector(query_embedding, k=top_n)
        return [result.metadata['file_path'] for result in results]
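To make the localize step concrete, here is a minimal, dependency-free sketch of what a top-k similarity search does conceptually: rank stored file vectors by cosine similarity to the query vector. The vectors, file paths, and the `top_n_files` helper are made up for illustration; a real Milvus index would use its own configured metric and an approximate search rather than this brute-force loop.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_files(query: Sequence[float], index: Dict[str, Sequence[float]], top_n: int) -> List[str]:
    """Return the top_n file paths whose vectors are most similar to the query."""
    ranked = sorted(index, key=lambda path: cosine_similarity(query, index[path]), reverse=True)
    return ranked[:top_n]


# Hypothetical 3-dimensional embeddings keyed by file path.
toy_index = {
    "auth/login.py": [0.9, 0.1, 0.0],
    "auth/session.py": [0.7, 0.3, 0.1],
    "billing/invoice.py": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]  # stands in for an embedded issue description
```

Calling `top_n_files(toy_query, toy_index, 2)` ranks the two auth files ahead of the billing file, which is the same shape of result that localize returns as a list of file paths.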
Add an Embedding Task to LLM Model Configurations:
Update TaskName in model_configuration_manager.py:
from enum import Enum


class TaskName(Enum):
    GENERATE_CODE_SUMMARY = "generate_code_summary"
    GENERATE_PACKAGE_SUMMARY = "generate_package_summary"
    GENERATE_REPO_SUMMARY = "generate_repo_summary"
    LOCALIZE = "localize"
    GENERATE_SUGGESTIONS = "generate_suggestions"
    EMBEDDING = "embedding"  # New task for embeddings
Expose a Factory Method for Fetching Embeddings in api.py:
Add a function that returns the embedding model for the configured provider:
from langchain_core.embeddings import Embeddings
from langchain_openai import OpenAIEmbeddings
from langchain_ollama import OllamaEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings


def fetch_embedding_model() -> Embeddings:
    # `config` and `PROVIDER` are expected to be defined already in api.py.
    task_config = config.get_task_config(PROVIDER, TaskName.EMBEDDING)
    model_name = task_config.model_name
    if PROVIDER == "openai":
        return OpenAIEmbeddings(model=model_name)
    elif PROVIDER == "ollama":
        return OllamaEmbeddings(model=model_name)
    else:
        # HuggingFaceEmbeddings names this parameter `model_name`.
        return HuggingFaceEmbeddings(model_name=model_name)
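One possible variation, sketched here as an assumption rather than as part of the plan: resolve the embeddings class through a lookup table and import the provider package only when it is selected, so that a deployment using one provider does not need the other two packages installed. The names `_EMBEDDING_CLASSES`, `resolve_embedding_class`, and `build_embedding_model` are hypothetical.

```python
import importlib
from typing import Any, Dict, Tuple

# provider -> (module, class, keyword argument used for the model name)
_EMBEDDING_CLASSES: Dict[str, Tuple[str, str, str]] = {
    "openai": ("langchain_openai", "OpenAIEmbeddings", "model"),
    "ollama": ("langchain_ollama", "OllamaEmbeddings", "model"),
}
# Fallback mirrors the `else` branch of the factory above.
_DEFAULT_EMBEDDING_CLASS = ("langchain_huggingface", "HuggingFaceEmbeddings", "model_name")


def resolve_embedding_class(provider: str) -> Tuple[str, str, str]:
    """Look up where the embeddings class for a provider lives, without importing it."""
    return _EMBEDDING_CLASSES.get(provider, _DEFAULT_EMBEDDING_CLASS)


def build_embedding_model(provider: str, model_name: str) -> Any:
    """Import the provider's package on demand and instantiate its embeddings class."""
    module_name, class_name, model_kwarg = resolve_embedding_class(provider)
    module = importlib.import_module(module_name)  # raises ImportError if not installed
    return getattr(module, class_name)(**{model_kwarg: model_name})
```

The trade-off is that a missing package surfaces as an ImportError at call time instead of at module import, which may or may not be preferable for this project.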
Usage Example:
When creating an instance of SemanticVectorSearchLocalization, use the factory method to fetch the embedding model:
embedding_model = fetch_embedding_model()
localization_strategy = SemanticVectorSearchLocalization(
    project_path="/path/to/project",
    embedding_model=embedding_model,
    milvus_uri="http://localhost:19530",
)
Vector Store Management:
Implement logic in SemanticVectorSearchLocalization to build the vector store during project onboarding and to update it incrementally when already-indexed files change.
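One way to approach the incremental part, offered as a sketch under assumptions and not as the project's actual mechanism: keep a manifest of content hashes in the metadata folder and re-embed only the files whose hash changed. The helper names, the `.py` filter, and the JSON manifest format are all hypothetical.

```python
import hashlib
import json
import os
from typing import Dict, List, Tuple


def hash_files(root: str, suffix: str = ".py") -> Dict[str, str]:
    """Map each matching file (path relative to root) to the SHA-256 of its bytes."""
    hashes: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full_path = os.path.join(dirpath, name)
                with open(full_path, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
                hashes[os.path.relpath(full_path, root)] = digest
    return hashes


def diff_manifest(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (files to embed or re-embed, files to delete from the store)."""
    to_upsert = sorted(p for p, h in new.items() if old.get(p) != h)
    to_delete = sorted(p for p in old if p not in new)
    return to_upsert, to_delete


def load_manifest(path: str) -> Dict[str, str]:
    """Read the previous manifest; an absent file means first-time onboarding."""
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def save_manifest(path: str, manifest: Dict[str, str]) -> None:
    """Write the manifest, creating its parent folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2, sort_keys=True)
```

On onboarding the old manifest is empty, so every file lands in `to_upsert`. The resulting paths could then be wrapped in Document objects and passed to the vector store's `add_documents`, with stale entries removed through `delete`; using the relative path as the document ID is an assumption that depends on how the Milvus collection's primary key is configured.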
This code structure sets up a semantic vector search-based localization strategy using Langchain abstractions and Milvus, while allowing for flexible embedding model selection based on the provider configuration.
Let's add another implementation for LocalizationStrategy based on semantic vector search.

Implementation plan details:

- Langchain: Use langchain abstractions for building this. E.g.,
  - from langchain_core.embeddings import Embeddings as the abstraction for embeddings.
  - from langchain_core.documents import Document as the abstraction for the document to be indexed.
  - from langchain_core.vectorstores import VectorStore as the abstraction for the vector store.
- Milvus: Use from langchain_milvus import Milvus as the vector store.
- Embedding model:
  - Add an embedding task to LLM model configurations. Each provider may supply a model_name for embeddings.
  - With openai as the provider, the factory should return from langchain_openai import OpenAIEmbeddings.
  - With ollama as the provider, the factory should return from langchain_ollama import OllamaEmbeddings.
  - Otherwise, the factory should return from langchain_huggingface import HuggingFaceEmbeddings.
- Vector store creation and storage:
  - Store the vector store under the project's metadata folder.
- Semantic vector search strategy:
  - Implement a LocalizationStrategy that uses the vector store from the project to get localization results.