tensorlakeai / indexify

A realtime and indexing and structured extraction engine for Unstructured Data to build Generative AI Applications
https://docs.getindexify.ai
Apache License 2.0
859 stars 98 forks source link

Versioning for Graphs and Functions in Python SDK #895

Open PulkitMishra opened 12 hours ago

PulkitMishra commented 12 hours ago

Introduce Versioning for Graphs and Functions

Issue Description

Currently, the Indexify Python SDK lacks a robust versioning system for graphs and functions. This makes it challenging to manage changes over time, track the evolution of workflows, and ensure reproducibility of results. Implementing a versioning system will significantly improve the maintainability and reliability of Indexify workflows.

Current Limitations

  1. In indexify/functions_sdk/graph.py, the Graph class doesn't have any version information:
class Graph:
    def __init__(
        self, name: str, start_node: IndexifyFunction, description: Optional[str] = None
    ):
        self.name = name
        self.description = description
        self.nodes: Dict[str, Union[IndexifyFunction, IndexifyRouter]] = {}
        # ...
  1. The indexify_function decorator in indexify/functions_sdk/indexify_functions.py doesn't include version information:
def indexify_function(
    name: Optional[str] = None,
    description: Optional[str] = "",
    image: Optional[Image] = DEFAULT_IMAGE,
    accumulate: Optional[Type[BaseModel]] = None,
    payload_encoder: Optional[str] = "cloudpickle",
    placement_constraints: List[PlacementConstraints] = [],
):
    # ...
  1. When registering a compute graph in indexify/remote_client.py, there's no version handling:
def register_compute_graph(self, graph: Graph):
    graph_metadata = graph.definition()
    serialized_code = graph.serialize()
    response = self._post(
        f"namespaces/{self.namespace}/compute_graphs",
        files={"code": serialized_code},
        data={"compute_graph": graph_metadata.model_dump_json(exclude_none=True)},
    )
    # ...

Benefits of Versioning

  1. Reproducibility: Ensure that workflows can be reproduced exactly, even as individual functions or the overall graph structure evolves.
  2. Change Tracking: Easily track changes to functions and graphs over time, facilitating debugging and auditing.
  3. Collaboration: Enable multiple team members to work on the same workflow without conflicts.
  4. Rollback Capability: Quickly revert to previous versions of functions or entire graphs if issues are discovered.
  5. A/B Testing: Compare different versions of workflows or functions to optimize performance.

Proposed Solution

  1. Add version information to the Graph class
  2. Modify the indexify_function decorator to include version information
  3. Update the register_compute_graph method to handle versioning
  4. Implement version comparison and management utilities
  5. Update the LocalClient and RemoteClient classes to support versioning operations
  6. Modify the Task class in indexify/executor/api_objects.py to include version information:
  7. Update all relevant tests to include version checks
diptanu commented 12 hours ago

Versioning is done automatically when code changes by the server. Take a look at https://github.com/tensorlakeai/indexify/blob/main/python-sdk/tests/test_graph_update.py