tensorlakeai / indexify

A realtime serving engine for Data-Intensive Generative AI Applications
https://docs.tensorlake.ai
Apache License 2.0

Add support for Json Serialization #984

Open stangirala opened 2 days ago

stangirala commented 2 days ago

Problem

Right now Indexify supports Cloud Pickle and MessagePack for serialization. This means that data written to the blob store is saved in binary form and can only be retrieved as usable, human-readable data through the Python SDK (Indexify Client). We already have the code and abstraction set up to swap the serializer. This code exists only in the Python SDK.

We would need to modify get_serializer in object_serializer.py to make this happen.
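
As a rough illustration of the kind of change this implies, here is a minimal sketch of a JSON serializer sitting behind a name-based lookup. Everything in it is an assumption made for illustration: the `JsonSerializer` class, the `_SERIALIZERS` registry, and the body of `get_serializer` are hypothetical and do not reflect the actual contents of object_serializer.py. It also assumes payload objects are plain JSON types or dataclasses.

```python
import dataclasses
import json
from typing import Any, Dict, Type


class JsonSerializer:
    """Hypothetical JSON serializer; not the SDK's actual class."""

    content_type = "application/json"

    @staticmethod
    def _default(obj: Any) -> Any:
        # Assumption: non-JSON-native payloads are dataclass instances.
        if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
            return dataclasses.asdict(obj)
        raise TypeError(f"{type(obj).__name__} is not JSON serializable")

    @staticmethod
    def serialize(data: Any) -> bytes:
        # UTF-8 encoded JSON text, readable without the Python SDK.
        return json.dumps(data, default=JsonSerializer._default).encode("utf-8")

    @staticmethod
    def deserialize(data: bytes) -> Any:
        return json.loads(data.decode("utf-8"))


# Hypothetical registry; the real one would also hold the existing
# Cloud Pickle and MessagePack serializers.
_SERIALIZERS: Dict[str, Type[JsonSerializer]] = {"json": JsonSerializer}


def get_serializer(name: str) -> Type[JsonSerializer]:
    """Look up a serializer class by its payload_encoder name."""
    try:
        return _SERIALIZERS[name]
    except KeyError:
        raise ValueError(f"unknown payload encoder: {name!r}") from None
```

Note that in this sketch `deserialize` returns plain dicts and lists, so rebuilding a typed object such as `MyObject` from the decoded JSON would need a separate step.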

User DX

The DX after making this change would be:

@indexify_function(payload_encoder="json")
def simple_function(x: MyObject) -> MyObject:
    return MyObject(x=x.x + "b")

Testing

  1. Test a graph where the payload encoder is not specified anywhere and make sure it uses the default. Check the outputs (i.e. use an existing graph and check that its tests pass).
  2. Test a graph where one function uses the json encoder. Verify that the function's outputs are retrievable using the curl endpoint and are in JSON.
  3. Test a graph where all functions are specified to use the json encoder. Verify that the functions' outputs are retrievable using the curl endpoint and are in JSON.
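
For the "are in JSON" checks in the cases above, one possible assertion helper is sketched below. It works on raw bytes only, since the endpoint URL is not given here. The `is_json_payload` name is hypothetical, and the standard-library `pickle` module stands in for Cloud Pickle purely to produce a binary counterexample.

```python
import json
import pickle


def is_json_payload(raw: bytes) -> bool:
    """Return True if raw bytes decode as UTF-8 and parse as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return True


# Stand-ins for bytes fetched from the output endpoint (assumed, not real responses).
json_output = json.dumps({"x": "ab"}).encode("utf-8")
binary_output = pickle.dumps({"x": "ab"})

assert is_json_payload(json_output)
assert not is_json_payload(binary_output)
```

A real test would fetch the function output over HTTP and pass the response body to the helper.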

Default2882 commented 2 days ago

Hi, can you please assign this to me? I'd like to work on it, thanks!

Default2882 commented 13 hours ago

While writing the unit tests, I discovered that the deserialize_input method of IndexifyFunctionWrapper has some custom logic for msgpack. Should I move all of this logic to MsgPackSerializer?