tensorlakeai / indexify

A realtime serving engine for Data-Intensive Generative AI Applications
https://docs.getindexify.ai
Apache License 2.0
862 stars 99 forks

api: adding serialization for filters #694

Closed grampelberg closed 3 months ago

grampelberg commented 3 months ago

The `HashMap<>` for filters was getting serialized as a JSON object which couldn't be deserialized by the clients. Move to using a `with` attribute for symmetric handling of the filters, and handle `Option<>` gracefully.
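The symmetric handling can be sketched with a pair of stdlib-only helpers (hypothetical names for this sketch; the actual change wires the conversion into serde rather than free functions):

```rust
use std::collections::BTreeMap;

/// Serialize filters as a comma-separated list of `key:value` pairs,
/// e.g. {"foo": "bar"} -> "foo:bar". BTreeMap keeps output deterministic.
fn filters_to_string(filters: &BTreeMap<String, String>) -> String {
    filters
        .iter()
        .map(|(k, v)| format!("{k}:{v}"))
        .collect::<Vec<_>>()
        .join(",")
}

/// Parse the same format back; returns None on a malformed pair,
/// so the two directions stay symmetric.
fn filters_from_string(s: &str) -> Option<BTreeMap<String, String>> {
    if s.is_empty() {
        return Some(BTreeMap::new());
    }
    s.split(',')
        .map(|pair| {
            let (k, v) = pair.split_once(':')?;
            Some((k.to_string(), v.to_string()))
        })
        .collect()
}

fn main() {
    let mut filters = BTreeMap::new();
    filters.insert("foo".to_string(), "bar".to_string());
    filters.insert("bar".to_string(), "baz".to_string());

    let encoded = filters_to_string(&filters);
    let decoded = filters_from_string(&encoded).expect("round trip");
    // Symmetric: encode then decode is the identity.
    assert_eq!(decoded, filters);
    println!("{encoded}");
}
```

The point of the symmetry is that whatever the serializer emits, the deserializer accepts unchanged, which is exactly what breaks when one side emits a JSON object and the other expects a string.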

vercel[bot] commented 3 months ago

The latest updates on your projects.

Name: indexify · Status: ✅ Ready · Updated (UTC): Jun 26, 2024 10:04pm
grampelberg commented 3 months ago

Oh, I see now ... that is one complex field. The problem I was running into is that `serde_json::Value::to_string()` converts into JSON (resulting in `key:"my_value"` as a filter). According to the filters, the values can only contain a couple of special characters. You wouldn't happen to have some non-string `Value`s I can look at to get my head around how this should work?

Reading through the `serde_json::value::Value` docs, it looks like I need to support `Null`/`Bool`/`Number`? Any `Array` or `Object` would be immediately invalid.
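That variant split can be sketched with a simplified stand-in enum (mirroring the shape of `serde_json::Value`, but not the real type): scalars render into the flat filter format, while `Array`/`Object` are rejected outright.

```rust
// Simplified stand-in for serde_json::Value, for this sketch only.
#[derive(Debug)]
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    String(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Render a scalar value for the filter string; Array/Object are invalid.
fn render_filter_value(v: &Value) -> Result<String, String> {
    match v {
        Value::Null => Ok("null".to_string()),
        Value::Bool(b) => Ok(b.to_string()),
        Value::Number(n) => Ok(n.to_string()),
        // Unquoted, since the filter grammar only allows a few
        // special characters inside values.
        Value::String(s) => Ok(s.clone()),
        Value::Array(_) | Value::Object(_) => {
            Err("arrays and objects are not valid filter values".to_string())
        }
    }
}

fn main() {
    assert_eq!(render_filter_value(&Value::Number(5.0)).unwrap(), "5");
    assert_eq!(render_filter_value(&Value::Bool(true)).unwrap(), "true");
    assert!(render_filter_value(&Value::Array(vec![])).is_err());
}
```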

maxkozlovsky commented 3 months ago

The new code probably wouldn't work either: the serialized value can't be deserialized back into the same value. There will be a problem with quoting.

It looks like the deserialization code that looks for '.' and ':' and splits the values is superfluous. We can simply use a JSON parser to take a JSON string representation of a dictionary and produce a `serde_json::Value::Object`, which is a `Map<String, Value>`. This ensures we can encode it back into the same JSON string, it is compatible with the internal representation, and it supports any type of value.

grampelberg commented 3 months ago

I'm not sure I understand: how will we convert from a `HashMap<String, Value>` into `foo:bar,bar:baz`?

maxkozlovsky commented 3 months ago

Is there some specific filter that causes the problem? I'm looking at it again; it's JSON, so clients should be able to parse the values. I tried creating a test policy with the filter `key1:5,key2:aaa`; it got parsed correctly, and the Python client was able to read the policy back with the correct filter values `{'key1': 5, 'key2': 'aaa'}`.

grampelberg commented 3 months ago

I've been writing a Rust API client and wanted to use the existing definitions. If you take a look at the response, it is naively serializing the `HashMap`:

{
...
            "filters_eq": {
              "sports": "foo"
            },
...
}

When I do a `serde_json::Deserializer::from_str()` on that, it breaks because it is expecting the value to be a string, not a `HashMap`. The problem is that we're expecting the format of a request to be different from the format of a response: the request is a `String` and the response is a `HashMap`.

I assume the Python client is working because it is a normal hash map. My assumption is that it would actually stop working if we fixed this to be more correct.
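The asymmetry can be reproduced without serde at all. A parser written for the *request* form of a filter (the flat `key:value` string, as a hypothetical sketch) necessarily rejects the JSON-object form the *response* comes back in:

```rust
/// Parse the request form of a filter: a flat "key:value" string.
/// (Sketch only; the real types go through serde derive.)
fn parse_request_filter(s: &str) -> Result<(String, String), String> {
    match s.split_once(':') {
        // Crude guard: a leading '{' means we were handed a JSON
        // object, not the flat string form this parser expects.
        Some((k, v)) if !k.starts_with('{') => Ok((k.to_string(), v.to_string())),
        _ => Err(format!("expected key:value, got {s:?}")),
    }
}

fn main() {
    // Request side: the flat string form parses fine.
    assert!(parse_request_filter("sports:foo").is_ok());

    // Response side: the server hands back a JSON object instead,
    // and the same parser rejects it -- the two formats are asymmetric.
    let response_fragment = r#"{"sports": "foo"}"#;
    assert!(parse_request_filter(response_fragment).is_err());
}
```

Making both sides speak either the string form or the plain JSON object, as discussed below, removes the need for two parsers entirely.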

maxkozlovsky commented 3 months ago

Oh, I see, the Rust client does not work because the serializer and deserializer are not symmetrical for some unholy reason. This change would break all the other clients, though, which expect a regular JSON hash map.

What I was saying previously is that it would be easier to make it symmetrical by removing all the special processing and just having the client supply a `HashMap<String, json::Value>` in normal JSON format. The clients will need to be fixed as well, but in the end it'll use standard serialization.

grampelberg commented 3 months ago

100% agreed the best solution would be to naively use HashMap<String, json::Value>. It sounds like @diptanu is working on something along those lines right now. I'll close this and let that land instead as I'm unblocked now.