vitrivr / vitrivr-engine

vitrivr's next-generation retrieval engine
MIT License

Unify the use of JSON libraries #79

Open ppanopticon opened 2 weeks ago

ppanopticon commented 2 weeks ago

Task Description

vitrivr engine and its plugins currently rely on various JSON libraries. Namely:

This clutters the dependency tree and leads to unreasonably large artifacts. The goal of this issue is therefore:

  1. To agree on a single JSON library to use.
  2. To remove all these dependencies in favour of the one and only.

Personally, I'd be very much in favour of using kotlinx.serialization for everything. The following modules / classes are affected by this change:

Boundary Conditions

Dependencies relying on different JSON libraries can be adjusted or removed (if not needed anymore).

sauterl commented 2 weeks ago

I second the unified usage of kotlinx.serialization

lucaro commented 2 weeks ago

I also support the use of kotlinx.serialization. The fallback using Jackson is, however, very deliberate since there are some edge cases that will be very difficult to debug. We can choose to not care about those, which is the cleaner and more consistent way of doing things, but it might lead to nasty issues in the future.

faberf commented 2 weeks ago

Thanks for writing this up. I agree with this and am working on it.

faberf commented 2 weeks ago

@ppanopticon Do you know of a plugin for OpenAPI to generate a Kotlin client that uses kotlinx.serialization?

ppanopticon commented 2 weeks ago

I'd expect this to be specific to the client / server framework you use (and not OpenAPI itself). In Javalin, for example, you have facilities to provide your own JSON serializer (which we did).

See KotlinxJsonMapper

faberf commented 2 weeks ago

I realised there are a bunch of configuration options in the OpenAPI generator (https://github.com/OpenAPITools/openapi-generator/blob/master/docs/generators/kotlin.md) that I haven't played with yet. This is more relevant when I am dealing with an autogenerated client (not a server).

faberf commented 1 week ago

@ppanopticon You mentioned in the meeting that you would quickly write up a guide on how to use kotlinx.serialization for collections in the context of the KotlinxJsonMapper. I am currently stuck on this point and would appreciate a quick pointer.

ppanopticon commented 1 week ago

Did I? Oh well, here you go:

Problem

Unlike many other libraries, kotlinx.serialization does not rely on reflection to do its work. Instead, it generates a KSerializer implementation at compile time for each class that is annotated with @Serializable; these serializers are then used at runtime to do the magic.

The problem is with generic collections: Generally, kotlinx.serialization knows how to serialize basic collections. However, in order to do so, it needs to be able to infer the collection type and the element type at runtime. If you use generics, that information is not available at runtime (type erasure on the JVM), hence the framework's inability to serialize these collections (e.g., List<V>, Map<K,V>).
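The effect of type erasure can be seen without any serialization library. The following is a minimal sketch, not code from vitrivr-engine; the Animal class, its name field and the sameRuntimeClass helper are made up for illustration. Two lists with different element types end up with the same runtime class, whereas an array keeps its component type.

```kotlin
// Hypothetical example class; not part of vitrivr-engine.
data class Animal(val name: String)

/** Returns true if both lists are instances of the same runtime class. */
fun sameRuntimeClass(a: List<*>, b: List<*>): Boolean = a.javaClass == b.javaClass

fun main() {
    val animals: List<Animal> = arrayListOf(Animal("cat"))
    val names: List<String> = arrayListOf("cat")

    // Both are plain java.util.ArrayList instances; the element type is erased.
    println(sameRuntimeClass(animals, names)) // prints true

    // An array, in contrast, carries its component type at runtime.
    val array: Array<Animal> = animals.toTypedArray()
    println(array.javaClass.componentType == Animal::class.java) // prints true
}
```

Anything that only receives the list object at runtime therefore cannot tell a List<Animal> from a List<String> by looking at its class.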

Solution

There are three ways to work around this, two of which I will outline here in more detail.

Use arrays (list only): If you want to serialize a list of, let's say, Animals you can convert the List<Animal> into an Array<Animal>. Since the element type of an Array is retained at runtime, the serialization framework can handle it. Of course, Animal must be annotated with the @Serializable annotation.

@Serializable
data class Animal(...)

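val animals = ArrayList<Animal>()

/* Do something to populate the list. */

// The resulting Array<Animal> retains its element type at runtime.
val animalArray: Array<Animal> = animals.toTypedArray()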
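For completeness, here is a self-contained sketch of the array approach, assuming the JSON format module of kotlinx.serialization is on the classpath and the serialization compiler plugin is applied. The name and legs fields and the encodeAsArray helper are hypothetical, since the snippet above elides the class body. Note that this sketch resolves the serializer from the static type at the call site; how a mapper such as KotlinxJsonMapper looks up the serializer from a runtime type is not shown in this thread.

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.encodeToString
import kotlinx.serialization.json.Json

// Hypothetical fields; the original snippet elides them.
@Serializable
data class Animal(val name: String, val legs: Int)

/** Converts the list to a typed array and encodes it as a JSON array. */
fun encodeAsArray(animals: List<Animal>): String {
    val array: Array<Animal> = animals.toTypedArray()
    return Json.encodeToString(array)
}

fun main() {
    println(encodeAsArray(listOf(Animal("cat", 4), Animal("hen", 2))))
    // prints [{"name":"cat","legs":4},{"name":"hen","legs":2}]
}
```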

Use wrapper: Instead of serializing the collection directly, you can declare and serialize a wrapper object.

@Serializable
data class AnimalWrapper(val animals: List<Animal>)

Since the type information is static (and known) at compile time, kotlinx.serialization can generate a serializer for this wrapper.
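As a sketch of how the wrapper is used (again assuming the JSON format module and the serialization compiler plugin; the Animal fields and the two helper functions are hypothetical), the generated serializer can be referenced explicitly, so nothing has to be inferred at runtime:

```kotlin
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

// Hypothetical fields, chosen only for illustration.
@Serializable
data class Animal(val name: String, val legs: Int)

@Serializable
data class AnimalWrapper(val animals: List<Animal>)

/** Encodes the wrapper using the serializer generated at compile time. */
fun encodeWrapper(wrapper: AnimalWrapper): String =
    Json.encodeToString(AnimalWrapper.serializer(), wrapper)

/** Decodes a wrapper from JSON using the same generated serializer. */
fun decodeWrapper(json: String): AnimalWrapper =
    Json.decodeFromString(AnimalWrapper.serializer(), json)

fun main() {
    val wrapper = AnimalWrapper(listOf(Animal("cat", 4)))
    val json = encodeWrapper(wrapper)
    println(json)                           // prints {"animals":[{"name":"cat","legs":4}]}
    println(decodeWrapper(json) == wrapper) // prints true
}
```

The price is one extra level of nesting in the JSON: the list is emitted as the animals property of an object rather than as a top-level array.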

Wire your own KSerializer: In theory, you can also wire your own KSerializer at runtime by using reflection. However, I would not do that, since it becomes pretty ugly very quickly and requires the use of experimental APIs.

lucaro commented 1 week ago

Thanks, @ppanopticon, for the detailed write-up. Maybe as a quick addendum, specifically in the context of APIs: some code generators struggle with generic collections when used as a return type of an API call. It is, therefore, preferable to introduce container/wrapper types rather than returning a 'naked' collection. I.e., use something like

data class AnimalList(val animals: List<Animal>)

rather than

Array<Animal>

when defining API endpoints.