microstream-one / microstream

High-Performance Java-Native-Persistence. Store and load any Java Object Graph or Subgraphs partially, Relieved of Heavy-weight JPA. Microsecond Response Time. Ultra-High Throughput. Minimum of Latencies. Create Ultra-Fast In-Memory Database Applications & Microservices.
https://microstream.one/
Eclipse Public License 2.0

Kotlin: create Binary Handler for Immutable Collection #203

Open zdenek-jonas opened 3 years ago

zdenek-jonas commented 3 years ago

There are two types of collections in Kotlin, mutable and immutable. The mutable ones can be handled by our existing Binary Handlers, but we need to add support for the immutable ones.

Docs for Kotlin Collections: https://kotlinlang.org/docs/reference/collections-overview.html

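import kotlin.system.exitProcess
// EmbeddedStorage comes from MicroStream; its import path depends on the MicroStream version in use.

fun main() {
    val buffer = mapOf("key1" to 1, "key2" to 2, "key3" to 3, "key4" to 4)
    val copy = mutableMapOf<String, Int>() // works
    // val copy = mapOf<String, Int>() // does not work

    // Start the storage with the read-only map as root, then shut it down
    var storage = EmbeddedStorage.start(buffer)
    storage.shutdown()

    // Restart the storage with a different root instance
    storage = EmbeddedStorage.start(copy)
    exitProcess(0)
}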

I have prepared a sample Maven project to make implementing these handlers easier: kotlin-test.tar.gz

magneticflux- commented 1 year ago

In addition to this, Kotlin objects (singletons) are duplicated upon loading, causing issues for comparisons with sentinel objects (I just ran into this with by lazy{} delegated properties that otherwise work fine).

I'm going to start a PR creating a new "persistence" submodule similar to the existing binary-jdk8 and binary-jdk17, but with a Kotlin dependency and see if I can get something working.

magneticflux- commented 1 year ago

I've been investigating and I can't seem to cause any issues with Kotlin's immutable collections. Kotlin delegates to the standard Java collections in all cases except totally empty collections (which are cached). In fact, when using Java reflection operations on objects or fields using the kotlin.collections.List interface, both List and MutableList appear as java.util.List and everything works fine.

The issue I see now is with various singleton objects (in Kotlin as object declarations, in Java as static final fields excluding enums). When "fresh" they compare identical (== in Java, === in Kotlin), but when reloaded they do not compare identical or even necessarily equal! In some cases the new objects will luckily still compare equal and won't cause issues in JDK classes because those never compare by identity, but it does end up being a major issue in Kotlin since objects are designed to be compared by identity (Kotlin's "switch statement", when(...) { ... }, uses identity to compare objects, IntelliJ IDEA warns against implementing equals, etc.).
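The identity problem can be reproduced without MicroStream. The following is a minimal sketch that uses plain Java serialization as a stand-in for any field-based persistence; the `Sentinel` class and the `SingletonRoundTrip` helper are hypothetical names made up for this illustration, and nothing here is MicroStream API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sentinel, mimicking a Kotlin object or a Java static final singleton.
final class Sentinel implements Serializable {
    private static final long serialVersionUID = 1L;
    static final Sentinel UNSET = new Sentinel();

    private Sentinel() {}
    // Deliberately no readResolve(): deserialization creates a second instance.
}

final class SingletonRoundTrip {
    // Writes the value to a byte array and reads it back (stand-in for store + load).
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Object reloaded = roundTrip(Sentinel.UNSET);
        System.out.println(reloaded == Sentinel.UNSET);      // false: a distinct copy
        System.out.println(reloaded.equals(Sentinel.UNSET)); // false: equals defaults to identity
    }
}
```

Any code that checks `value == Sentinel.UNSET` after the round trip silently takes the wrong branch, which matches the sentinel comparison failure described above.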

This issue of knowing when a class should be persisted in name only (as a singleton) isn't solvable in Java in general as static final fields could be anywhere and have no guarantees, but it is solvable specifically for Java enums and Kotlin objects since you can look up the canonical representation of the singleton from just its class and name using Class.getEnumConstants() and KClass.getObjectInstance().
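As a rough sketch of that lookup in plain Java: enum constants can be resolved through `Class.getEnumConstants()`, and a Kotlin object can be resolved through the public static `INSTANCE` field that the Kotlin compiler generates for object declarations. Reading `INSTANCE` reflectively is an assumption made here to avoid a kotlin-reflect dependency; `KClass.getObjectInstance()` would be the Kotlin-side equivalent. `Registry`, `Status`, and `CanonicalLookup` are hypothetical names, and `Registry` is a hand-written stand-in for a compiled Kotlin object.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

enum Status { SUPPORTED, DEPRECATED }

// Hand-written stand-in for the class the Kotlin compiler emits for `object Registry`.
final class Registry {
    public static final Registry INSTANCE = new Registry();

    private Registry() {}
}

final class CanonicalLookup {
    // Resolves the canonical enum constant from the enum class and the constant name.
    static Object resolveEnum(Class<?> type, String name) {
        if (!type.isEnum()) {
            throw new IllegalArgumentException(type + " is not an enum class");
        }
        for (Object constant : type.getEnumConstants()) {
            if (((Enum<?>) constant).name().equals(name)) {
                return constant;
            }
        }
        throw new IllegalArgumentException("no constant " + name + " in " + type);
    }

    // Resolves the canonical instance of a Kotlin-style object from its class alone.
    static Object resolveObject(Class<?> type) {
        try {
            Field field = type.getDeclaredField("INSTANCE");
            if (!Modifier.isStatic(field.getModifiers()) || field.getType() != type) {
                throw new IllegalArgumentException(type + " has no singleton INSTANCE field");
            }
            return field.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(type + " is not a singleton object", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveEnum(Status.class, "SUPPORTED") == Status.SUPPORTED); // true
        System.out.println(resolveObject(Registry.class) == Registry.INSTANCE);         // true
    }
}
```

Both lookups return the instance that already exists in the running JVM, so identity comparisons keep working after a load.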


After looking into the existing enum handling, I'm not sure what the best course of action is for Kotlin objects. From reading createTypeHandlerEnum, isHandleableEnumField, and createEnumHandler, it seems like the current philosophy for persisting enums is to save all fields and, when loading, take the current canonical enum instance, validate that its ordinal hasn't changed, and then set all its fields to the newly loaded values (modifying the existing enum instance). This is contrary to the Java Serialization specification (and to other serializers I've used, like Kryo and Jackson), which states:

Enum constants are serialized differently than ordinary serializable or externalizable objects. The serialized form of an enum constant consists solely of its name; field values of the constant are not present in the form. ... The process by which enum constants are serialized cannot be customized: any class-specific writeObject, readObject, readObjectNoData, writeReplace, and readResolve methods defined by enum types are ignored during serialization and deserialization. Similarly, any serialPersistentFields or serialVersionUID field declarations are also ignored--all enum types have a fixed serialVersionUID of 0L. Documenting serializable fields and data for enum types is unnecessary, since there is no variation in the type of data sent.
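The name-only behavior quoted above can be observed with plain Java serialization. This is a small sketch, not MicroStream code; the `Level` enum, its mutable `hits` field, and the `EnumByName` helper are hypothetical names chosen for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

enum Level {
    LOW, HIGH;

    int hits; // mutable state on an enum constant: legal, but unusual
}

final class EnumByName {
    static byte[] toBytes(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("deserialization failed", e);
        }
    }

    public static void main(String[] args) {
        Level.HIGH.hits = 1;
        byte[] stored = toBytes(Level.HIGH); // only the class and the name "HIGH" are written
        Level.HIGH.hits = 5;                 // the "new" application state

        Level loaded = (Level) fromBytes(stored);
        System.out.println(loaded == Level.HIGH); // true: the canonical instance is returned
        System.out.println(loaded.hits);          // 5: the stored value 1 was never persisted
    }
}
```

Loading returns the canonical constant untouched, so the current field values survive and no other holder of `Level.HIGH` sees a change.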

I personally don't understand the use of the current enum persistence strategy. It means that loading an enum modifies the canonical enum instance, changing it for everybody. This could only happen if the enum instance was modifiable in the first place though, which, while legal, has no obvious uses to me.

What was the original purpose of saving all enum fields, and is it still needed by default? Looking at the commit history, I can see enum serialization in its current form was implemented around https://github.com/microstream-one/microstream/commit/837e2529e6e24ea60e8c5fbf284d8af3eda97286, after being disabled in https://github.com/microstream-one/microstream/commit/466f3d9497a2a9dfb2a618aeda2f3486b8389248 by @tm-ms. Do the concerns voiced here: https://github.com/microstream-one/microstream/blob/15e26334bbcf8b23538a2d27f91c84becf85537f/persistence.binary/src/one/microstream/java/BinaryHandlerEnum.java#L33-L40 still apply with today's API?


A similar treatment of Kotlin objects would also be contrary to the Kotlin Serialization specification, which states:

An object serializes as an empty class, also using its fully-qualified class name as type by default: ... Even if object has properties, they are not serialized.

so I don't think the current enum persistence strategy should be used for Kotlin objects.

For Kotlin's enum classes (compiled to normal Java enums) this section applies:

In JSON an enum gets encoded as a string.

// The @Serializable annotation is not needed for enum classes
enum class Status { SUPPORTED }

{"name":"kotlinx.serialization","status":"SUPPORTED"}


Ultimately, I think having Java enums and Kotlin objects treated differently would be extremely confusing to users. I also think the current treatment of enums is confusing, so I would prefer both enums and objects to adhere to the language specs regarding serialization, i.e. be strictly name-based, looking up and returning the "canonical" instance without modification when loaded. This section from the Kotlin Serialization specification sums up my thoughts:

Conceptually, a singleton is a class with only one instance, meaning that state does not define the object, but the object defines its state.

It simply doesn't make sense to load an "old" singleton in a "new" application; its old data depended on its old type, which no longer exists, so we should substitute the next best thing: its new type along with its new data.

What are your thoughts on deprecating the current storage of enum fields (replacing it with class+name storage) and adding detection of Kotlin objects to be stored in that same way?

(detection of Scala objects could also be done, using the same singleton technique)