mybatis / mybatis-3

MyBatis SQL mapper framework for Java
http://mybatis.github.io/mybatis-3/
Apache License 2.0

POJO modified in the 2nd level cache #1127

Open sebdavid opened 6 years ago

sebdavid commented 6 years ago

MyBatis version

Database vendor and version

Problem discovered when using PostgreSQL 9.6, but reproduced with H2 DB 1.4.196

Test case or example project

I created a sample Maven project with 2 tests showing the problem: https://github.com/sebdavid/mybatis-cache-modified

Steps to reproduce

The problematic scenario is (a minimal sketch in code follows the list):

  1. cache is empty at the beginning
  2. begin a transaction
  3. read an object from the mapper
  4. modify the POJO
  5. close the transaction
  6. read the object from the mapper --> the returned object is the modified one

Expected result

Step 6 should return the POJO as it was read from the mapper the first time.

Actual result

The returned POJO is the one that was modified in step 4.

Additional info

Transactions are managed by Spring with @Transactional. In the XML mapper, I'm declaring a cache of type EhcacheCache, but the same issue occurs with the default cache (i.e. <cache />).

From what I understand: the object is written to the cache when the transaction is closed (step 5). But it seems that the POJO returned by the first read (step 3) is the same object (same reference) as the one stored in the TransactionalCacheManager (CachingExecutor, line 104?). So when it is modified in step 4, those modifications are stored in the cache when the transaction is committed. That is why the object retrieved from the cache in step 6 is the modified one.

As the documentation states that "The cache will be treated as a read/write cache, meaning objects retrieved are not shared and can be safely modified by the caller, without interfering with other potential modifications by other callers or threads." (http://www.mybatis.org/mybatis-3/sqlmap-xml.html#cache), I'm wondering whether this is a bug or whether I am missing something.

If the object already exists in the cache before this scenario, steps 2 to 5 have no impact on the object stored in the cache.

sebdavid commented 6 years ago

I just had a look at the mybatis-spring repository and found a similar issue reported by @honzasmuk: https://github.com/mybatis/spring/issues/219

harawata commented 6 years ago

Hi @sebdavid ,

Thank you for the report and the example project!

As the documentation states that "The cache will be treated as a read/write cache, meaning objects retrieved are not shared and can be safely modified by the caller, without interfering with other potential modifications by other callers or threads."

This applies to objects retrieved from the cache (or, more precisely, from the built-in cache). In your case, the above explanation does not apply because the POJO is modified before it is put into the cache.

If you need to modify returned objects in a transaction with the 2nd level cache enabled, you may have to create a copy beforehand.

sebdavid commented 6 years ago

Hi,

I understand now that the rule

objects retrieved are not shared and can be safely modified by the caller

only concerns objects retrieved from the cache (i.e. already present in that cache). It is not a global rule saying "when you read an object from a mapper, you will always get a different object reference from the one written to the 2nd level r/w cache".

But, something seems weird to me. I will try to explain my thoughts.

My app is split into independent layers: the service layer (business logic) and the persistence layer (MyBatis). The service layer manages transactions and calls the mappers. From the point of view of the service layer, objects are obtained from the persistence layer without knowing whether a cache is enabled, and without knowing whether the objects were retrieved from the cache or read from the DB. From the service layer, we can get two different behaviors when calling personMapper.get(...) in a transaction:

  - if the object is not yet in the cache, the mapper returns the very reference that is about to be written to the cache, so modifying it modifies the future cache entry;
  - if the object is already in the cache, the mapper returns a copy that can be modified safely.

This is not consistent to me: in one case we have to make a copy ourselves (object not retrieved from the cache), and in the other case MyBatis makes the copy for us (object retrieved from the cache). In other words: in one case the mapper returns a reference to an object managed internally by MyBatis (about to be written to the cache), and in the other case MyBatis returns a clean copy. I would expect the objects written to the 2nd level r/w cache (or "about to be written", in the case of a transaction) to never be the same instances as those returned by the mapper, to keep the cache independent from the rest.

The question could be: in a transaction, why can't we modify the object returned by the mapper when it does not yet exist in the cache, whereas we can modify the object returned by the same mapper when it does exist in the cache?

However, if this really is the expected behavior, a copy of the objects to modify should always be performed, as you said. In that case, the documentation should probably mention the case where an object is read within a transaction: if the object is going to be modified, a copy should be made first (unless it was retrieved from the cache; but since the caller cannot know where the object came from, a copy should always be made).

Thank you for your support.

harawata commented 6 years ago

Hi @sebdavid ,

It is reasonable that you think the behavior is inconsistent.

A possible (obvious) solution would be to return a copy of the object while storing the original into the TransactionalCache (or the other way around), but it would not be as efficient as copying in user code, because MyBatis cannot rule out the complex cases (it might have to perform a deep copy even where a shallow copy would suffice, for example).

Do you still think it should be handled on the MyBatis side? Or is there a better idea?

🗒 Taking notes before I forget. Things to consider if we fix this.

Cc-ing @emacarron in case I missed some history regarding this topic.

honzasmuk commented 6 years ago

Hi guys, thank you for looking into it. My problem in mybatis/spring#219 is mainly with the (in my opinion) bad behavior of tx rollback: the modified POJO was written into the 2nd level cache even though I had marked the tx as rollback-only.

sebdavid commented 6 years ago

Hi @harawata ,

You're right about the performance. For now, I make a copy of the object in my transactional method before modifying it. I implemented the clone() method on the object concerned and its child objects, cloning only what is really necessary. Of course MyBatis cannot guess that.

But then I asked myself: "isn't MyBatis already making copies when an object is retrieved from the cache?", and I had a look into it.

With the default cache (<cache />), MyBatis serializes objects to byte[] and stores that array in the cache: all objects concerned have to implement Serializable (otherwise an exception is thrown). But when using another cache implementation, say EhcacheCache, the mechanism can differ depending on the configuration of ehcache, and MyBatis knows nothing about that config. From what I saw, the copy behavior then depends entirely on that configuration.

Something like ehcache's configuration might be a solution, with copyOnWrite / copyOnRead: copyOnRead would be similar to the readOnly attribute of the MyBatis cache.

But with this solution there is a catch: currently the readOnly attribute on <cache /> is handled by MyBatis's default cache implementation. The copyOnWrite setting would have to be managed at a higher level, or perhaps globally, because the copy must be done before the object is handed to the implementation.

Another additional configuration option (maybe a little out of the scope of this discussion) would be to let the user choose the desired copy mechanism: clone, serialization, ...?

But then... the configuration of the cache may get odd when EhcacheCache is used: on the MyBatis side copyOnWrite would be "true", but it would have to be false on the EhCache side, otherwise the copy would be done twice.

Now I understand the complexity of this point a little better!

linpingchuan commented 6 years ago

Hi all, I found a solution that can work around this problem.

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang3.SerializationUtils; // commons-lang3
import org.apache.ibatis.cache.Cache;
import org.apache.ibatis.cache.CacheKey;
import org.apache.ibatis.cache.decorators.TransactionalCache;

public class TransactionCacheManager {
    private Map<Cache, TransactionalCache> transactionalCaches = new HashMap<Cache, TransactionalCache>();

    public void clear(Cache cache) {
        getTransactionalCache(cache).clear();
    }

    public Object getObject(Cache cache, CacheKey key) {
        return getTransactionalCache(cache).getObject(key);
    }

    public void putObject(Cache cache, CacheKey key, Object value) {
        // Deep-copy via serialize/deserialize so the staged value is detached
        // from the reference the caller received (and may still mutate).
        getTransactionalCache(cache).putObject(key,
                SerializationUtils.deserialize(SerializationUtils.serialize((Serializable) value)));
    }

    public void commit() {
        for (TransactionalCache txCache : transactionalCaches.values()) {
            txCache.commit();
        }
    }

    public void rollback() {
        for (TransactionalCache txCache : transactionalCaches.values()) {
            txCache.rollback();
        }
    }

    private TransactionalCache getTransactionalCache(Cache cache) {
        TransactionalCache txCache = transactionalCaches.get(cache);
        if (txCache == null) {
            txCache = new TransactionalCache(cache);
            transactionalCaches.put(cache, txCache);
        }
        return txCache;
    }
}

When a value is about to be put into the cache, it is serialized and then deserialized, which deep-copies all of its fields.

Then I just replace MyBatis's TransactionalCacheManager class with this one using Byte Buddy.

import net.bytebuddy.ByteBuddy;
import net.bytebuddy.agent.ByteBuddyAgent;
import net.bytebuddy.dynamic.loading.ClassReloadingStrategy;
import org.apache.ibatis.cache.TransactionalCacheManager;

// Hot-swap MyBatis's TransactionalCacheManager via the instrumentation agent.
ByteBuddyAgent.install();

new ByteBuddy()
        .redefine(TransactionCacheManager.class)
        .name(TransactionalCacheManager.class.getName())
        .make()
        .load(TransactionCacheManager.class.getClassLoader(),
                ClassReloadingStrategy.fromInstalledAgent());

vangogh-ken commented 2 years ago

(quoting @linpingchuan's TransactionCacheManager / Byte Buddy workaround above)

But it works for me when the deep copy is applied on reads from the cache as well.