cmelchior commented 7 years ago

Right now the interop between RxJava and Realm is not as good as it could be, which is also the feedback we are constantly getting.

This issue is mostly an attempt at summarizing the situation and making sure we have the information in one place instead of being spread out across multiple issues that do not address the full picture.

The Current Situation

Currently, we have 3 major issues preventing great compatibility with RxJava:

No support for Schedulers.
Streams work on immutable objects. Realm uses live objects.
Thread confinement makes it hard to move work across threads.

I'll try to describe each case below and possible solutions.

Challenges

1. Custom Schedulers

Realm's async query methods do not allow the use of custom schedulers.

Take this example.

realm.where(Person.class).findAllAsync().asObservable()
  .subscribeOn(Schedulers.io())
  .subscribe(...);

This will not run the query on the io() thread as one might expect, but on Realm's worker thread. On closer inspection though, this is natural, as findAllAsync() is executed outside the context of RxJava.

The obvious solution is to make RealmQuery observable.

realm.where(Person.class).asObservable()
  .flatMap(new Func1<RealmQuery<Person>, Observable<RealmResults<Person>>>() {
        @Override
        public Observable<RealmResults<Person>> call(RealmQuery<Person> query) {
            return query.findAll().asObservable();
        }
    })
  .subscribeOn(Schedulers.io())
  .subscribe(...);
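
To make the difference between an eager and a deferred query concrete, here is a minimal plain-JDK sketch. It does not use Realm or RxJava: `REALM_WORKER`, `IO`, `eagerQuery()` and `deferredQuery()` are hypothetical stand-ins for Realm's worker thread, `Schedulers.io()`, `findAllAsync()` and the proposed `RealmQuery.asObservable()`. Each "query" simply reports the name of the thread it ran on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class DeferredQuerySketch {
    // Hypothetical stand-ins for Realm's worker thread and for Schedulers.io().
    static final ExecutorService REALM_WORKER = singleThread("realm-worker");
    static final ExecutorService IO = singleThread("io");

    static ExecutorService singleThread(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true); // do not keep the JVM alive for the sketch
            return t;
        });
    }

    // Eager, like findAllAsync(): the work is bound to the worker thread
    // at call time, whoever ends up consuming the result.
    static CompletableFuture<String> eagerQuery() {
        return CompletableFuture.supplyAsync(
                () -> Thread.currentThread().getName(), REALM_WORKER);
    }

    // Deferred, like the proposed RealmQuery.asObservable(): only a description
    // of the work is returned, so the caller decides where it runs.
    static Supplier<String> deferredQuery() {
        return () -> Thread.currentThread().getName();
    }

    // Plays the role of subscribeOn(scheduler): run the supplied work there.
    static String runOn(ExecutorService scheduler, Supplier<String> work) {
        return CompletableFuture.supplyAsync(work, scheduler).join();
    }
}
```

Under these assumptions, `runOn(IO, () -> eagerQuery().join())` still reports `realm-worker`, while `runOn(IO, deferredQuery())` reports `io`. That is the behaviour we would want from an observable query.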

An implementation was started, but never completed here: https://github.com/realm/realm-java/pull/1978

This implementation has a lot of benefits:

The flatMap function could also do a copyFromRealm() for those who want that, and it would happen on a background thread.
The flatMap operation decides whether findAll/findAllSorted/... is used. The alternative would be to duplicate all find* methods as find*Observable(), which feels like overkill, especially considering RxJava2, which has more stream types.
Full interop with all RxJava Schedulers, including test schedulers.
findAllAsync and friends would still be available to those who do not care about schedulers or do not use RxJava.

Challenges

Annoying to force flatMap onto everyone. Perhaps we could convert realmResults.asObservable() to an observable on the query behind the scenes. Every RealmResults tracks its original query anyway, at least on a native level. This could be a v2 of this API. The downside is that findAll().asObservable() would waste work, as the observable would discard the original query result and rerun the query.
We need to solve the thread confinement issue (see below) in order to make it easy to move the query object across threads.

2. Event streams vs. live objects

RxJava exposes data as streams. This naturally promotes immutable objects, as any change should be represented by another event being pushed down the stream. This conflicts with Realm's "live" nature, where objects are being updated behind the scenes.

Specifically, it causes problems with RxJava operators like buffer() (or any operator that caches items), as the object in the cache might be updated when the developer doesn't expect it.
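
A minimal plain-Java sketch of the buffer() problem, with no Realm or RxJava involved. `LivePerson` is a hypothetical mutable stand-in for a live Realm object, the `ArrayList` plays the role of the operator's cache, and `snapshot()` stands in for whatever copy or pin mechanism we end up with.

```java
import java.util.ArrayList;
import java.util.List;

final class LiveObjectSketch {
    // Hypothetical stand-in for a live object that is updated behind the scenes.
    static final class LivePerson {
        String name;
        LivePerson(String name) { this.name = name; }
        // Detached copy, playing the role of a snapshot of the current state.
        LivePerson snapshot() { return new LivePerson(name); }
    }

    static List<String> namesOf(List<LivePerson> buffer) {
        List<String> names = new ArrayList<>();
        for (LivePerson p : buffer) {
            names.add(p.name);
        }
        return names;
    }

    // Caching the live object: both cached "events" see the latest value.
    static List<String> bufferLive() {
        LivePerson person = new LivePerson("Alice");
        List<LivePerson> buffer = new ArrayList<>();
        buffer.add(person);   // first emission is cached
        person.name = "Bob";  // update happens behind the scenes
        buffer.add(person);   // second emission is the same instance
        return namesOf(buffer);
    }

    // Caching snapshots: each cached event keeps the value it was emitted with.
    static List<String> bufferSnapshots() {
        LivePerson person = new LivePerson("Alice");
        List<LivePerson> buffer = new ArrayList<>();
        buffer.add(person.snapshot());
        person.name = "Bob";
        buffer.add(person.snapshot());
        return namesOf(buffer);
    }
}
```

Here `bufferLive()` yields `[Bob, Bob]`, which is what surprises people, while `bufferSnapshots()` yields `[Alice, Bob]`.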

The obvious solution is of course to provide the capability of creating "snapshots" or "pinned" versions of Realm objects.

This is not trivial, so I'll try to sketch a few solutions:

1) Use `copyFromRealm()`

This will obviously work since it caches the data in memory, but detaching objects from Realm comes with a real cost in terms of memory and performance. Copying an entire object graph can be very expensive, and you might end up copying a lot of data that is not really needed.

People going down this route should probably consider mapping Realm objects to View models instead. Since View models only contain the data actually required by the UI, nothing will thus be lost by fetching that data ahead of time.

This approach has problems for very large query results where it will not be possible to copy them into memory.

realm.where(Person.class).asObservable()
    .flatMap(new Func1<RealmQuery<Person>, Observable<List<Person>>>() {
        @Override
        public Observable<List<Person>> call(RealmQuery<Person> query) {
            // Copy into memory before sending data further down the stream.
            // Possibly this could be done internally in the `asObservable()` method.
            // TODO solve thread confinement for `realm`.
            return query.findAll().asObservable()
                .map(result -> realm.copyFromRealm(result));
        }
    })
    .subscribe(...);

2) Frozen/pinned objects

Frozen objects as a concept has been discussed here: https://github.com/realm/realm-java/issues/1208

The basic idea is that it should be possible to "freeze" or "pin" a RealmResults or RealmObject to a specific version of Realm. With the snapshot concept implemented in Core/ObjectStore, the native memory concern described in #1208 is now solved.

The problem with pinning versions is that it can lead to file size explosions since Realm must track the difference between the oldest and newest version of the Realm. So if a version is pinned for a long time it could lead to file size issues. Core is thinking about improving this situation (https://github.com/realm/realm-core/issues/984) and Java could implement automatic compaction when opening the Realm file (https://github.com/realm/realm-java/issues/3739), but neither would solve the underlying problem completely.
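
To illustrate why a long-lived pin is a problem, here is a toy model in plain Java. It is only a sketch under a simplifying assumption: the store must retain every version from the oldest pinned one up to the latest. The class and method names are hypothetical and say nothing about how Core actually tracks versions.

```java
import java.util.TreeMap;

final class VersionPinningSketch {
    private long latest = 0;
    // Pinned version -> number of readers currently holding it.
    private final TreeMap<Long, Integer> pins = new TreeMap<>();

    // Every write transaction produces a new version.
    long commit() {
        return ++latest;
    }

    // A reader pins the version that is current right now.
    long pinLatest() {
        pins.merge(latest, 1, Integer::sum);
        return latest;
    }

    // Releasing the last reader of a version removes the pin entirely.
    void unpin(long version) {
        pins.computeIfPresent(version, (v, count) -> count == 1 ? null : count - 1);
    }

    // Assumption of this model: everything between the oldest pin and the
    // latest version has to stay in the file.
    long retainedVersions() {
        long oldest = pins.isEmpty() ? latest : pins.firstKey();
        return latest - oldest + 1;
    }
}
```

In this model, pinning version 0 and then committing 100 times leaves 101 versions retained. Releasing the pin drops that back to 1. How close real apps get to this worst case is exactly the open question.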

Important: We lack real-world information about how file size will be impacted in RxJava-heavy apps if we introduce version pinning. This needs further investigation.

ThreadSafeReference as implemented by Cocoa (https://realm.io/docs/swift/latest/api/Classes/ThreadSafeReference.html) and tracked for Java here: https://github.com/realm/realm-java/issues/4059 will also pin versions, so the concept might be used as a substitute for the API proposed in #1208.

Compared to copyFromRealm(), pinning exchanges memory pressure for disk pressure.

realm.where(Person.class).asObservable()
   .flatMap(new Func1<RealmQuery<Person>, Observable<RealmResults<Person>>>() {
        @Override
        public Observable<RealmResults<Person>> call(RealmQuery<Person> query) {
            // Pin the version before sending data further down the stream.
            // Possibly this could be done internally in the `asObservable()` method.
            // TODO solve thread confinement for `query`.
            return query.findAll().asObservable().map(result -> result.freeze());        
        }
    })
  .subscribe(...);

Challenges

With the currently proposed API, you cannot tell the difference between frozen and non-frozen objects unless you call isFrozen()/isPinned(). This is, however, also the case for managed/unmanaged objects.
We would also have to introduce pinning for RealmQuery and Realm classes.
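
The first point can be sketched in plain Java. The `Person` class below is hypothetical and hand-written, not generated Realm code. It only shows that a frozen and a live instance share the same static type, so the compiler cannot help. Rejecting writes on a frozen instance is an assumption of this sketch, not something the proposal has decided.

```java
final class FrozenSketch {
    // Hypothetical model class: live and frozen instances have the same type.
    static final class Person {
        private String name;
        private final boolean frozen;

        Person(String name) {
            this(name, false);
        }

        private Person(String name, boolean frozen) {
            this.name = name;
            this.frozen = frozen;
        }

        boolean isFrozen() {
            return frozen;
        }

        String getName() {
            return name;
        }

        // Assumption: a frozen instance rejects writes at runtime.
        void setName(String name) {
            if (frozen) {
                throw new IllegalStateException("Frozen objects cannot be modified");
            }
            this.name = name;
        }

        // Returns an instance pinned to the current state.
        Person freeze() {
            return frozen ? this : new Person(name, true);
        }
    }
}
```

A method that accepts a `Person` has to call `isFrozen()` to know which kind it received, just as code today has to call `isManaged()`.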

3) AutoValue

People into RxJava frequently ask for AutoValue support, since it gives immutable objects with final fields and no setters (unlike standard Realm model classes).

Native support for AutoValue would be extremely hard and would, under the hood, be implemented as either 1) or 2) anyway. Also, updates to an object would be equivalent to a copyToRealm, which would be extremely wasteful. See https://github.com/realm/realm-java/issues/2538. For now, the advice is that if you really want AutoValue support, you should make a new class with conversion functions between the Realm object and the AutoValue object. This effectively splits your domain into "Entity classes" and "View model classes", as promoted by Clean Architecture proponents.

realm.where(Person.class).asObservable()
  .flatMap(new Func1<RealmQuery<Person>, Observable<List<AutoValuePerson>>>() {
        @Override
        public Observable<List<AutoValuePerson>> call(RealmQuery<Person> query) {
            // Convert to AutoValue objects before sending data further down the stream.
            return query.findAll().asObservable()
                .map(result -> {
                    List<AutoValuePerson> l = new ArrayList<>();
                    for (Person p : result) {
                        l.add(AutoValuePerson.from(p));
                    }
                    return l;
                });
        }
    })
  .subscribe(...);
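
For reference, the conversion side could look roughly like the sketch below. It is hand-written plain Java rather than actual AutoValue output, so that it compiles without an annotation processor. `PersonEntity` and `PersonView` are hypothetical names for the "Entity class" and the "View model class".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ViewModelMappingSketch {
    // Hypothetical mutable entity, shaped like a standard Realm model class.
    static final class PersonEntity {
        private String name;
        private int age;

        PersonEntity(String name, int age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        void setName(String name) { this.name = name; }
        int getAge() { return age; }
        void setAge(int age) { this.age = age; }
    }

    // Hypothetical immutable view model: final fields, no setters.
    static final class PersonView {
        private final String name;
        private final int age;

        private PersonView(String name, int age) {
            this.name = name;
            this.age = age;
        }

        // Copies only the fields the UI needs out of the entity.
        static PersonView from(PersonEntity entity) {
            return new PersonView(entity.getName(), entity.getAge());
        }

        String name() { return name; }
        int age() { return age; }
    }

    // Maps a whole result set; the returned list cannot be modified either.
    static List<PersonView> toViews(List<PersonEntity> entities) {
        List<PersonView> views = new ArrayList<>(entities.size());
        for (PersonEntity entity : entities) {
            views.add(PersonView.from(entity));
        }
        return Collections.unmodifiableList(views);
    }
}
```

Later changes to a `PersonEntity` do not leak into a `PersonView` that was already created, which is the property people are after when they ask for AutoValue.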

3. Thread Confinement

Realm's thread confinement and RxJava's streams both attempt to solve the problem of "Concurrent access is really, really hard". Unfortunately, the two approaches are not really compatible. The primary motivating case is:

realm.where(Person.class).asObservable()
   .flatMap(new Func1<RealmQuery<Person>, Observable<RealmResults<Person>>>() {
        @Override
        public Observable<RealmResults<Person>> call(RealmQuery<Person> query) {
            return query.findAll().asObservable();
        }
    })
  .subscribeOn(Schedulers.io())
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(...);

Here we want to do the work on the io() thread and get the results on the UI thread, but this will throw the dreaded "IllegalStateException: Realm access from incorrect thread." exception.

Before discussing solutions, I would like to reiterate how Realm and RxJava respectively solve the concurrent modification problem:

Realm: Each thread operates on its own version of Realm data (MVCC). This gives each thread a fully consistent view of the entire object graph. If we allowed different versions of data to be read on the same thread, it would increase the chance of accidentally trying to operate on or compare two objects of different versions, which can easily lead to subtle bugs. Each thread advances to the next version at well-defined times (Looper event, transactions), and change listeners are used to notify the user about these events.

By definition, thread-confined objects cannot be accessed by multiple threads.
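
As a reminder of what confinement means in practice, here is a minimal plain-Java sketch. `ConfinedPerson` is a hypothetical class that remembers the thread it was created on and checks it in every accessor. The real check lives in native and generated code, so this only mirrors the observable behaviour.

```java
final class ThreadConfinementSketch {
    // Hypothetical object that may only be used on its creating thread.
    static final class ConfinedPerson {
        private final Thread owner = Thread.currentThread();
        private final String name;

        ConfinedPerson(String name) {
            this.name = name;
        }

        String getName() {
            if (Thread.currentThread() != owner) {
                throw new IllegalStateException("Realm access from incorrect thread.");
            }
            return name;
        }
    }

    // Reads the object on a fresh thread and returns the error message,
    // or null if the access was allowed.
    static String readFromOtherThread(ConfinedPerson person) {
        final String[] error = new String[1];
        Thread other = new Thread(() -> {
            try {
                person.getName();
            } catch (IllegalStateException e) {
                error[0] = e.getMessage();
            }
        });
        other.start();
        try {
            other.join(); // join() makes the write to error[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return error[0];
    }
}
```

Reading `getName()` on the creating thread works, while `readFromOtherThread(person)` returns the familiar error message. An `observeOn(...)` hop in the middle of a stream is the same situation.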

RxJava: All data is exposed as a stream of events. Single objects are not modified, but transformed. This means that any change to data will be modeled as a new event being put into the stream. RxJava does not prevent you from accidentally composing streams that operate on different versions of some data. Because changes should be represented as transformations that result in new objects (events), this naturally promotes immutable objects. Data should only be manipulated inside the stream, never from the outside.

By definition, transforming immutable objects inside a stream cannot cause concurrent modifications.

Takeaway: One observation from this is that allowing mutable Realm data to be read from any thread is a serious anti-pattern. It conforms to neither the original Realm design nor the stream approach. This leads to the conclusion that any solution to thread confinement must operate on a pinned version of the Realm data.

Some of the solutions that exist for this:

1. Use `copyFromRealm()`

Copying the data from Realm will prevent it from being updated by Realm. Immutability is not guaranteed, since the Realm model class will most likely have mutator methods. If that is a concern, then people should map the classes to a ViewModel class that is immutable.

copyFromRealm() does not solve the problem of how to move e.g. queries and Realm instances across threads, as described for custom schedulers.
copyFromRealm() trades disk space for memory: the data is copied into memory instead of pinning a version in the file.
We already have it today, so it is a pragmatic solution.

2. `ThreadSafeReference`

ThreadSafeReference (https://realm.io/docs/swift/latest/api/Classes/ThreadSafeReference.html) as designed in Cocoa only works once, but we could change semantics so it could be used multiple times.

The idea is that all observables are changed so that, instead of returning live objects, they return pinned objects wrapped in a ThreadSafeReference<?>.

// Variant A: ThreadSafeReferences in the entire chain
// The generic arguments get insanely complicated
// Forced to use `x.get().y` for all interactions
realm.where(Person.class).asObservable()
    .flatMap(new Func1<ThreadSafeReference<RealmQuery<Person>>, Observable<ThreadSafeReference<RealmResults<Person>>>>() {
        @Override
        public Observable<ThreadSafeReference<RealmResults<Person>>> call(ThreadSafeReference<RealmQuery<Person>> query) {
            // .get() resolves the reference.
            return query.get().findAll().asObservable();
        }
    })
    .filter(results -> results.get().isLoaded()) // Common to only continue when loaded. This is the reason .get() must work multiple times.
    .map(results -> results.get().first()) // Fetch first object
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(...);

// Variant B: ThreadSafeReferences for the query, then copyFromRealm
realm.where(Person.class).asObservable()
    .flatMap(new Func1<ThreadSafeReference<RealmQuery<Person>>, Observable<ThreadSafeReference<RealmResults<Person>>>>() {
        @Override
        public Observable<ThreadSafeReference<RealmResults<Person>>> call(ThreadSafeReference<RealmQuery<Person>> query) {
            return query.get().findAll().asObservable();
        }
    })
    .map(results -> results.getRealm().copyFromRealm(results.get()))
    .subscribeOn(Schedulers.io())
    .observeOn(AndroidSchedulers.mainThread())
    .subscribe(...);

Would solve the problem and would expose "pinned" objects in the type system.
Calling get() everywhere would be a bit annoying.
The downside is that it would force everyone into the ThreadSafeReference class even if not needed.
Lifecycle management of pinned Realms would depend on the GC. It is unclear how many issues this would cause. From Java's perspective, a Realm is just a long pointer, but it might hold tens of MB of native memory.
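
The "only works once" limitation is easy to see in a toy model. The two classes below are hypothetical plain-Java wrappers, not the Cocoa or Java API. They model only the resolve-once rule versus the multi-use semantics a stream would need, and they ignore the pinning and the Realm instance that a real reference resolves against.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class ReferenceSketch {
    // Hypothetical reference that can be resolved exactly once.
    static final class OneShotReference<T> {
        private final T value;
        private final AtomicBoolean resolved = new AtomicBoolean(false);

        OneShotReference(T value) {
            this.value = value;
        }

        T get() {
            if (!resolved.compareAndSet(false, true)) {
                throw new IllegalStateException("Reference already resolved");
            }
            return value;
        }
    }

    // Hypothetical reference with the relaxed semantics: get() may be called
    // any number of times, from any operator in the chain.
    static final class ReusableReference<T> {
        private final T value;

        ReusableReference(T value) {
            this.value = value;
        }

        T get() {
            return value;
        }
    }
}
```

A chain like `.filter(r -> r.get().isLoaded()).map(r -> r.get().first())` calls `get()` twice on the same reference, so it would fail on the second call with the one-shot variant.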

3. `.pin()` or `.freeze()`

As discussed here https://github.com/realm/realm-java/issues/1208

The downside is that it was originally meant for RealmObject/RealmResults/RealmList classes. It is not clear how well it would work for e.g. Realm/RealmQuery classes.

Effectively, it would take the ThreadSafeReference concept and implement it behind the scenes.

realm.where(Person.class).asObservable()
   .flatMap(new Func1<RealmQuery<Person>, Observable<RealmResults<Person>>>() {
        @Override
        public Observable<RealmResults<Person>> call(RealmQuery<Person> query) {
            // query.isPinned() == true;
            return query.findAll().asObservable().map(results -> results.pin());
        }
    })
  .subscribeOn(Schedulers.io())
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(...);

The internal code changes required to support frozen objects will probably be quite large and could have a significant impact on performance, as all accessors will most likely require additional logic.

4. Other solutions

No other solution has been thoroughly thought through yet, so there may be more options than those listed here:

Implement PinnedRealmResults, PinnedRealmList, PinnedRealm, etc. versions. Could result in quite a lot of new classes
Others?

Conclusion / TLDR

We need to implement RealmQuery.asObservable(). Solving how to move queries across threads will have a lot of influence on how pinning/thread confinement should be solved as well.
We need to investigate how pinning versions in RxJava-heavy apps affects the file size. This should drive the decision for choosing pinning or copyFromRealm as the default solution. Most likely we will have to offer both in any case. Having the RxObservableFactory interface is nice here, as we can provide multiple solutions that people can swap easily.
copyFromRealm() might not be the "pure" solution, but it is pragmatic.
ThreadSafeReference as implemented by Cocoa is not suited for RxJava support. As a minimum we must provide the option of getting the value multiple times on multiple threads.

I'll try to keep this post updated with more information and QA.

xiaolongyuan commented 7 years ago

cool

Zhuinden commented 7 years ago

ThreadSafeReference makes sense :smile:

Sometimes I wonder how one could force only the Nth and N-1th Realm version to exist, but release the previous ones while freezing.

I think a lot of people had issues with Rx mostly because they misunderstood what the Rx support does.

cmelchior commented 4 years ago

Frozen Object support has been merged to the Core-6 branch: https://github.com/realm/realm-java/pull/6107. This should make it a lot easier to make immutable Realm objects and transfer them across threads while still retaining the lazy-loading properties. The API is still in beta so any feedback is appreciated.

nongdenchet commented 4 years ago

@cmelchior thank you, I really appreciate you making #6107 happen. I am currently trying to apply Core-6 to our current project. I have a couple of questions for you based on your changelog notes about equality and hashCode:

Why does equality currently depend on versionID? (Sorry, I am pretty new to Realm.)
Does a frozen object do a deep comparison inside equals and hashCode?

marchy commented 4 years ago

Question here on trying to schedule our team's adoption of the much-needed frozen objects solution: any idea when the iOS and/or .NET/Xamarin implementations will hit public availability? (even in beta form)

We won't hold you hostage to it, but it would really help to have a general idea (ie: is it March/spring-time VS later)

tgoyne commented 4 years ago

Frozen objects for obj-c/swift are just awaiting review, so you could potentially start trying it out now. We're expecting it to be very quick and easy to do in .NET (as all of the problems should have been sorted out while implementing it for the other SDKs and it just needs to be exposed in the C# API), but I think it hasn't been started yet.

marchy commented 4 years ago

@tgoyne Thomas thanks a ton for the super expedient reply. That’s extremely comforting to know. Excited for what this enables for Realm - particularly with multi-tier / apps that have separate presentation and domain layers.

Cheers!

realm / realm-java

Better RxJava support #4291
