sushihangover commented 7 years ago

I have a fork/variant of the key-value store project Akavache, called Hoard, that I built for a client in order to remove SQLite from their Xamarin.Android and Xamarin.iOS app (and to add features that Akavache does not offer).

They were already using Realm for their MVVM backing store and 1) did not want to include SQLite due to Android N requirements and 2) wanted an all-Realm solution in order to use the upcoming Realm Mobile Platform for syncing the cache. All is well: between their MVVM store and Hoard as an auto-refreshing k/v cache handling 100k+ records, they are happy with the performance, and Realm (v0.80) has proven to be extremely stable and performant for their app.

That said, I am trying to determine whether there is a way to get the performance of Hoard with Realm as the store to at least equal that of Hoard/Akavache with SQLite. Here are some graphs showing the time difference between the two stores when retrieving individual cache records:

[Screenshot: graphs comparing the time to retrieve individual cache records from the Realm and SQLite stores]

While the cost of an individual insert or read is not noticeable, pushing/subscribing to hundreds of elements can cause minor app jitters at times.

The reason for the time difference:

A single RealmObject or a RealmResults is passed through an Observable sequence. Since these RealmObjects are live, they have to stay on the same thread, so the current design holds a single open instance of the Realm db at the class level and creates new instances from its configuration via Realms.Realm.GetInstance as Observable sequences are requested/subscribed to. Each new instance is the reference for the RealmObject or RealmResults passed through its sequence, and it is held open until the Observable is disposed (the sequences can be consumed lazily, in full or not, and they have live Realm elements in them).

This is one of the simpler sequences; it consumes a RealmResults but shows the basic flow:

public IObservable<IDictionary<string, T>> GetObjects<T>(IEnumerable<string> keys)
{
    return Observable.Create<IDictionary<string, T>>(o =>
    {
        var cKey = "";
        var now = Scheduler.Now.UtcDateTime.Ticks;
        var name = typeof(T).FullName;
        var realm = Realms.Realm.GetInstance(instance.Config);
        var ofTypeRealmResults = realm.All<CacheElement>().Where(x => x.TypeName == name);
        var nonExpiredRealmResults = (RealmResults<CacheElement>)ofTypeRealmResults.Where((CacheElement e) => e.Expiration > now);
        keys.ToObservable()
            .SelectMany(key =>
            {
                cKey = key; // Hack: We are scheduling on a CurrentThreadScheduler.Instance, 
                        // so avoid creating a temp object (KeyValuePair<string, CacheElement>) to pass down the sequence
                return nonExpiredRealmResults.Where((CacheElement e) => e.Key == key);
            })
            .SelectMany(e => e == null ? ExceptionHelper.ObservableThrowKeyNotFoundException<CacheElement>(cKey) : Observable.Return(e))
            .SelectMany(e => CacheUsageStatistics(e, currentThreadScheduler))   
            .SelectMany(e => NearExpirationReCache(e, currentThreadScheduler))          
            .SelectMany(e => AfterReadFromDiskFilter(e.Value, currentThreadScheduler))
            .SelectMany(b => DeserializeObject<T>(b))
            .Select(obj => new KeyValuePair<string, T>(cKey, obj))
            .ToDictionary(k => k.Key, v => v.Value)
            .Take(1)
            .Subscribe(o);
        return Disposable.Create(() =>
        {
            realm.Dispose();
        });
    });
}

As there can be dozens of sequences in flight at any time, and those also include inserting or updating RealmObjects, deleting individual records, and bulk deletions via Realm.RemoveRange, the overhead of creating and disposing all of those Realm instances really hurts throughput and increases memory pressure.

Short of creating a custom message pump (a thread pool with a Realm instance held open on each thread, and requests queued to those threads), is there anything else I can do within the current Realm instance/thread architecture? Or am I just missing something and need more sleep ;-)

Notes:

I would love it if Realm had a built-in way to get a fast "non-managed copy" of a RealmResults/RealmObject (i.e. just give me a POCO so I do not have to manage which thread it is being used on).
This is a good use case, to me at least, for being able to do a direct Select (projection) on a RealmObject.
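
To illustrate the kind of "non-managed copy" meant here, this is a minimal hand-rolled sketch, not a Realm feature. It assumes a CacheElement with the Key, TypeName, Expiration and Value members used in the code above; in this snippet CacheElement is a plain stand-in class so that it compiles without the Realm package, and CacheElementSnapshot is a hypothetical name.

```csharp
using System;

// Stand-in for the managed object in this thread (assumption: the real
// class derives from RealmObject and exposes these four members).
public class CacheElement
{
    public string Key { get; set; }
    public string TypeName { get; set; }
    public long Expiration { get; set; }   // UTC ticks
    public byte[] Value { get; set; }
}

// Hypothetical immutable copy that can be handed to any thread.
public sealed class CacheElementSnapshot
{
    public string Key { get; }
    public string TypeName { get; }
    public long Expiration { get; }
    public byte[] Value { get; }

    private CacheElementSnapshot(string key, string typeName, long expiration, byte[] value)
    {
        Key = key;
        TypeName = typeName;
        Expiration = expiration;
        Value = value;
    }

    // Call this on the thread that owns the live object.
    public static CacheElementSnapshot From(CacheElement live)
    {
        if (live == null) throw new ArgumentNullException(nameof(live));
        // Clone the payload so the snapshot shares no buffer with the source.
        var value = live.Value == null ? null : (byte[])live.Value.Clone();
        return new CacheElementSnapshot(live.Key, live.TypeName, live.Expiration, value);
    }

    // Mirrors the "Expiration > now" filter used in the query above.
    public bool IsExpired(long nowTicks) => Expiration <= nowTicks;
}
```

The copy costs an allocation per record, which is the overhead a built-in projection could avoid.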

AndyDentFree commented 7 years ago

@sushihangover your point is well taken and different ways to handle that are being discussed on a regular basis.

I would love it if Realm had a built-in way to get a fast "non-managed copy" of a RealmResults/RealmObject (i.e. just give me a POCO so I do not have to manage which thread it is being used on).

A first thought from a quick look:

If you have similar code that looks up a single object by key, then using ObjectForPrimaryKey will be faster. It bypasses all the LINQ parsing and uses a faster query method.

sushihangover commented 7 years ago

@AndyDentFree Thanks for the reply.

In this particular use case, I do not really use live objects at all, as the RealmResults/RealmObject is totally behind Oz's curtain and the consumer of the API pushes or pulls a totally different thing (i.e. a KeyValuePair<string, T>). (There are live RealmObjects behind the scenes that are held while their data refresh is pending/on-the-wire; then, using RealmChanged, subscriptions are refreshed and published.)

Now for MVVM, I love live RealmObjects; it is such a better and way more efficient model than the never-ending SQL/CRUD/Event/POCOs setup.

FYI: I use ObjectForPrimaryKey whenever I can:

E.g. here I just grab the RealmObject via ObjectForPrimaryKey, pass it into Observable.Start, and then check the object's expiration property to propagate either it or a null into the rest of the sequence. Really fast, with the exception that I have to create a Realm instance to do it ;-)

    public IObservable<T> GetObject<T>(string key)
    {
        if (disposed) return ExceptionHelper.ObservableThrowObjectDisposedException<T>(nameof(RealmBlobCache));
        if (key == null) return Observable.Throw<T>(new ArgumentNullException(nameof(key)));

        return Observable.Create<T>(o =>
        {
            var realm = Realms.Realm.GetInstance(instance.Config);
            var now = Scheduler.Now.UtcDateTime.Ticks;
            var element = realm.ObjectForPrimaryKey<CacheElement>(key);
            Observable.Start(() => { return (now < element?.Expiration) ? element : null; }, currentThreadScheduler)
                .SelectMany((x => x == null ? ExceptionHelper.ObservableThrowKeyNotFoundException<CacheElement>(key) : Observable.Return(x)))
                .SelectMany(y => AfterReadFromDiskFilter(y.Value, currentThreadScheduler))
                .SelectMany(z => DeserializeObject<T>(z))
                .Take(1)
                .Subscribe(o);
            return Disposable.Create(() =>
            {
                realm.Dispose();
            });
        });
    }

nirinchev commented 7 years ago

Is Hoard open sourced (I couldn't find it under your GitHub repos)? If not, do you think it would be possible to give us access? We can sign an NDA if required. I'd love to play with it and see if we can improve the performance.

In terms of creating instances, we are working on a mechanism for reusing instances on the same thread, so that should reduce memory pressure a little, but I'm not sure if it will make a noticeable difference.

AndyDentFree commented 7 years ago

I second @nirinchev's comment that it would be best if we could play with this directly. In particular, when tuning performance in something of this complexity, it would be best to also use a profiler.

Alternatively, if giving us access really isn't possible, we can look into helping you set up a profiling environment, with you building directly against an adjacent set of Realm source. I have several times moved code from the NuGet package across to using the Realm source directly, as part of diagnosing (rare) crashes for support. It is a pretty simple procedure.

sushihangover commented 7 years ago

@nirinchev @AndyDentFree Hoard will be open-sourced in some form; currently it is in a private GitLab repo.

I have to remove specific client code that contains IP and some work-for-hire non-releasable code before I can do that, and the client is OSS-friendly. I was planning it either as a stand-alone package (Hoard) or back-ported as a Realm plugin for Akavache (it is a fork/variant of their really great work).

I was planning on releasing an alpha of it after the next Realm release is published so I can incorporate the new Realm SDK changes in it, but I can get you access to a GitLab repo if you are interested, as I have someone else who might sponsor the work to add a hybrid Realm|Secure-filesystem store to it.

I would love for "better minds" to be able to play with it to see what can be done. As it was used by the client as a drop-in replacement for Akavache, I did not have the time or the flexibility to make huge breaking changes to the original Observable/System.Reactive API, but I would be open to diverging the code base to better fit the Realm model, or just to improving my naive System.Reactive code ;-)

sushihangover commented 7 years ago

@nirinchev @AndyDentFree I've been able to get back to this and should have a Hoard version posted to GitHub this weekend. In profiling Hoard, I realized I need dedicated Realm threads to really get the throughput that I thought was possible, so...

I've created a Realm-based Action/Task Message Pump (RealmThread) to avoid creating/disposing Realm instances on every API call within Hoard, and realized it might be of interest to others, so I broke it out into a separate public repo/NuGet package. It is in pre-release status and currently missing published change events, but it is very usable; it just needs some eyes other than mine to see what is missing/screwed up/... 😎

Really rough alpha/POC tests with Hoard and RealmThread show a throughput improvement of 100x+ in most areas... I need to evaluate thread pool load balancing, but that is a future TODO...

SushiHangover.RealmThread

An Action/Task Message Pump for running commands on a dedicated Realm thread.

GitHub Repo

https://github.com/sushihangover/RealmThread
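
For readers unfamiliar with the pattern, here is a minimal sketch of a dedicated-thread message pump. It is not RealmThread's actual API: DedicatedThreadPump and InvokeAsync are hypothetical names, and the resource is left generic so the snippet runs without the Realm package. The idea is that the resource (for example a Realm instance) is opened once on its own thread, and every queued piece of work runs on that same thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pump: owns one long-lived resource on a dedicated thread
// and runs queued work against it, one item at a time.
public sealed class DedicatedThreadPump<TResource> : IDisposable
    where TResource : class, IDisposable
{
    private readonly BlockingCollection<Action<TResource>> queue =
        new BlockingCollection<Action<TResource>>();
    private readonly Thread thread;

    public DedicatedThreadPump(Func<TResource> open)
    {
        thread = new Thread(() =>
        {
            // Opened once, on the thread that will use it, and disposed
            // on that same thread when the queue is completed.
            using (var resource = open())
            {
                foreach (var work in queue.GetConsumingEnumerable())
                    work(resource);
            }
        }) { IsBackground = true };
        thread.Start();
    }

    // Queue a function and get its result (or exception) back as a Task.
    public Task<TResult> InvokeAsync<TResult>(Func<TResource, TResult> func)
    {
        var tcs = new TaskCompletionSource<TResult>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        queue.Add(resource =>
        {
            try { tcs.SetResult(func(resource)); }
            catch (Exception ex) { tcs.SetException(ex); }
        });
        return tcs.Task;
    }

    // Call from a thread other than the pump thread.
    public void Dispose()
    {
        queue.CompleteAdding();   // let the loop drain remaining work and exit
        thread.Join();
        queue.Dispose();
    }
}
```

With Realm as the resource, the function passed to InvokeAsync would need to return plain values rather than live objects, since live objects cannot leave the pump thread.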

kristiandupont commented 7 years ago

This is awesome @sushihangover, thanks for letting us know!

nirinchev commented 7 years ago

@sushihangover did you manage to solve any outstanding issues or can we help in some way?

sushihangover commented 7 years ago

@nirinchev Technically this is still an open issue, but the client has not paid their invoices (mine or those of others on the project). I'll close this and open a new issue if needed. Thanks.

nirinchev commented 7 years ago

Sorry to hear that! Hope things work out with your client and be sure to let us know if you need some help later :)

realm / realm-dotnet

Assistance: Performant and data safe Observable sequences and Realms.Realm.GetInstance #950