realm / realm-dotnet

Realm is a mobile database: a replacement for SQLite & ORMs
https://realm.io
Apache License 2.0

Pagination examples please.... #1782

Closed: ozzioma closed this issue 4 years ago

ozzioma commented 6 years ago

Guys, great work!

I am using Realm for Xamarin and Java and I need clear examples of how pagination works. I was surprised to learn that Skip and Take are not supported yet. How soon will they be?

Can I suggest examples explaining how exactly pagination is not needed with Realm? Sorry, but this passage from the documentation is not sufficient:

"Since queries in Realm are lazy, performing this sort of paginating behavior isn't necessary at all, as Realm will only load objects from the results of the query once they are explicitly accessed.

If for UI-related or other implementation reasons you require a specific subset of objects from a query, it's as simple as taking the IQueryable object, and reading out only the objects you need."

I mean this problem of pagination with Realm gets most new users stumped, and there's no example! Honestly it's a hard concept telling users to replace well known pagination routines with a loop...

Most of the UI controls and grid/list display routines out there have callbacks that include query, sort parameters...and pagination parameters. A simple example would have clarified this feature of not needing "those loading mechanisms with realm". Would it be too much work to add a few lines of examples to your documentation showing exactly how it's done?

It's clear the use case being referenced in your documentation is not sufficient to cater to the majority of pagination use cases out there.

You do need a way to execute SKIP, TAKE, TOP, LIMIT routines. If Realm already does that for local data, can we see how?

Thanks.

nirinchev commented 6 years ago

The main reason people use pagination is to control memory usage and improve performance. With regular SQL databases, you don't want to load a million records in memory because that will take a long time to execute and users are unlikely to browse through a million rows.

When you load data with Realm though, you don't read the entire dataset in memory, so there's no benefit from loading the first 100 items of a query as opposed to the first 1000. When a query is executed, we create a very lightweight wrapper that contains information about the size of the results and where to find its elements. So a query that matches 100 items will be roughly as performant as a query that matches 10000. Then as you iterate (e.g. because the user scrolls the UITableView), Realm doesn't load every single object in memory, but instead again creates a very thin wrapper around the object that contains information about where the object's properties are located. Then only when you access these properties, do we fetch the data from disk/memory.

I'm not sure what user experience you're aiming for in your app, but generally, you should be able to just create a very large list view (e.g. UITableView in iOS) that contains all the elements and take advantage of the framework's virtualization techniques coupled with Realm's lazy evaluation capabilities to avoid having to fetch data in batches.

Hope this helps, but if you have more specific questions, I'd be happy to explain more.

ozzioma commented 6 years ago

Hi @nirinchev Thanks for the response!

I'd already posted this on the Java repo while facing the same issue. I'll repost here to address the .Net scenario.

I'm afraid my concerns were not addressed. I do understand lazy loading and proxying virtual objects.

The use case you described does not address the majority of the scenarios out there where pagination parameters come into play.

I am talking about scenarios where pagination parameters ARE NEEDED.

Please consider the fact that lots of users are raising this issue, it's a valid use case....the majority of data access patterns out there involve pagination. Even if the data is lazy loaded, it does not rule out the need for implementing Skip, Take, Limit routines...for these scenarios.

For example, say you have an API or databound control that lets you implement a callback to retrieve a window of rows. You are passed this set of parameters: int Page = 234, int PageSize = 30.

That is after you have applied very specific sorting and filtering parameters to the data. This is an example from a Xamarin Forms app:

OnLoadMore = async () =>
{
    IsBusy = true;
    if (TotalRows > 0 || DataList.Count > 0)
    {
        // Everything has already been loaded, so there is nothing more to fetch.
        if (!(DataList.Count < TotalRows))
        {
            return null;
        }
    }

    int page = DataList.Count / PageSize;

    // Despite its name, CurrentPage holds the row offset of the next page.
    CurrentPage = PageSize * page;

    // Option 1: this is what I should do... LINQ style...
    var tableData = DbContext.LocationUpdates.OrderBy(r => r.DateUpdatedUtc).Skip(CurrentPage).Take(PageSize);

    PageCount = tableData.Count();
    TotalRows = tableData.LongCount();
    return new InfiniteScrollCollection<LocationUpdate>(tableData);

    // Option 2: this is what I should not have to do. Is this the Realm way of paginating data?
    var allRows = DbContext.LocationUpdates.OrderBy(r => r.DateUpdatedUtc).ToList();
    List<LocationUpdate> rows = new List<LocationUpdate>();

    // faulty pagination logic I know
    for (int count = CurrentPage; count < allRows.Count; count++)
    {
        rows.Add(allRows[count]);
    }

    PageCount = rows.Count;
    TotalRows = allRows.LongCount();
    return new InfiniteScrollCollection<LocationUpdate>(rows);
};

For emphasis, you cannot pass a List, Map, or IQueryable to a control or API that is expecting a specific number of rows. If it's an API call, say a REST service response where the app has to sync a subset of rows to the backend server in the background, I'm hoping you're not suggesting just lazy loading a List or IQueryable??

If it's local data and you have to deal with pagination callbacks, then Realm simply does not support pagination, or so it seems.

In the case of IQueryable, the documentation makes it clear that Skip, Take are not supported. So the option of doing a ToList() just increases the confusion, since the databound control still has to call Take and Skip on something. How do you track just how many rows have been loaded as the user scrolls down or up? It doesn't matter??

You need a way to reason around the logic of batching rows, it is the most popular use case.

An example would have sufficed, I think; you're assuming local data pagination is not a valid use case. Even if Realm returns a lazy loaded List, and I absolutely need to paginate it for whatever reason, how do I get it done? Say the app spec requires an automatic pull-to-refresh after the first 2,000 rows? Or a response to a DataGrid callback with filter and page parameters? Looking forward to your response.

Thanks.

nirinchev commented 6 years ago

Things are still a bit abstract here - it's unclear to me which controls you're talking about that require pagination callbacks and can't accept a datasource that represents the entire dataset. I'm unfamiliar with the Android API, but UITableView, UICollectionView, and the Xamarin.Forms collection controls don't inherently require pagination. When you think about it, designing your app around naive pagination (with skip/take) would result in a less-than-optimal UX. Consider what happens if someone inserts an object in the background at a position the user has already scrolled past - a naive pagination implementation would result in the same item appearing twice: at the end of the current page and at the start of the new page. Realm provides a comprehensive notification API to build a reactive user experience, and pagination goes contrary to that goal.

Regarding your second point about calling REST API or something with a paginated dataset - while it's odd design in my opinion to do that, and again, prone to errors (what happens if an item is inserted between requests - will that never make it to the backend?), you can easily implement custom pagination like:

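// Requires: using System; using System.Collections.Generic; using System.Linq; using Realms;
public static IEnumerable<T> Paginate<T>(this IQueryable<T> query, int skip, int take)
{
    // Realm query results implement IRealmCollection<T>, which exposes Count and an indexer.
    // The cast yields null for anything else, so fail with a clear message instead.
    var collection = query as IRealmCollection<T>
        ?? throw new ArgumentException("Expected a Realm query.", nameof(query));
    // Only the requested indices are accessed, starting at `skip` and reading at most `take` items.
    for (var i = skip; i < skip + take && i < collection.Count; i++)
    {
        yield return collection[i];
    }
}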

Finally, it's not clear what you mean in this paragraph, so if my reply doesn't address that already, please clarify with some specific examples of what you're trying to achieve:

"In the case of IQueryable, the documentation makes it clear that Skip, Take are not supported. So the option of doing a ToList() just increases the confusion, since the databound control still has to call Take and Skip on something. How do you track just how many rows have been loaded as the user scrolls down or up? It doesn't matter??"

What databinding framework are you using that uses take and skip? Generally, you should avoid materializing a Realm collection by calling .ToList, as that will remove the INotifyCollectionChanged implementation and will prevent your UI from updating automatically when items are added/removed.
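For callbacks that hand over a page number and page size rather than an offset, the skip/take loop above can be wrapped in a small helper. The following is only a sketch and not part of Realm: `PageHelper`, `GetPage`, `PageCount`, and the zero-based `pageIndex` convention are all hypothetical names chosen for illustration. It is written against `IReadOnlyList<T>` so that it runs without Realm installed; whether a given Realm collection can be passed to it directly is an assumption to verify against the SDK.

```csharp
using System;
using System.Collections.Generic;

public static class PageHelper
{
    // Hypothetical helper: returns the items of one zero-based page.
    // Only the requested indices are read, so a lazy list is asked for
    // at most pageSize elements.
    public static IEnumerable<T> GetPage<T>(IReadOnlyList<T> source, int pageIndex, int pageSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));
        // Use long arithmetic so a large page index cannot overflow the offset.
        return Iterate(source, (long)pageIndex * pageSize, pageSize);
    }

    private static IEnumerable<T> Iterate<T>(IReadOnlyList<T> source, long skip, int take)
    {
        for (long i = skip; i < skip + take && i < source.Count; i++)
        {
            yield return source[(int)i];
        }
    }

    // Number of pages needed to show totalRows items (rounds up).
    public static int PageCount(int totalRows, int pageSize) =>
        (totalRows + pageSize - 1) / pageSize;
}
```

With the values mentioned earlier in the thread (Page = 234, PageSize = 30), `PageHelper.GetPage(rows, 234, 30)` would read indices 7020 through 7049, or fewer if the list is shorter.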

ozzioma commented 6 years ago

@nirinchev Thanks man!

This snippet here is worth more than 30 paragraphs explaining the nuances of virtualized object instances and lazy loading! I came up with something similar but was stumped on Take and Skip not being supported. More like this, had to apply filters and order the rows first....

var rows = realmInstance.All<Customers>().Where(r => r.City == "Moscow").OrderBy(r => r.Name);
// Paginate is the extension method from the previous comment.
var customerBatch = rows.Paginate(22, 30);

// you need this state to keep track of your UI data rendering, except if you're saying the UI is stateless!
PageCount = customerBatch.Count();
TotalRows = rows.LongCount();

Don't you think adding this snippet to the documentation is necessary? I mean we are dealing with IQueryable here, and it's expected by default that Take and Skip support is almost always baked in. Rather than pagination being the exception or, according to you, odd design, Realm's lack of support for Take and Skip is what is really odd. Like why bother implementing IOrderedQueryable when you can't paginate the data from source? Check out the issues and questions asked by users evaluating Realm and you might just get the picture.

How do you reconcile this perspective below with the Twitter or Instagram use cases?

"Consider what happens if someone inserts an object in the background at a position the user has already scrolled past - a naive pagination implementation would result in the same item appearing twice at the end of the current and the start of the new page. Realm provides comprehensive notification API to build a reactive user experience and pagination goes contrary to that goal."

Mind you, new comments, likes and shares are constantly being updated as you scroll up or down your timeline. I'd be glad to learn how Realm addresses this particular use case as compared to using say SQLite, as I'm working on something similar. A concrete example or something similar will be helpful. If the implementation looks cool enough with Realm, I might consider doing a blog series on cloning Twitter's or Instagram's timeline implementation using Realm.

My take is you still need to paginate local or remote data. Anyhow you slice and dice it, fast moving data still gets rendered in batches. I'm yet to be aware of any UI that renders data in no particular order and keeps on rendering data as they appear without implementing some filtering or ordering. Somewhere along the line, you have to materialize a fixed subset of data for rendering. The dataset/List being lazy loaded or not doesn't really matter at that stage, because what the UI is rendering at any point in time...is the materialized subset.

About this:

"while it's odd design in my opinion to do that, and again, prone to errors (what happens if an item is inserted between requests - will that never make it to the backend?),"

If I get you right, apps on mobile devices or apps reading data from local SQLite databases or pulling data from remote MySQL servers...are prone to errors because they pull in data in batches? Transactional reads are there for a reason. The counter to the data being inserted in between requests question is, what if the insert fails in between requests or is rolled back? I think each read for most production systems out there is fairly isolated to committed data, and incoming data notifications are pushed or pulled to subscribers through another channel. A refresh can be executed to fetch new rows using again....pagination parameters ...after sorting by timestamp or some other parameter. Check out how Instagram or Twitter alerts users to new posts midway down a timeline. That is the general use case IMO.

About this:

"What databinding framework are you using that uses take and skip? Generally, you should avoid materializing Realm collection by calling .ToList as that will remove the INotifyCollectionChanged implementation and will prevent your UI from updating automatically when items are added/removed."

Do you have any experience working with .Net UI toolkits like DevExpress, Telerik or Syncfusion? Or built in ASP.Net ListView or Datagrid controls? These all work with IQueryable enabled datasources. Look it up if you need more details. Also, the most popular way to render huge data lists in Android is through the RecyclerView associated classes. Even here you explicitly need to return fixed data pages. Look it up too for more details.

Keep in mind that the UI always renders a materialized subset AT A TIME.

Most users are going to be filtering/searching by too many arbitrary parameters midway through a data grid for the scenario you described to be the sole use case. If the picture is not clear, think of a search page on Amazon.com where you can filter by price, brand etc....and still browse through over 1,000 search results. Those search results do get rendered in batches don't they? And users still have to use pagers at the bottom! Now imagine the mobile UI scenario.

Since Realm is targeting mobile apps, the use cases referenced here default to memory optimized rendering of data batches. Whether the dataset is a lazy loaded List or a fully materialized List, either way the UI always renders a materialized subset AT A TIME. Meaning the dataset gets paginated ANYHOW. Somewhere, somehow the developer has to have a handle on EXACTLY how many new posts, comments, likes and shares have been rendered/updated so far.

That's the point, except if I'm missing some major capability or feature of Realm that totally rules out pagination or batch rendering IN ALL CASES.

Please update the documentation with the above snippet or similar snippets for the sake of newbies who are used to the normal way of working with IQueryable and IOrderedQueryable instances. Realm is way cooler than Sqlite and should not have new users stumped on trivial use cases.

Thanks again for your response!

nirinchev commented 6 years ago

This is a very long comment, so I'll reply to more points later tonight, but the general guidance is "you should not care". Essentially, all the classic databinding collection controls (by Telerik, DevExpress, even the built in ones) work out of the box with Realm's implementation of IQueryable. They do materialize objects from the IQueryable but that happens transparently to the developer and requires zero effort/code on your part.

To be more specific, imagine you have a ListView (or any of the fancier 3rd party implementations) that can display 10 items at a time on the screen. You pass the result of a Realm query (it can be filtered, sorted, etc.) as a data source to the control. Whenever it gets rendered, it requests the first 10 elements of the collection to render the first 10 cells. It calls ElementAt on the Realm collection, which materializes a wrapper around the object for each element. Keep in mind that because we have passed in the entire collection, the ListView knows its size and can correctly render the scrollers. Then as the user scrolls down, a new cell is brought into view, which invokes again ElementAt on the Realm collection, obtaining a materialized instance of the element at position 11, and so on.

At the same time, the model object at index 0 will be eligible for garbage collection because it's no longer in the view. As the user scrolls up, the ListView will take care to request it again and render the cell from scratch. As you can see, at no point do we need to provide the listview with new data because it already has everything it needs and the internal mechanics of the control take care of tracking indices and which elements are rendered, and which are not.
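The access pattern described above can be imitated without Realm to see why the size of the result set matters so little. This is a rough sketch under stated assumptions: `LazyList<T>`, its `Materialized` counter, and `VirtualizedView.Render` are hypothetical stand-ins for a lazy collection and a virtualizing list control, not Realm or Xamarin.Forms types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a lazy collection: the size is known up front,
// but an element is only created when its index is accessed.
public sealed class LazyList<T> : IReadOnlyList<T>
{
    private readonly Func<int, T> _materialize;

    public LazyList(int count, Func<int, T> materialize)
    {
        Count = count;
        _materialize = materialize;
    }

    public int Count { get; }

    // Counts how many element accesses have materialized an object so far.
    public int Materialized { get; private set; }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
            Materialized++;
            return _materialize(index);
        }
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (var i = 0; i < Count; i++) yield return this[i];
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class VirtualizedView
{
    // Simulates a list control rendering `visible` rows starting at `firstRow`.
    public static List<T> Render<T>(IReadOnlyList<T> source, int firstRow, int visible)
    {
        var cells = new List<T>();
        for (var i = firstRow; i < firstRow + visible && i < source.Count; i++)
        {
            cells.Add(source[i]);
        }
        return cells;
    }
}
```

Rendering ten rows from a `LazyList<string>` of one million entries touches ten elements; scrolling by one row and rendering again touches ten more, regardless of how large `Count` is.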

Now, the cool thing about Realm collections is that they implement INotifyCollectionChanged, so if someone inserts new objects in the background (e.g. you have a background job that calls a REST API and periodically fetches new data from the server), the ListView will get notified and will update its internal state to reflect the collection changes. Additionally, it will do that in a way that preserves the currently rendered portion - e.g. if the user is looking at items 10-20 and in the background you insert 100 new items at position 0, the ListView won't scroll the current UI out of view. Additionally, as the user scrolls up, they'll see the new items in the correct order and just once. This is much harder to achieve when using pagination.

Finally, to your point about implementing Twitter or Instagram feeds - if you notice, their pagination API doesn't use offsets - it uses tokens/object ids to guide pagination, which is why you don't see duplicates in their feeds.
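The difference between offset-based and token-based paging can be sketched with plain in-memory LINQ. Everything here is illustrative: `Post`, `FeedPaging`, `PageByOffset`, and `PageAfter` are hypothetical names, the feed is an ordinary `List<Post>` rather than a Realm query or any real Twitter/Instagram API, and ids are assumed to increase with recency.

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class Post
{
    public Post(long id, string text) { Id = id; Text = text; }
    public long Id { get; }
    public string Text { get; }
}

public static class FeedPaging
{
    // Offset paging: position-based, so rows shift when items are
    // inserted ahead of the offset between two requests.
    public static List<Post> PageByOffset(IEnumerable<Post> feed, int skip, int take) =>
        feed.OrderByDescending(p => p.Id).Skip(skip).Take(take).ToList();

    // Token (keyset) paging: continues after the last id the client has seen.
    // Pass null to get the first page.
    public static List<Post> PageAfter(IEnumerable<Post> feed, long? lastSeenId, int take) =>
        feed.OrderByDescending(p => p.Id)
            .Where(p => lastSeenId == null || p.Id < lastSeenId.Value)
            .Take(take)
            .ToList();
}
```

Take a feed with ids 1 to 6 and a page size of 3: the first page is 6, 5, 4. If a post with id 7 arrives before the next request, `PageByOffset(feed, 3, 3)` returns 4, 3, 2 and repeats post 4, while `PageAfter(feed, 4, 3)` returns 3, 2, 1 with no repeat.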

ozzioma commented 6 years ago

@nirinchev Thanks again for your response. Your explanation was on point and very well illustrated.

However the confusion came from the caveat in your documentation explicitly stating that Take and Skip were not supported for IQueryable instances returned by Realm.

I really did not know what to make of that statement when databound controls were expecting an IQueryable instance. No mention is made of any RealmCollection powering the underlying collection structure...at least in the section for pagination.

Maybe a more focused section with examples on pagination using RealmCollections could help clear this up...for new developers evaluating Realm. There's only so much time to learn new tech...and make sense of it in line with what one is currently used to. Like these:

Realm uses standard types to represent groups of objects:

1. IQueryable<T>, a class representing objects retrieved from queries.
2. IList<T>, a class representing to-many relationships in models.

At runtime, both conform to the IRealmCollection<T> interface which in turn conforms to IReadOnlyList<T> and INotifyCollectionChanged and supports lazy enumeration - the size of the collection is known but objects are only loaded into memory as you iterate the collection. 

Realm's approach to lazy loading is a complete departure from what most developers are used to. It takes getting used to, believe me: an IQueryable instance that does not support Take or Skip, but supports lazy loading in any order! Mind you, the case for Realm's lazy loading supporting live updates is not exactly a unique feature, as you can just implement INotifyPropertyChanged (or use Fody) and there you go.

I can bet you that as more developers evaluate Realm, this issue is going to come up again and again. Explanations with clear, well-illustrated examples do the trick, rather than having users figure it out themselves.

Lazy loading a la Realm is cool, but needs clear examples to avoid too much head scratching. Ever heard of the MoreLINQ library? Wait for it, developers are going to be experimenting soon.

Thanks all the same.

nirinchev commented 6 years ago

You're right - we can be a bit more explicit about good databinding practices with Realm. While not an incredibly thorough one, I wrote a blog post some time ago on databinding that touches briefly on using Realm collections for driving listviews in Forms. I'd recommend checking it out as it demonstrates a few of the basics when using Realm to power your UI.

ozzioma commented 6 years ago

For more clarity on the use cases most developers are familiar with, check out these pages by DevExpress and Syncfusion on how pagination is implemented in their datagrid controls for Xamarin Forms:

https://documentation.devexpress.com/Xamarin/12293/Grid/Examples/How-to-Implement-Load-More

https://help.syncfusion.com/xamarin/sfdatagrid/load-more

IMO, if Realm totally obviates the need for all that logic, most new users are going to need to figure out how...on the first try. Cue maintaining current search/filter parameters while paging through the dataset...and you understand why clear examples like the above links are needed.

nirinchev commented 6 years ago

These LoadMore examples make sense when you're populating the collection data from a service and it will take some time to load this data. Realm does obviate the need for that as you can pass in the query as a datasource and have the grids just visualize everything. As mentioned, virtualization and lazy loading make this possible with no performance degradation.

fadulalla commented 5 years ago

Just came across this issue as I was trying to find a solution to paginating realm objects properly. I think pagination is a valid requirement; and in my case "supplying the entire dataset as a query because most pagination uses are for performance concerns" is simply invalid.

I have a simple Page that represents a thread. It has an opening post/title, and a list of comments. The comments have two levels of nesting, with only the first level retrieved on page load. Users then have the option to load the second level for comments they're interested in.

Loading the first level - fine, pagination not required as Realm handles it well. However, I can't load the entire list of the second level of comments at once for obvious reasons. So I must paginate them along with a "load more" option.

Maybe this use-case is an exception, but it does exemplify that not all pagination requirements stem from performance or loading time concerns. This one is about achieving a specific UX requirement.

Cheers.

luccasclezar commented 5 years ago

Hey @nirinchev, I'm confused with one use-case of pagination:

I want to implement a global leaderboard with Realm Cloud Platform, for instance, and I have one million users. The lazy-loading would happen locally, not server-side, so to show the first 10 records I would need to download to the client's device all one million, is that it?

And if this is the case, I understand there's a Limit method on all other platforms except dotnet, is there an ETA for this kind of common implementation that needs to order thousands of rows and show a few of them from a server?

nirinchev commented 5 years ago

Are you using full sync or query-based sync? If it's full sync, you'll download the entire dataset yes. If using QBS, you can create a subscription with a limit (e.g. give me the first 10 users sorted by score). To do that, you need to use the .Filter API as the LINQ provider doesn't support limiting the result set yet. A sample query would be:

var query = realm.All<User>().Filter("TRUEPREDICATE SORT (Score DESC) LIMIT(10)");

luccasclezar commented 5 years ago

Thank you @nirinchev! I was confused by this issue for quite some time. I suggest adding that snippet to Realm Sync Limiting Subscriptions. Saying that the Limit method is not supported in .NET is a bit misleading in my opinion (it should at least say that a "workaround" is possible).

And just to make sure, is it possible to use the Filter method chained after other methods? Like realm.All<Record>().Where(record => record.Score > 1000).Filter("LIMIT(10)")?

nirinchev commented 5 years ago

It's possible, but it's not officially supported. It will work well for most queries, but there may be issues, especially with ordered queries. That's why it's recommended to use one or the other, but not both.

luccasclezar commented 5 years ago

I just checked the documentation again and it's already changed! Thank you very much for the attention you give to your users!

cmcnicholas commented 4 years ago

Sorry to dig this up. I have been implementing some library support for our project (I need to abstract away from Realm) and one requirement has been pagination support. The Paginate example given earlier in this thread works fine with no order by, but OrderBy returns IOrderedEnumerable<T>, so your as statement evaluates to null. Has something changed in realm.net under the hood so this no longer works? I can just call the provided LINQ Count() function, but enumerables usually end up being more expensive to count than collections. Or is this still going to resolve to the same count operation under the hood in Realm (thus being very quick)?


Apologies, the above is working fine. We're using a query builder, and the order-by value was being produced as Func<RealmDbModel, string?> rather than Expression<Func<RealmDbModel, string?>>, which caused the IEnumerable overload of OrderBy to be called instead of the IQueryable one.
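The overload mix-up described above is standard LINQ behaviour and can be reproduced without Realm. This is a minimal sketch using an in-memory array wrapped with `AsQueryable()`; the class and method names (`OrderByOverloads`, `FuncKeepsQueryable`, `ExpressionKeepsQueryable`) are hypothetical. With a Realm query, the first case is presumably what makes a later `as IRealmCollection<T>` cast return null.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class OrderByOverloads
{
    // A Func selector binds to Enumerable.OrderBy, so the result
    // is an IOrderedEnumerable and no longer an IQueryable.
    public static bool FuncKeepsQueryable()
    {
        IQueryable<string> query = new[] { "b", "a", "c" }.AsQueryable();
        Func<string, string> selector = s => s;
        var ordered = query.OrderBy(selector);
        return ordered is IQueryable<string>;
    }

    // An Expression selector binds to Queryable.OrderBy, so the result
    // stays an IQueryable and the query provider still sees the ordering.
    public static bool ExpressionKeepsQueryable()
    {
        IQueryable<string> query = new[] { "b", "a", "c" }.AsQueryable();
        Expression<Func<string, string>> selector = s => s;
        var ordered = query.OrderBy(selector);
        return ordered is IQueryable<string>;
    }
}
```

`FuncKeepsQueryable()` returns false and `ExpressionKeepsQueryable()` returns true, which is why a query builder that feeds a provider-backed query should produce `Expression<Func<...>>` values.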