spring-projects / spring-data-cassandra

Provides support to increase developer productivity in Java when using Apache Cassandra. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository-style data access.
https://spring.io/projects/spring-data-cassandra/
Apache License 2.0

Support pagination feature in Cassandra [DATACASS-56] #232

Closed spring-projects-issues closed 6 years ago

spring-projects-issues commented 10 years ago

David Webb opened DATACASS-56 and commented

Determine how this will perform and if it's possible with Cassandra.


Issue Links:

Referenced from: pull request https://github.com/spring-projects/spring-data-cassandra/pull/114

5 votes, 10 watchers

spring-projects-issues commented 8 years ago

David Webb commented

Results of the investigation are as follows.

Pagination support with Spring Data requires using interfaces that expose attributes like "Total Pages" and "Total Records". The total record count is divided by the page size to get the total pages, so it has to be obtained with a count query.

While the following is a valid CQL query in C*, it will certainly bring down an entire C* cluster. The more nodes and the larger the dataset in the CF, the harder it will fail.

select count(*) from column_family;

While the CQL engineers are okay with this, the SDC team is not okay with adding a feature that will knowingly negatively impact HA Production systems.

I will leave this open for discussion by the voters. I am open to a good solution, but after thinking about this, and working with large C* clusters for the last 2 years, I do not see a solution that will work.

I believe that in the Big Data realm, the users of this technology must accept that there is a trade-off from the RDBMS realm, and some functions (Pagination) are not reasonable or possible.

Feedback welcome.... :)

spring-projects-issues commented 8 years ago

Jamal Fanaian commented

Hi there! I'm new to this project, but I have been thinking about how pagination could be efficiently implemented in SDC and have some thoughts to share.

I agree that the standard SD Page/Slice interfaces will not work with Cassandra. They assume offset-based pagination, which is not supported in Cassandra. I think you hinted at this in another ticket related to pagination, but exposing a continuation/range-based interface would solve this problem. Instead of exposing an offset, this interface would expose a continuation value and page size. We could then use the continuation value to do a range query to fetch results after the continuation. In CQL, the query could look something like this:

SELECT ... FROM table WHERE token(pk) > token(continuationValue) LIMIT pageSize;
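
To make the continuation idea concrete, here is a minimal in-memory sketch in Java. It does not use the Cassandra driver: a `TreeMap` keyed by a made-up token stands in for the table, and the names `ContinuationPagingSketch`, `Page`, and `fetch` are hypothetical, not part of Spring Data or the proof-of-concept. It also assumes one row per token, so it ignores partitions that hold several rows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ContinuationPagingSketch {

    /** One page of rows plus the token to resume from; null means the range is exhausted. */
    public static final class Page {
        public final List<String> rows;
        public final Long nextToken;

        Page(List<String> rows, Long nextToken) {
            this.rows = rows;
            this.nextToken = nextToken;
        }
    }

    // Stand-in for a table ordered by partition token (hypothetical data).
    private final TreeMap<Long, String> rowsByToken = new TreeMap<>();

    public void insert(long token, String row) {
        rowsByToken.put(token, row);
    }

    /** Mirrors: SELECT ... WHERE token(pk) > token(continuation) LIMIT pageSize */
    public Page fetch(Long continuation, int pageSize) {
        Map<Long, String> remaining = (continuation == null)
                ? rowsByToken
                : rowsByToken.tailMap(continuation, false); // strictly greater than
        List<String> rows = new ArrayList<>();
        Long last = null;
        for (Map.Entry<Long, String> entry : remaining.entrySet()) {
            if (rows.size() == pageSize) {
                break;
            }
            rows.add(entry.getValue());
            last = entry.getKey();
        }
        // A short page means there is nothing left; a full page may still be
        // followed by an empty one, which then reports a null continuation.
        Long next = (rows.size() == pageSize) ? last : null;
        return new Page(rows, next);
    }
}
```

The caller never supplies an offset: it passes back the `nextToken` from the previous page, so each fetch costs the same regardless of how deep into the data it is.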

In addition, such an interface could be used with other SD adapters as well. Even when working with an RDBMS, offset-based pagination is not efficient when dealing with large data sets. So, an implementation in SD would make more sense.

I took on the task of implementing a proof-of-concept that I could share here (forked in GH, links below). The implementation is not complete or clean, and assumes a couple of things that I'd like some feedback on. But, it works!

A couple of assumptions I have made:

I have made changes to both spring-data-commons and spring-data-cassandra to add support for this:

And I have a working example project that shows the usage as well:

One caveat is that the existing PagingAndSortingRepository would not support this. Currently, I'm only using it by defining a custom method in my Repository, but it would make sense to implement a ContinuationRepository.

If this implementation makes sense, I would love to finish this up and submit a PR so any feedback on that would be appreciated.

Thanks!

spring-projects-issues commented 8 years ago

Mark Paluch commented

Hi Jamal Fanaian,

That's awesome. I see two options for paging: stateful and stateless. Stateful paging is the approach you took. Users would reuse the cursor returned by the Cassandra query (Continuable from your example) to obtain the next page. Stateful paging works only if we know the last value that was retrieved from the SELECT, so it's not feasible for streaming queries.

Stateless paging is the other approach, but it would be more expensive than stateful paging. Stateless paging could work with the existing Pageable objects and perform the skip of records by just iterating over the returned elements without converting those rows into entities.
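
A rough sketch of the stateless variant, written against plain Java iterators rather than the Cassandra driver. The class `StatelessPagingSketch`, the method `page`, and the `converter` parameter are hypothetical names for illustration. The skip loop discards raw rows without converting them, and only the rows that belong to the requested page are turned into entities.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

public class StatelessPagingSketch {

    /**
     * Returns the entities for one page by iterating from the start of the result.
     * The cost grows with the offset, because every skipped row is still read.
     */
    public static <R, T> List<T> page(Iterator<R> rows, long offset, int pageSize,
                                      Function<R, T> converter) {
        // Skip phase: advance past `offset` raw rows without converting them.
        for (long skipped = 0; skipped < offset && rows.hasNext(); skipped++) {
            rows.next();
        }
        // Collect phase: convert only the rows that belong to the requested page.
        List<T> content = new ArrayList<>();
        while (content.size() < pageSize && rows.hasNext()) {
            content.add(converter.apply(rows.next()));
        }
        return content;
    }
}
```

With a Pageable-style request, `offset` would be the page number multiplied by the page size, which is why deep pages get progressively more expensive with this approach.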

I'd leave the decision up to the user with a clear statement of how it works and what to expect from the API. We use Pageable inside of Spring Data REST so it would be a nice feature to expose Spring Data Cassandra Repositories with paging support.

Does this make sense?

Any thoughts Oliver Drotbohm, John Blum?

spring-projects-issues commented 8 years ago

Jamal Fanaian commented

Hi Mark Paluch,

Thank you for the feedback! I can see the appeal of providing support for Pageable using stateless pagination. It is definitely convenient if you can control the size of your result set. But I'm afraid a lot of users may naively implement this without understanding the costs, since it's the standard for many of the other Spring Data adapters. If the consensus is to support this approach, then I'd be happy to implement it with my current change set. I did have one question regarding this, though. If using Pageable, are you also expecting to return Page? What is your plan regarding Page#getTotalElements() and Page#getTotalPages()? Or, should SDC only support returning Slice?

Regarding stateful pagination, my plan is to add Continuable support to Spring Data REST. In my previous comment I mentioned that I would want to return a serialized and encoded version of the continuation value. The idea was that these would be generated when building the response and provided under the _links key, similarly to how Pageable is handled. My current thought was to do something such as:

http://api.example.com/v1/path?next=foo
http://api.example.com/v1/path?previous=bar

And, when those values were present, provide a ContinuationRequest that can be used in a request endpoint. I wanted to provide an example of where I was headed to get some feedback before I continued further with this implementation.
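
As an illustration of the "serialized and encoded" continuation value, the sketch below uses the JDK's URL-safe Base64 codec to turn a continuation value into a query parameter and back. The class `ContinuationLinkSketch` and its methods are hypothetical, and treating the continuation value as a plain string is an assumption; nothing here reflects how Spring Data REST actually builds its links.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ContinuationLinkSketch {

    /** Encodes a continuation value so it can be placed in a URL query parameter. */
    public static String encode(String continuationValue) {
        return Base64.getUrlEncoder()
                .withoutPadding()
                .encodeToString(continuationValue.getBytes(StandardCharsets.UTF_8));
    }

    /** Reverses encode(); throws IllegalArgumentException on malformed input. */
    public static String decode(String token) {
        return new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
    }

    /** Builds a "next" link of the form shown above (baseUrl is assumed to have no query string). */
    public static String nextLink(String baseUrl, String continuationValue) {
        return baseUrl + "?next=" + encode(continuationValue);
    }
}
```

The server would decode the `next` parameter on the following request and hand the result to the continuation query. Because the token is opaque to clients, its internal format can change without breaking them.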

Thanks for the feedback so far! I'm looking forward to hearing more, and hope that this can lead to something that could eventually be used :)

spring-projects-issues commented 6 years ago

John Blum commented

PR #114 reviewed, polished and merged to master for the Spring Data Kay GA release.

spring-projects-issues commented 6 years ago

Łukasz Gosiewski commented

Hi all.

Has anyone thought about a convenient way of connecting this solution with serialization/deserialization of the returned Pageable? This is needed to expose it as a REST method for fetching. The native PagingState can be easily serialized with its .toString() or .toBytes() methods, but it looks like CassandraPageRequest contains some additional pieces of information and can't be serialized as easily.

spring-projects-issues commented 6 years ago

Mark Paluch commented

Łukasz Gosiewski, care to file a new ticket in Spring Data REST?

spring-projects-issues commented 6 years ago

Mark Paluch commented

Update to my previous comment: care to file a new ticket in Spring Data Cassandra instead, as we need to provide a serialization mechanism in the first place?