scylladb / scylladb

NoSQL data store using the seastar framework, compatible with Apache Cassandra
http://scylladb.com
GNU Affero General Public License v3.0

Range scans pollute cache without benefit #3837

Open avikivity opened 6 years ago

avikivity commented 6 years ago

Range scans that are part of a full table scan are likely to miss the cache, and the data they bring into the cache is unlikely to be reused.

We should bypass the cache for range scans, at least if the table size is greater than the cache size. Small tables may still benefit from the cache.
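
A minimal sketch of the heuristic proposed above, for illustration only. The function and parameter names (`should_bypass_cache`, `table_size_bytes`, `cache_size_bytes`) are hypothetical and are not Scylla APIs; how the two sizes would actually be measured is left open.

```python
def should_bypass_cache(is_range_scan: bool,
                        table_size_bytes: int,
                        cache_size_bytes: int) -> bool:
    """Hypothetical policy: skip the cache for range scans over
    tables that cannot fit in the cache."""
    if not is_range_scan:
        # Non-range reads are outside the proposal and keep using the cache.
        return False
    # Small tables may still benefit from the cache, so only bypass
    # when the table is strictly larger than the cache.
    return table_size_bytes > cache_size_bytes
```

For example, under these assumptions a range scan over a 200 GiB table with a 16 GiB cache would bypass the cache, while the same scan over a 1 GiB table would not.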

avikivity commented 6 years ago

/cc @tgrabiec @denesb

tgrabiec commented 6 years ago

Refs https://github.com/scylladb/scylla/issues/3643

denesb commented 6 years ago

The tricky part would be determining when to bypass the cache. Some users might want to use a full scan as a means to prefill the cache.

avikivity commented 6 years ago

@denesb if the table is larger than the cache, then the next full scan will miss.

It's possible for a workload to do a fragmented multi-pass scan: read(0, 10) read(0, 10) read(0, 10) read(10, 20) read(10, 20) read(10, 20) read(20, 30) ... Such workloads can utilize the cache if they are carefully tuned. However, I believe that the overwhelming majority of range scans will be part of a single-pass full scan.
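
To make the difference between the two access patterns concrete, here is a toy simulation. It assumes a plain LRU cache keyed by row number, which is a simplification and not a model of Scylla's actual cache; `LRUCache`, `hit_rate_full_scans`, and `hit_rate_fragmented` are hypothetical names introduced for this sketch.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that only counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        return self.hits / (self.hits + self.misses)


def scan(cache, start, end):
    """Read rows in the half-open range [start, end)."""
    for key in range(start, end):
        cache.read(key)


def hit_rate_full_scans(table_rows, capacity, passes):
    """Repeat a full table scan `passes` times."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        scan(cache, 0, table_rows)
    return cache.hit_rate()


def hit_rate_fragmented(table_rows, capacity, passes, fragment):
    """Read each fragment `passes` times before moving to the next one."""
    cache = LRUCache(capacity)
    for start in range(0, table_rows, fragment):
        for _ in range(passes):
            scan(cache, start, min(start + fragment, table_rows))
    return cache.hit_rate()
```

With a 30-row table and a 10-row cache, `hit_rate_full_scans(30, 10, 3)` never hits, because every row is evicted before the next pass reaches it. `hit_rate_fragmented(30, 10, 3, 10)` hits on two of every three reads, but only because the fragment size was chosen to fit the cache.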

avikivity commented 6 years ago

Refs #3643

Yes, and even more simplistic than that.

avikivity commented 5 years ago

User-controlled cache bypass: 2a371c2689d327a10f5888f39414ec66efedb093. Perhaps it is sufficient.