Closed benbjohnson closed 1 year ago
Update: It looks like the FUSE layer is invalidating the whole database file from cache when it applies an LTX file, so a SELECT COUNT(*) query ends up having to re-read the entire file from LiteFS, which is slow.
I'm still working on a small, reproducible script. LiteFS was missing the fuse.OpenKeepCache flag on Create() and Open(), although this issue still persists after adding that flag.
@tv42 Do you have any ideas off the top of your head why this might be occurring? I'm seeing InvalidateNode calls in the debug output like this:
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:0 Size:4096
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:497082368 Size:4096
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:497086464 Size:4096
Node 0x2 is the database file. I'm also invalidating a different file (the shared memory, or SHM file) with the node id of 0x4:
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x4 Off:0 Size:-1
I wouldn't think that would affect the OS page cache of node 0x2 though.
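One way to check how much of each node is being invalidated explicitly is to tally the InvalidateNode lines from the debug output. The following is a minimal sketch, not part of LiteFS or bazil.org/fuse: the names invalRe, tally, tallyInvalidations, and sample are hypothetical, the line format is copied from the output above, and treating a non-positive Size as "whole file" is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// invalRe matches InvalidateNode lines in the format shown in the debug output.
var invalRe = regexp.MustCompile(`InvalidateNode (0x[0-9a-fA-F]+) Off:(-?\d+) Size:(-?\d+)`)

// tally summarizes the explicit invalidations seen for one node (hypothetical type).
type tally struct {
	Calls     int
	Bytes     int64 // sum of the positive Size values
	WholeFile bool  // assumption: a non-positive Size means "the whole file"
}

// tallyInvalidations groups InvalidateNode calls by node ID.
func tallyInvalidations(log string) map[uint64]*tally {
	out := make(map[uint64]*tally)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		m := invalRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue // not an InvalidateNode line
		}
		node, err := strconv.ParseUint(m[1], 0, 64) // base 0 accepts the 0x prefix
		if err != nil {
			continue
		}
		size, err := strconv.ParseInt(m[3], 10, 64)
		if err != nil {
			continue
		}
		t := out[node]
		if t == nil {
			t = &tally{}
			out[node] = t
		}
		t.Calls++
		if size > 0 {
			t.Bytes += size
		} else {
			t.WholeFile = true
		}
	}
	return out
}

// sample holds the four lines quoted in this thread.
const sample = `74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:0 Size:4096
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:497082368 Size:4096
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x2 Off:497086464 Size:4096
74FD7445AF470FB062707E8A [r]: => InvalidateNode 0x4 Off:0 Size:-1`

func main() {
	for node, t := range tallyInvalidations(sample) {
		fmt.Printf("node 0x%x: calls=%d bytes=%d wholeFile=%v\n", node, t.Calls, t.Bytes, t.WholeFile)
	}
}
```

On the sample lines, node 0x2 only has three 4096-byte ranges invalidated explicitly, so those calls alone would not explain the whole database file being re-read.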
Does the file change size? That currently triggers data invalidation:
    if (oldsize != attr->size) {
        truncate_pagecache(inode, attr->size);
        if (!fc->explicit_inval_data)
            inval = true;
That explicit_inval_data is set via FUSE_EXPLICIT_INVAL_DATA, which the library does not yet have support for (it's protocol v7.30, we're still at 7.17).
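To make the effect of that branch concrete, here is a toy model in Go. It is only a sketch of the quoted kernel logic, not kernel or library code: pageCache, conn, changeSize, and cachedAfterGrow are hypothetical names, and the model assumes that inval = true results in every cached page of the file being dropped, which is the behavior described in this thread.

```go
package main

import "fmt"

const pageSize = 4096 // assumed page size

// pageCache is a toy stand-in for one inode's page cache:
// it only records which page indexes are currently cached.
type pageCache struct {
	pages map[int64]bool
}

// conn carries the one connection flag discussed above
// (a hypothetical mirror of fc->explicit_inval_data).
type conn struct {
	explicitInvalData bool
}

// changeSize models the quoted branch for a file whose size changed.
func (c conn) changeSize(pc *pageCache, oldSize, newSize int64) {
	if oldSize == newSize {
		return
	}
	// Like truncate_pagecache: drop pages that lie entirely past the new EOF.
	first := (newSize + pageSize - 1) / pageSize
	for idx := range pc.pages {
		if idx >= first {
			delete(pc.pages, idx)
		}
	}
	// Without explicit invalidation, the rest of the cache is dropped too.
	if !c.explicitInvalData {
		pc.pages = map[int64]bool{}
	}
}

// cachedAfterGrow warms 8 pages, grows the file by one page,
// and reports how many pages are still cached afterwards.
func cachedAfterGrow(explicit bool) int {
	pc := &pageCache{pages: map[int64]bool{}}
	for i := int64(0); i < 8; i++ {
		pc.pages[i] = true
	}
	conn{explicitInvalData: explicit}.changeSize(pc, 8*pageSize, 9*pageSize)
	return len(pc.pages)
}

func main() {
	fmt.Println("cached pages without explicit invalidation:", cachedAfterGrow(false))
	fmt.Println("cached pages with explicit invalidation:   ", cachedAfterGrow(true))
}
```

In this model a file that grows by a single page loses all 8 cached pages unless the flag is set, in which case all 8 survive.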
@tv42 Ah, that would probably be it. It's a script that just inserts data so the database would keep growing and change size.
How difficult is it to support FUSE_EXPLICIT_INVAL_DATA? I was looking at fuse_kernel.h and it shows FUSE_WRITEBACK_CACHE at 7.23 but I think bazil.org/fuse supports that, right? Is support for FUSE_EXPLICIT_INVAL_DATA something that could be added piecemeal or does everything need to be supported between 7.17 and 7.30?
I'll look into it.
Status update: I'm at FUSE protocol v7.19 now; FUSE_EXPLICIT_INVAL_DATA is in v7.30. Many of the changes in between tell the kernel to send new kinds of messages toward userspace, so I can't just pick and choose and "skip the queue". (I might be able to choose to not handle them, but I want to understand the consequences and decide that case by case.)
FUSE_WRITEBACK_CACHE is not supported at this time; we're in writethrough mode by default. In writeback, writes are sent to the FUSE server only lazily. See https://www.kernel.org/doc/Documentation/filesystems/fuse-io.rst
@tv42 Thanks for researching that, Tv. It'll eventually be a higher priority issue for us but we have a few other things to get done on LiteFS first.
For anyone reading this in the future, this mainly affects databases that:
INSERT commands
Fixed! See MountOption fuse.ExplicitInvalidateData in https://github.com/bazil/fuse/commit/b2cd994c4fa7b3c1c9819cd139a8a46d7af2e175
Caveat: I haven't tested it yet.
🎉 🙌 🎉 🙌 🎉 🙌 🎉 🙌
@tv42 I'll give it a try this week. Thank you!
When running the repro program from @dangra in this comment, I'm seeing multi-second latency for read queries on the replica.
Update: The read in the repro program fetches every page in the database. With the current FUSE library, a change in the file size invalidates the whole OS page cache for the file, so every page must be refetched, which is slow.
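For a rough sense of scale, the offsets in the InvalidateNode lines from the debug output give a lower bound on how many pages such a full read has to refetch. This is a back-of-the-envelope sketch only: minPages is a hypothetical helper, the 4096-byte page size is assumed from the Size:4096 entries, and the file is assumed to be at least as large as the highest invalidated offset plus one page.

```go
package main

import "fmt"

const pageSize = 4096 // assumed, matching the Size:4096 entries in the debug output

// minPages returns a lower bound on the number of pages in a file,
// given the highest byte offset known to lie inside it.
func minPages(highestOff int64) int64 {
	return highestOff/pageSize + 1
}

func main() {
	// Offsets of the three InvalidateNode calls on node 0x2.
	offs := []int64{0, 497082368, 497086464}

	var highest int64
	for _, off := range offs {
		if off > highest {
			highest = off
		}
	}

	total := minPages(highest)
	fmt.Printf("pages invalidated explicitly: %d\n", len(offs))
	fmt.Printf("pages refetched by a full read after a size change: at least %d (about %d MiB)\n",
		total, total*pageSize>>20)
}
```

Under those assumptions, three pages changed but a query that touches every page has to pull at least 121360 pages (about 474 MiB) back through FUSE.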