eliminate file duplication during a local untrusted pull

ramcq commented 6 years ago

As mentioned in https://github.com/ostreedev/ostree/issues/1720#issuecomment-421061941, we have a problem in Endless OS using Flatpak apps for some pretty large apps. During an install/upgrade of an app, Flatpak runs the networking code and does the pull as the user (for various good reasons, proxies, auth, etc) into a child repo in /var/tmp, and then uses the priveleged system helper to do a local filesystem pull from the user-owner child repo into the real one.

Because a malicious user could keep FDs open and modify the child repo, this has to be an untrusted pull where every file is copied and verified against the configured GPG key of the system remote. On btrfs this uses reflinks, but everywhere else the consequence of this double pull is that twice the disk space is required during an app install or upgrade.

We've got some 5-6GB apps and some systems which ship with eg 32GB MMCs, so this is a very real problem for us - so we're having to use heuristics to disable automatic upgrades when we are low on disk space - but we'd like to come up with a way to mitigate or eliminate this duplication entirely. As far as I'm aware there are no current kernel APIs to revoke an FD or seal a file.

Some approaches we've thought about:

We thought about running the networking code as a depriveleged and/or sandboxed system user which can only be accessed by the flatpak system helper - as we have bwrap etc - but this doesn't allow normal access to the user's desktop config (proxies, env vars, etc).
Implement a "destructive pull" which unlinked each source file as it went. This one seems not too bad although we would probably need some help mapping out exactly how/where to do it. It has the downside that we're not able to entirely eliminate the duplication, merely reduce it to per-file rather than the whole ref. Still, 1GB is a lot better than 6GB.
"Stealing" the file somehow - mv to a root-owned temp folder, chown to root:root, scanning /proc to look for open fds to the file, then proceeding.
A variant of the above which used the immutable flag to ensure that after verification, no changes could be made...? The problem is that we wouldn't know when we'd be able to clear that immutable flag, potentially not until a reboot.
Ask the system helper to create the user child for us, and it does so by creating an unaccessible folder and then bind mounting it so that it's accessible to the user to do the pull. It could then unmount the user bind mount to enforce that no FDs are open, before proceeding to verify and hardlink in-place. (Kudos to @wjt for this suggestion!)

/cc @uajain @wjt @pwithnall @mwleeds

wjt commented 6 years ago

It could then unmount the user bind mount to enforce that no FDs are open, before proceeding to verify and hardlink in-place.

The problem I ran into here is that if there is an FD open there, the bind mount will just be leaked, because I couldn't see a way to forcibly unmount it (and break any open FDs). Presumably providing a way for unprivileged users to create unbounded numbers of bind mounts would be bad :)

ramcq commented 6 years ago

It could then unmount the user bind mount to enforce that no FDs are open, before proceeding to verify and hardlink in-place.

The problem I ran into here is that if there is an FD open there, the bind mount will just be leaked, and I couldn't see a way to forcibly unmount it, breaking any open FDs. Presumably providing a way for unprivileged users to create unbounded numbers of bind mounts would be bad :)

Could the helper not enforce a limit per user? It can stay running as long as there are any mounts, or use a predictable path to re-use the same mount point between invocations, and try to clean/unmount/remount when preparing the operation, and fail if you /still/ have FDs open at that point.

wjt commented 6 years ago

(On Endless we have the additional quirk that /var/tmp is on the /var bind-mount whereas /var/lib/flatpak is a symlink into the /sysroot bind-mount (so that the ostree and flatpak repos can be shared). It's the same physical filesystem but you can't hardlink across a bind mount. But this could be addressed by arranging for the temporary directory to be below /var/lib/flatpak/repo/tmp which will always be on the same mount as /var/lib/flatpak[/repo].)

pwithnall commented 6 years ago

* "Stealing" the file somehow - mv to a root-owned temp folder, chown to root:root, scanning /proc to look for open fds to the file, then proceeding.

I think smcv shot this one down by pointing out that a malicious process could race to re-open a file from /proc/*/fd/$fd after the scanning process had passed over that $fd, essentially making it impossible to guarantee that a process can’t keep an FD around after you move the original file to an inaccessible path.

tl;dr: We couldn’t think of a way to make this particular option secure.

alexlarsson commented 6 years ago

Yeah, the bind mount is not gonna work. Maybe we can create a fuse filesystem that is a pure forwarding system + revoke.

pwithnall commented 6 years ago

Could the helper not enforce a limit per user?

Sounds reasonable to me. The limit could even be 1.

I guess one potential downside there is that it means a malicious app could DoS the user’s access to updates for other apps, which provides a nice avenue for a malicious app to concrete itself in place. If the user were made aware of the stuck-mount issue, they might be able to deal with the issue (or, more likely, not understand what’s going on).

If a bind mount is stuck, we could fall back to the current code path of using double the disk space. That would avoid the DoS issue.

alexlarsson commented 6 years ago

Then the system helper could mount that, and revoke before applying.

Hmm, actually, that will probably not work, because fuse file systems are only accessible to one uid.

pwithnall commented 6 years ago

Yeah, the bind mount is not gonna work.

Because…?

alexlarsson commented 6 years ago

@pwithnall Its possible to have a host namespace unmount "succeed", yet the filesystem could still be mounted in a different namespace.

cgwalters commented 6 years ago

On btrfs this uses reflinks

Random side note, multiple kernel filesystems now support reflinks/copy_file_range() - most notably XFS, but also including NFS.

a malicious process could race to re-open a file from /proc/*/fd/$fd after the scanning process had passed over that $fd

If the system helper is running as root, an unprivileged process isn't going to be able to open its /proc/<pid> dir.

ramcq commented 6 years ago

On btrfs this uses reflinks

Random side note, multiple kernel filesystems now support reflinks/copy_file_range() - most notably XFS, but also including NFS.

Selfishly also very concerned about ext4... :)

a malicious process could race to re-open a file from /proc/*/fd/$fd after the scanning process had passed over that $fd

If the system helper is running as root, an unprivileged process isn't going to be able to open its /proc/<pid> dir.

Ie there is no attack from the unpriveleged process being able to re-open anything provided it's first moved into a path inaccessible to anyone but root? So "steal the entire repo, scan for open FDs, ..." option might not be so absurd?

BTW see also https://marc.info/?l=linux-xfs&m=150151699914687&w=2

How was that received? :)

ramcq commented 6 years ago

Then the system helper could mount that, and revoke before applying.

Hmm, actually, that will probably not work, because fuse file systems are only accessible to one uid.

Does this apply to root as well? Stuff like ntfs-3g runs whole "normal" filesystems as FUSE. Are they all tied to one UID?

alexlarsson commented 6 years ago

@ramcq I tried bindfs, and it did allow a root-mounted fuse fs to be accessible by the user, so clearly there is some way to do this. For further security we could run the fuse fs as some special (non-root) uid, exposing it only to the right user (and then getting rid of the fuser mount, or making it forced readonly) when revoking.

cgwalters commented 6 years ago

How was that received? :)

Well, the most recent update is https://marc.info/?l=linux-kernel&m=153515073024956&w=2 - so far the fsverity proposal only denies truncate which I don't think is a good idea. I just followed up.

ramcq commented 6 years ago

There was some follow-up discussion on #flatpak about using a FUSE filesystem as a way to basically make a revokable bind mount. (Although use of FUSE is hardly unprecedented in the OSTree stack, it seems sad to me that the only way we have to achieve this is by introducing another process with all of the complexity, fragility and overhead.)

@alexlarsson found http://bindfs.org which in terms of exporting a root owned dir so that a user can write to it, and then ungracefully kill the mount. It seemed to work in his quick test, so rather than recurse into writing a new FUSE filesystem, I think we can prototype something in Flatpak using this. Then with a bit more practical experience of it, decide if it's acceptable to bring in as an optional external dependency, bundle or rewrite as needed.

The steps to me seem like:

add a method to the system helper to start a pull, which creates a temp folder under repo/tmp and returns it to the user, keep a list of these
keep the helper process alive if the list is non empty
take them off the list if the dir is given back to us for a pull
check the UID of the requesting user and implement a limit of the number open per user
implement a timeout after which the pull is killed anyway (previously, tmpfiles would come knocking after 30 days...)
after all that set up, we can change the requesting method to do a bindfs instead, keep the process id in the structure, and the apply method to kill it and proceed
profit!

Thoughts? I guess that makes this more of a Flatpak ticket than an OSTree one at this point.

uajain commented 6 years ago

This issue is being resolved as a part of flatpak instead of OSTree. OSTree admins @cgwalters @jlebon Can you help me by transferring this issue to flatpak github repo so that I can link that in my commit messages?

Transferring instructions are here.

Thank you.

cgwalters commented 6 years ago

To transfer an open issue to another repository, you must have admin permissions on the repository the issue is in and the repository you're transferring the issue to.

Is not true for me; I am not sure if there is currently such a person. To help unblock this I have made @pwithnall an Owner in the ostreedev organization.

uajain commented 6 years ago

Is not true for me

Maybe it's in beta that's why? :thinking:

https://blog.github.com/changelog/2018-10-30-issue-transfer/

pwithnall commented 6 years ago

It’s not possible:

If you're transferring an issue from a repository that's owned by an organization you're a member of, you must transfer it to another repository within your organization.

I get options only to transfer the issue to other repositories within the ostreedev project.

ostreedev / ostree

eliminate file duplication during a local untrusted pull #1723