nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.38k stars 4.07k forks source link

Send file path to imaginary #32886

Closed Leptopoda closed 6 months ago

Leptopoda commented 2 years ago

How to use GitHub

Is your feature request related to a problem? Please describe.

24166 introduced imaginary as a preview provider.

optional perf improvement: if we know a file to be on local non-encrypted storage, could send the file path directly to the docker (when running on same instance)

This was also said as an optional requirement. This could indeed offload even more load from the nextcloud php part (not only on the same machine). In a docker/k8s cluster imaginary could load the image directly from the storage backend (s3, nfs, ...). The machine running the web server (php) wouldn't be occupied as much. Also this would relief quite some load from the network infrastructure and is especially notable when using imaginary for RAW images which can be 25MB per file being sent around for no good reason.

Describe the solution you'd like Send the file path to imaginary to gain some performance. This could be in an optional config option. I think auto detecting stuff (as described in the original quote) is out of the scope and any admin can decide on their own how this would affect the deployment.

Describe alternatives you've considered N/A

Additional context N/A

PVince81 commented 2 years ago

this would only work for the "Local" filesystem mode and also implies that Imaginary has access to the data directory, so permissions will need to be set properly

imaginary is not able to read directly from S3, see the issue that got closed: https://github.com/h2non/imaginary/issues/110

@CarlSchwan FYI

Leptopoda commented 2 years ago

Your're right. Local storage as per imaginary but it doesn't need to be local to the NC host.

Docker for example has multiple storage backends that can share data among machines. The docker nfs or s3 storage backend come to mind.

Yes it'll appear as local storage to the imaginary container but it doesn't need to be for NC (if you plan to use the docker s3 backend you could use the same volume for the nc container but it's possible). So it probably shouldn't be a requirement.

PVince81 commented 2 years ago

if you use s3 as primary object store, it only contains a flat list of files, not a hierarchy, so you probably won't be able to share it with any other system that expects a list of paths as keys

J0WI commented 1 year ago

I don't think that this is possible with imaginary. Nextcloud uses encryption, so accessing files directly would only return garbage to imaginary. Also, you need to check the permissions carefully, so that only permitted users can fetch previews of the files they own.

Leptopoda commented 1 year ago

But not all instances enable encryption or what exactly is your point?

LittleHuba commented 1 year ago

My thoughts on the security concerns raised above:

Attack Scenario

Some attacker could guess paths and send them to Imaginary and get the user data back. For this, he needs two things:

By default, if configured with the mount parameter, Imaginary takes any path and returns the preview. This is bad! An attacker could just use any gap in NC or some other service with access to Imaginary that allows sending and receiving network traffic and thereby get access to user data.

Solution

Imaginary already has a solution for that. The parameter key adds an authorization key that needs to be provided with every request for Imaginary to do its work.

Could an attacker get that key?

  1. The key must be stored in the config of NC. But NC also stores other confidential information there like the password for DB. So arguably, if the key is leaked via the config of NC we have bigger problems than Imaginary.
  2. The attacker could sniff the traffic between NC and Imaginary and thereby get the key. But with most users not using encryption between their microservices that is a problem with the overall concept of outsourcing preview generation.

So this solution seems quite sound from a first look.

Further Hardening

How to get this working

Now that the security concerns got their just discussion, everybody can decide for themselves if they want to live with those risks or not. If yes, here are the steps to implement this:

  1. Modify the Imaginary container
    • Mount your data volume from NC (as read-only)
    • Add the key parameter
    • Add the mount parameter (must point to your datadirectory as configured in NC)
  2. Add the new preview class in file lib/private/Preview/ImaginaryPath.php: See https://gist.github.com/LittleHuba/497ac6042f2b621c4f3a27e69c3c1986
  3. Familiarize yourself with https://github.com/nextcloud/server/commit/9b6a1cc8ae41807e98f9c4155b6241d3f1f7e470 We are interested in the changes to the following files:

    • lib/composer/composer/autoload_classmap.php
    • lib/composer/composer/autoload_static.php
    • lib/private/PreviewManager.php
    • lib/private/Preview/GeneratorHelper.php

    In all these files you will need to mirror the changes and add the logic to integrate the new Preview class. Yeah this is more involved than most users are comfortable with, but if you dig around in source code then you should know what you do and not just blindly follow some other dude giving you a list of steps :wink:

  4. Add the key for the Imaginary API to the NC config (preview_imaginary_key) (Probably you will need https://github.com/nextcloud/server/commit/c71a3065a7285b3d8406ccd9b7c47280669242c2 if you intend to also use OC\\Preview\\Imaginary)
  5. Choose the new preview generator with OC\\Preview\\ImaginaryPath

Limitations

This obviously only works with local unencrypted storage. You could probably adapt this to work with other storages as well but I don't have them in my setup and hence I can't help with that. I'd recommend configuring some fallback preview generators for other storage (OC\\Preview\\Imaginary).

If this garners enough interest I'll spend the time to actually get this merged into NC should the maintainers approve. But before that, let me hear your thoughts!

LittleHuba commented 1 year ago

I created a newer version of this preview generator that increases security by a notch. Instead of giving Imaginary access to the whole user data and only secure preview generation through the -key parameter, this new version further restricts access by an indirection scheme maintained through Nextcloud. It does that by restricting Imaginary to a directory without any files via the -mount parameter. Whenever a user requests a preview through Nextcloud, the preview generator will first create a symlink in that directory to this file, and then request the preview for this symlink. Once the preview is generated, the symlink is cleaned up again.

What exactly does this solve:

  1. Even knowing the -key parameter, Imaginary will not find files, except for when a preview generation for the same image was requested by the user and is still ongoing
  2. Requests to Imaginary do no longer contain the file path within the user data structure. That removes the chance to sniff metadata from the requests to Imaginary

For this preview generator to work, you need to:

  1. Give Imaginary access to the (docker) volumes that will contain the directory for the symlinks as well as the actual user data (read-only access is sufficient)
  2. Add the key preview_imaginary_tmp_path in your Nextcloud-Config. This key must point to the directory for the symlinks (this directory must exist)
  3. Set the -mount parameter to the directory that will contain the symlinks
  4. Optionally, use the -key parameter to restrict access to Imaginary

You can find the preview generator in the following Gist: https://gist.github.com/LittleHuba/5ddc39f948968bfc3e3c48a031f5b9b3 See above comment for instructions how to integrate it with Nextcloud.

Limitations

  1. This only works with local unencrypted storage
  2. Imaginary must still have access to all user data, the symlink is just a link to the actual image that Imaginary still must be able to read. Users will just not be able to request Imaginary to read images without the symlink being there.

I'd recommend configuring some fallback preview generators for other storage (for example OC\\Preview\\Imaginary).

joshtrichards commented 7 months ago

See recent discussion and performance testing in #44708

Leptopoda commented 6 months ago

Thank you for the information. I don't have the time to do some testing myself so I have to trust the available data.

Closing this issue for now and might reopen it if I had time for testing and the results are different enough.