nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0
27.39k stars 4.07k forks source link

Thumbnail/Preview image generation is too slow #1732

Closed iskradelta closed 2 years ago

iskradelta commented 8 years ago

Steps to reproduce

  1. Use Nextcloud android app, or the web interface to view a folder list containing many JPEG pictures taken by a camera
  2. Notice the slowness in loading previews, about a second for each picture, web interface is faster since it sends more requests at same time
  3. Notice in the logs that it takes about 1.1second to scale down an image of about 5MB to 384x384.

    Expected behaviour

The previews should be generated in less than 0.3seconds, preferrably faster. It should be possible to pre-generate the previews/thumbnails for recently uploaded photos or by running a cron job. To avoid waiting for any previews. No external storage or weird things is used.

Actual behaviour

Slow as a nintento 8bit running openssl dhparam generation.

Because php-gd seems to load the image into memory, then it does imagecopyresampled which takes time on big JPEGs, also it only uses 1 process for the computations.

ImageMagick on the command line also takes everything from 1 to 5s depending on parameters used to scale/resize down the image, but with -define jpeg:size parameter, telling ImageMagick to not load the whole image into memory, but instead request smaller parts of it while resizing it, takes 0.2seconds on average.

Should nextcloud switch to ImageMagick and redo/later-throw-away legacy/image.php ? Perhaps making a NC_Image class which would use imagemagick either by shell_exec or if php has the imagick extension loaded use that. I think so, yes?

There could be a setting in the admin interface "Pre-generate thumbnails for recently uploaded/scanned content?" This setting could just downsize really big content say > 1mb into something small like 384x384 - and if other clients want even smaller image they can resize from the small part again ondemand, but that then would be much faster.

Would a patch for this be accepted?

Server configuration

Operating system: GNU/Linux

Web server: lighttpd

Database: sqlite3

PHP version: PHP 5.6.21, opcache apcu enabled, php-fpm.

Nextcloud version: 10.something.

Updated from an older Nextcloud/ownCloud or fresh install:

Where did you install Nextcloud from: the nextcloud website.

LukasReschke commented 8 years ago

Should nextcloud switch to ImageMagick and redo/later-throw-away legacy/image.php ? Perhaps making a NC_Image class which would use imagemagick either by shell_exec or if php has the imagick extension loaded use that. I think so, yes?

No. We won't use imagemagick as that one isn't really secure to use with untrusted user input.

There could be a setting in the admin interface "Pre-generate thumbnails for recently uploaded/scanned content?" This setting could just downsize really big content say > 1mb into something small like 384x384 - and if other clients want even smaller image they can resize from the small part again ondemand, but that then would be much faster.

Why a setting and not making this default?

cc @rullzer Thoughts?

rullzer commented 8 years ago

We already do this. The first time it is requested we generate a 2048x2048 (default) image. And from this one all smaller thumbnails are derived. We use 2048x2048 since else thing look blurry on your laptop screen.

I'm still working on making our preview system more efficient. And something to create previews in cron is still on the list.

iskradelta commented 8 years ago

@LukasReschke ImageMagick and security: Shouldnt the choice of image-backend at least be a choice for the user? I run Nextcloud on my server and feed it my images and documents - how is any of that untrusted input? Not everyone will run nextcloud as an enterprise renting it out to otters I think. btw I am curious about security of imagemagick, have any pointers? Is there any other faster alternative which is also secure?

@rullzer I see. But it is still too slow, even with 2048x2048 gd imagecopyresample spends 0.8s for every image. Can gd be speeded up? Can it be told to use more threads or processes?

LukasReschke commented 8 years ago

@LukasReschke ImageMagick and security: Shouldnt the choice of image-backend at least be a choice for the user? I run Nextcloud on my server and feed it my images and documents - how is any of that untrusted input?

And that's good for you. But we're aiming for a solution that scales from small to big and doing so with sane defaults. Adding "enable this switch here and it makes stuff faster but also much more insecure" is not an option. You can of course write an optional app that does this, but it won't be part of the core server for now :)

btw I am curious about security of imagemagick, have any pointers?

https://github.com/mkoppanen/imagick#security has some more infos. https://imagetragick.com was a kinda big exploit. We had imagemagick until like 2013 for some previews enabled by default but this allowed an attacker to extract arbitrary files from the system.

The only sane way to run imagemagick is in a sandboxed, secured, environment. Something we don't offer out of the box yet and also would be an additional component. (e.g. a preview server system)

Not saying that we shouldn't speed up thumbnails. We should. But enabling imagemagick on the webserver is a very bad idea from a security PoV.

iskradelta commented 8 years ago

@LukasReschke All valid concerns which I agree with, security should come foremost, then speed.

I dont think the current solution scales even to small use cases, as it is too slow. The server is on 32gb of ram, storage on 2 raid0 ssds, 12core cpu, and yet it stutters when web browsing nextcloud on it, and Android phone (quad core, tested with a s7 as well) on the same WiFi - loads the picture preview very visibily at 1 per second. Doesnt matter if the folder has 10 pictures or 1000, they are previewed serially (especially on Android), slowly. (Yes opcache, apcu, lighttpd set to maximum performance and caching were possible, php-fpm dynamic with many workers)

The expectation is for it to to be as fast as the inbuilt android-gallery, as is the case when using google drive. We can make that happen.

I like the preview-server idea, how would the communication happen between nextcloud/php and preview-server be? Unix sockets and passing file-descriptor? Because the speed advantage is gone the second we try to stream the whole file to any preview-server, when it can do better by just reading parts of it intelligently (skipping rows and columns of pixels). By writing/passing file-descriptors to unix-socket, the other process can live isolated in its own container without access to our filesystem.

Otherwise more simply we could proc_open("/bin/some-server", $descriptors, $yadayada) and let child have the file-descriptor, and prevent that process from accessing nextcloud_data by normal unix-permissions, that preview-process could be setuid a preview_user. Simpler than unix-sockets? And if people want more security - let the some-server talk to a unix socket to another container which is running imagemagick/gd under seccomp mode.

Would such a preview-server be useful for libreoffice as well?

alexander-schranz commented 7 years ago

Why not give the administrator the choice which image processing library they want to use you can recommend the one you want. Get on my first installation Module not found gd and I have imagick and gd2 enabled.

There is a very good abstraction here https://github.com/avalanche123/Imagine which is a very wide used library.

Bugsbane commented 7 years ago

It's probably worth pointing out for anyone reading this later that an app for preview generation does already exist here. Not sure how much this meets your usecase @iskradelta?

Klaas- commented 7 years ago

It seems to me that thumbnails are created for each user individually (nextcloud/data/username/thumbnails), even for shared data (a big photo collection in my case). That looks like a waste of space and cpu time to me. I can combat the first with filesystem deduplication, not sure about the other part.

rullzer commented 7 years ago

Making the preview system more modular is still on the TODO list. That way the problems by @iskradelta @alexander-schranz should be fixed since then you can enable the app you want to generate previews

@Klaas- since NC11 we do share previews. However note that the gallery app is not yet modified. This will come soon (tm) I hope.

Klaas- commented 7 years ago

thanks @rullzer linking your pr for others to find it https://github.com/nextcloud/gallery/pull/187

iskradelta commented 7 years ago

@Bugsban thanks will try the app later, I "fixed" it by writing a short bash script with curl to force the auto-generation of previews.

@Klaas- @rullzer Yes, it did take much more space.

Then I looked into using file-descriptor passing for a more secure preview-system in general, and it would let me decide which process is actually processing a a file, so I could use graphicsmagick instead of php-gd. My tests showed gmagick could do the thumbnails so fast it would be not worth wasting space on pregenerating previews, while php-gd could not - gmagick has an optimization to skip reading the whole file and construct a smaller one, and some other options. For security reasons, fd-passing, since then I could run gmagick in another lxc-container with its own seccomp filters adjusted for gmagick, and the process gmagick in that container would not even have file-system access to anything else on the server, in fact, it would only have access to file-descriptors passed to it over a unix socket which would be mounted in both nextcloud-cointaner and gmagick-container. So if anyone exploits/gets-the EIP in gmagick process say with a bad .jpg all harm it could do is write a bad thumbnail.

AngeloFrangione commented 7 years ago

Yet two days I'm searching for a solution to this "slow previews" and "slow thumbnails" Found this tool you mentioned, but it actually don't generate the thumbnails for the gallery app and, the photo previews are still looking bad and very slow to generate, it takes more time to generate the previews than downloading the full image ! If at least we had an option to disable compressing the previews.

This issue is open more than 3 years, saw topics on forums from 2013 mentioning these problems, and now in 2017 still these slow previews and image generation, certainly because of GD. I can't afford a 80 dollar dedicated server in month for my photos. Isn't any way to change the whole thing ? Even the UI is very slow for me at certain times..

iskradelta commented 7 years ago

@AngeloFrangione lol even a $100 dollar dedicated server wont solve the issue. Mine is on a 6-core i7, 32GB memory, 4 SSDs in RAID0, 1Gbit ethernet, and still... as you say, just viewing the picture is faster than waiting for gd to generate preview. My measurements showed the problem is in gd, and a solution is in using graphicsmagick because it can be told to not read the whole image but only the parts it needs to generate a preview - and that works.

LexiconCode commented 7 years ago

I would like to show my support for GraphicsMagick. Although only in the event that the it's implemented securely.

GM is used to process billions of files at the world's largest photo sites (e.g. Flickr and Etsy).

godfuture commented 7 years ago

This issue is also a pain for me and I am highly interested about the current state of it. I guess the hint of @LexiconCode with GraphicsMagick looks quite promising!

srkunze commented 7 years ago

Same here. I definitely like the gallery view - when all images are available to preview. I also used the Preview Generator App, but it only generates the smallest size for the list view.

So, there are two different ways to solve it but doing both would also be nice:

1) Preview Generator App - pre-generate gallery images as well as thumbnails 2) Speed up gallery images considerably

If there's anything I can do here to help, ping me. 👍

abefroman6868 commented 7 years ago

Same here. The gallery view was the main reason to buy the Nextcloud device and I reported the issue I guess in Dec. last year. The NC forum shows, that it is a concern to many users. Now, I have to open the webbrowser and wait hours(!) until the previews are created, not talking about sharing with others immediately after upload. It managed to create 60 previews in one day on NC11 snap. So I guess the preview Generator app would be a good solution, if it performs well.

Please when can we get it on the snap for Raspi 2?

timtierney commented 7 years ago

Replying to @srkunze

Even of the Preview Generator App handled the gallery I don't believe it would help. Correct me if I'm wrong (not familiar with Nextcloud code). I don't believe (as of right now *) you can multi-thread a PHP script. So if you had a PHP script do generation, only one CPU thread could work on it correct?

Would writing something outside of PHP be better? Maybe Python has some image-processing libraries that would be more efficient and, with multi-threading support, scale better.

* I guess pthreads are a thing in PHP 7.2

srkunze commented 7 years ago

@timtierney Maybe the GIL prevents the multi-threading idea in Python. Why do you think modifying the Generator app could not handle it appropriately? I works well for the Android app using thumbnails.

timtierney commented 7 years ago

@srkunze I think it could do it. But would it scale on machines with more cores/threads?

wiwie commented 6 years ago

@timtierney multi-threading should be possible if a cron job is used for thumbnail generation, right? this would only require, that the occ command for thumbnail generation supports generation of a subset of thumbnails. then the occ command could be invoked multiple times by the cron job.

rubrik commented 6 years ago

Could the thumbnail generator spawn a certain number of workers processes to generate these thumbnails. Thumbnail generation in Dropbox is very much faster then Nextcloud at the moment, it would be nice to get close in terms of speed. Nextcloud is already better in terms of quality.

Hermsi1337 commented 6 years ago

Any news on this? The slow generation of thumbnails is the only thing holding me back from using NextCloud. ):

adriengibrat commented 6 years ago

Installing previewgenerator and adding a cron job helps the user experience most of the time. It's just a workaround as the image generation is still very slow, just pregenerated and cached.

I like the solution proposed by @alexander-schranz, use Imagine as a layer of abstraction, and eventually let the user choose with warning about security (and extra dependencies).

It seems no preview system is secure:

Nextcloud team already made that clear.

LexiconCode commented 6 years ago

Could it be shifted to client side?

Darkvater commented 6 years ago

I have rewritten image.php to use the Imagine interface instead of the raw GD functions. For this to work we need to add an additional dependency to imagine into nextcloud. While there should be no functional changes in my patch I have added a config setting to allow to select the Imagine interface, be it GD, Imagick, Gmagick or Vips in https://github.com/Darkvater/server/commit/d06b2bf9b507b3b39e204baadb0dce33931d7967

@iskradelta do you have the gmagick flag to optimise and do preview on the fly? Would be interesting to find out. I am not sure the Imagine interface by default is optimised for speed. Update: found it, you have to give it a size before opening the image so that it does a "progressive" read. Let's see if I can reproduce this with imagine, by default it's not fast at all.

Mikescops commented 6 years ago

Hey @Darkvater, how do you set up your modification ? What is the config setting ?

Darkvater commented 6 years ago

@Mikescops I have to patch image.php as in the commit but also add the imagine dependency which I am not fully sure about how it works on a master checkout (keeps doing infinite loops with the VCS in 3rdparty/composer.json). So it's definitely not ready yet for a pull.

fvzwieten commented 6 years ago

Hey @Darkvater, is there any progress on making your patch official. I love NextCloud, but this performance thing is the only thing holding me back to migrate my familiy photos onto it.

Darkvater commented 6 years ago

@fvzwieten I don't think the patch will ever make it in as maintainers have previously said they don't want other image generation libraries used as they are insecure. The only reason for imagine is this.

So I think I'm giving up on this path and just use a local patch which now I can improve since I've figured out how to make super fast (think sub-second by only loading a preview of the jpg to scale as @iskradelta found) which will allow me to do it on the fly as well. Prerequisite: need to setup my rock64 sbc to replace the aging raspberry pi1)

srkunze commented 6 years ago

@rullzer Is there an update on this issue?

Also because my girlfriend is somewhat annoyed by this :-(

godfuture commented 6 years ago

Same here. How do Flickr, Facebook and so on preview images? I guess not single threaded...

schniepp commented 6 years ago

I had a similar situation where an 8-core Xeon would run for 2 weeks to have my thumbs/previews created using the previewgenerator app... However, it turned out that in my case the speed bottleneck were (mostly write) accesses to the database. I was able to do some tuning of the MySQL settings to get a significant speed-up, as described here: https://github.com/rullzer/previewgenerator/issues/107. After the optimization, one of the cores was at least used to 50% load, or so. Before the optimization, only one core was loaded with preview generation, at a 10% load level...

BatPio commented 6 years ago

@rullzer Many home users is running NextCloud on RaspberryPi. How about @Darkvater idea? Using Imagine as abstraction layer sounds reasonable. Nextcloud should allow admin choose what is most important for their instance of Nextcloud: speed or security. Home user can use GMagick or Vips to increese speed.

sirkong2 commented 6 years ago

Couldn’t agree more with @BatPio. I’m running Nextcloud on a Raspberry Pi inside my home closed network. No need for security. I need my thumbs! And I need them generated at a reasonable speed.

Mikescops commented 6 years ago

Definitely agree with you guys ! Home users should be able to choose one. Loading thousand of images is for now a real pain...

cantonese84 commented 6 years ago

@Darkvater Could you please share your "local patch" to speed up thumbnail generation?

Darkvater commented 6 years ago

@cantonese84 sorry, it didn't work as expected. The abstraction layer is there, so I can use any imaging library I want, but it is not much faster. The reason for this is that the file needs to be opened in preview mode (eg. specifying a smaller size) to have full advantage of the speedup. The code flow however does not work this way. I will need to make some changes on how these methods work. Let me share what I have and maybe someone else can pick up as I have shelved nextcloud for now.

BatPio commented 6 years ago

@Darkvater Could you describe more accurately what needs to be changed? Please, indicate Imagine calls that should be changed. What Imagine function should be used?

bobobo1618 commented 6 years ago

Making the preview system more modular is still on the TODO list

Is there somewhere we can follow this? Has anyone outlined what it should look like if someone happens to have time to invest to handle it?

Also regarding alternatives, ImageFlow is another. It's still a bit immature but it's designed specifically for user-supplied images and the security concerns that entails.

bobobo1618 commented 6 years ago

Also @Darkvater:

The reason for this is that the file needs to be opened in preview mode (eg. specifying a smaller size) to have full advantage of the speedup.

Do you mean that you should read an existing preview to generate a smaller preview? I don't know the code very well but perhaps you could modify getThumbnail in the image preview code to do that by calling itself to fetch a large preview when the small preview is requested? It should either generate it or read it from the cache.

Or do you mean that Imagine has the ability to open an existing large preview and subsample it or something if you tell it the file size you want output?

I think the solution here might be to fiddle with the image preview code rather than the low level image manipulation code anyhow. It should even be possible to do that as an app (like this, for RAW files) so I'll have a look and see if I can make it work.

BatPio commented 6 years ago

@bobobo1618 I am afraid that IProvider is used only to generate the largest size of the preview. Smaller preview sizes are generated from larger ones by /lib/private/legacy/image.php

bobobo1618 commented 6 years ago

Ah, I didn't see that call in the Generator.

Taking that into account, my proposal is to add two optional methods to the Provider interface:

My rationale for the first method is that we can't just use the existing method because it doesn't support cropping, which is something we'll want to support. If we don't, then we'll have to deal with the performance hit of GD decoding, cropping and re-encoding the larger preview.

My rationale for the second method is that I think there's reason a user/developer might want to generate the preview from the original file. For example a photographer might care a lot about quality and not want to incur the degradation of two JPEG encodes or a RAW preview app developer might want to generate a full size preview by rendering the content of the RAW file but might want to use embedded previews for the smaller versions.

getPreview in Generator can then be altered to check for generatePreviewFromOriginal, use that if it's there else check for generatePreview else, if cropping isn't needed, use getThumbnail and finally, fall back to the existing logic.

Once that's done this should all be modular and an app can do everything the user wants.

master0044657 commented 6 years ago

my nextcloud vm has been generating previews for 15 days non stop and by now has only generated previews for 15k out of 35k images. If this product was designed to scale i must be doing something wrong because i cant imagine someone with 500 thousand images.

vm specs: xeon 1260l 4gig ram

clawoflight commented 6 years ago

Any news/official rationale on this? Honestly, this is a serious dealbreaker.

dgtlmoon commented 4 years ago

Also transmission of the wrong thumbnail size https://github.com/nextcloud/server/issues/13709#issuecomment-554658438

I feel like the whole system is doing a lot of 'assuming' about how people are going to care.

RomainFallet commented 3 years ago

Any news on this?

master0044657 commented 3 years ago

my nextcloud vm has been generating previews for 15 days non stop and by now has only generated previews for 15k out of 35k images. If this product was designed to scale i must be doing something wrong because i cant imagine someone with 500 thousand images.

vm specs: xeon 1260l 4gig ram

Ok, came back almost 3 years later and thumbnail generation is almost in the same state. For me this feature is a must. I think it would be nice to have a simple GUI to generate and keep track of generated images/videos and increase speed generating those thumbnails.

godfuture commented 3 years ago

Ok, came back almost 3 years later and thumbnail generation is almost in the same state. For me this feature is a must.

My assumption is that even the community around nextcloud seems big, the developing community is rather small. And the paid developer focus is enterprise. For them, the community is a playground well suited to test things out and gain additional test ressources. But do they play a role for the road map? I doubt it. Do you think business users collect huge amount of photos in gallery? Right, thats why three years no move here.

LexiconCode commented 3 years ago

There's enough interest where people might want to donate toward sponsoring this issue directly. I'd be willing only if it went towards this issue.