h2non / bimg

Go package for fast high-level image processing powered by libvips C library
https://pkg.go.dev/github.com/h2non/bimg?tab=doc
MIT License
2.71k stars 338 forks source link

Vips warnings and image artifacts at moderate concurrency #137

Open willtrking opened 7 years ago

willtrking commented 7 years ago

Hey there, been running into issues while working on a image server that uses bimg to perform jpeg -> webp conversion and scaling

At even moderate concurrency (20+ concurrent requests) in artificial tests, I've been getting tons of vips warning: VipsJpeg in my output, causing significant amounts of requests (50%+) to fail. If I don't reject images where bimg.Resize throws an error, I see significant artifacts in browser while tests are running.

The conversion funnel is simple, downloading an image from s3 (confirmed that warnings are NOT caused by incomplete downloads) and then converting the byte slices as follows

finalBuf, finalBufErr := bimg.Resize(respBuf, bimg.Options{
    Type:        bimg.WEBP,
    Width: 750,
    Quality:     40,
    Compression: 6,
})

I've also tried a simple jpeg scaling without webp conversion, and run into the same issue, code for that is simply

finalBuf, finalBufErr := bimg.Resize(respBuf, bimg.Options{
    Width: 750,
})

That converter is wrapped up in a simple fasthttp handler, no crazy logic yet as project is in early stages and I'm just testing out different image converters/resizers.

Server is running on an EC2 c4.8xlarge, so 36 "cores" and 60 GB of memory.

Should also be noted that this happens regardless of what I set runtime.GOMAXPROCS to, and I'm not fiddling with any of the vips concurrency settings.

Should also be noted that https://github.com/DAddYE/vips does not run into any issues, although is definitely slower (although hard to judge exactly given the error rate)

Love the library overall, been using it extensively in other areas, but the issues I see here make it difficult for me to use for this project.

EDIT:

Vips version info: vips-8.4.5-Fri Mar 10 07:17:47 UTC 2017

uname -a output: Linux ip-10-0-1-249 4.4.41-36.55.amzn1.x86_64 #1 SMP Wed Jan 18 01:03:26 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

EDIT EDIT:

Did more testing, discovered issue actually arises reliably at 20 concurrent requests (originally tested 50-100)

phpb-com commented 7 years ago

Have you tried setting VIPS_CONCURRENCY environment variable to 50 or so? Not sure if it will help but worth trying.

willtrking commented 7 years ago

Tried it out, setting VIPS_CONCURRENCY to anything other then 1 produced the following message in addition to the original warning:

VipsRegion: valid clipped to nothing

h2non commented 7 years ago

Unfortunately, this is something I've experienced in the past too. I'm still not sure the root of the issue (but I didn't dig into details yet). Any help here is always welcome!

touhonoob commented 7 years ago

Hi @willtrking Could you provide a sample application which can reproduce the issue? warning: VipsJpeg might not be the root cause of failed requests according to some discussions about libvips.

jcupitt commented 7 years ago

Hello, this should be fixed by https://github.com/jcupitt/libvips/issues/639 , which is released as libvips 8.5.4. I'm sorry this has been causing problems. There's a new thing in the test suite which hammers on exactly this error, so it shouldn't be able to regress again.

There was some stuff about this in the release notes for 8.5:

https://jcupitt.github.io/libvips/2017/03/16/What's-new-in-8.5.html#new-sequential-mode

There have been a couple of small problems with the new system, but it does seem to be working now, and it's a good improvement over the old mechanism.

willtrking commented 7 years ago

Hmm, 8.5.4 doesn't seem to fix it for me. I'm using the tarball release from https://github.com/jcupitt/libvips/releases. Still getting the following logs, as well as image artifacts.

(bimg:39): VIPS-WARNING **: 
(bimg:39): VIPS-WARNING **: read gave 68 warnings

When I serialize bimg (vips) with a simple mutex, there are no issues.

EDIT: Sorry about the close/reopen. Hit the wrong button

kirillDanshin commented 7 years ago

hmm. I think I know how to fix it. pending analysis.

jcupitt commented 7 years ago

@willtrking oh dear, then I guess it's a problem in the bimg layer. Could it be auto-rotate? You need to render via a memory buffer for this.

There's a page on how to resize with libvips:

https://github.com/jcupitt/libvips/wiki/HOWTO----Image-shrinking

for reference.

willtrking commented 7 years ago

@jcupitt Sorry about delay here, had to work on other projects for work so this took a back seat. Not doing any autorotate stuff (to my knowledge), to be specific I'm always running a crop/resize of a jpeg (the end width/height does vary, although the originating width/height is always within a few pixels of the same value) and converting the output to a lossy webp.

Turned out the mutex was probably hurting more then it was helping, so I've removed it.

Running now with libvips 8.5.5 , I've managed to tame down the amount of warnings I'm getting by disabling orc and setting VIPS_CONCURRENCY to my CPU count (in my case 8, I'm running a couple of AWS m4.2xlarge instances). Within the context of bimg I'm doing the following once when my server starts:

os.Setenv("VIPS_CONCURRENCY", strconv.Itoa(runtime.NumCPU()))
bimg.Initialize()

Adding os.Setenv("VIPS_TRACE", "true") seems to properly start tracing, so I'm assuming VIPS_CONCURRENCY is also properly being set.

I'm still getting the following warnings, although I haven't seen any artifacts:

(bimg:39): VIPS-WARNING **: read gave 10 warnings

(bimg:39): VIPS-WARNING **: 

(bimg:39): VIPS-WARNING **: read gave 2 warnings

(bimg:39): VIPS-WARNING **: 

When the server produces warnings as above, there's no guarantee that artifacts will be produced.

One thing worth mentioning is that I was incorrect in my initial issue, https://github.com/DAddYE/vips does seem to produce the same artifacts / warnings, although reproduction may be under different levels of concurrency from bimg.

Very willing to help contribute to a fix if indeed bimg is the culprit, although I'm at a bit of a loss in terms of starting points. Also if it helps I can send a VIPS_TRACE output. I think one of the most beneficial things would be to have an option that allows bimg to return an error if Resize causes a vips warning, which if there's interest I'd be willing to contribute toward. This way any warnings could either fail a request, or retry the Resize

jcupitt commented 7 years ago

Hi again, libvips is being used at very high levels of concurrency without issue in several other projects, so while it could be a libvips issue I suspect it is probably something about the way bimg is using it. A retry on warning option seems to be ducking the problem: it would be better to fix it.

I'm afraid I've never used Go so I'm not clear about setting up a dev environment. If you could write a very brief idiot's guide for ubuntu and point me at a test-case for this issue, I'll have a go at reading the bimg source and looking for problems.

vansante commented 4 years ago

Did anyone manage to resolve this? I'm getting image artifacts randomly on images I am resizing whenever I generate more than 1 at the same time. It seems to happen more often for high resolution large image.

jcupitt commented 4 years ago

8.9.1 has a couple of races fixed, it'd be worth trying that if you're not on it.

vansante commented 4 years ago

I just tried, but it didn't help unfortunately.

There seems to be a cutoff point after which the image is corrupted, for instance: image

It seems to be happening for images that have autorotated based on EXIF information. The portait images display the problem, but I never see it happening for landscape pictures.

jcupitt commented 4 years ago

I had a quick look at the bimg source. This seems to be the line handling auto-rotate:

https://github.com/h2non/bimg/blob/master/resizer.go#L38

It's doing it at the start of processing, which is probably not optimal, I think. My suggestion would be to call the standard libvips vips_thumbnail operation instead, and add gamma and watermark handling after that.

vansante commented 4 years ago

You seem to be right! I just created this MR: https://github.com/h2non/bimg/pull/324

And that seems to fix the issue (at first sight), except the portrait images now seem to lose (too much) resolution and end up fuzzy, so I'm doing something wrong.

jcupitt commented 4 years ago

vips_thumbnail_image() can't exploit shrink on load (since the image has already been loaded). If you can, use vips_thumbnail_buffer() instead, it should be much quicker.

urjitbhatia commented 4 years ago

@vansante hi, we've also seen this before - can you share how you're testing? I tried running with large concurrency and I am getting correct images (with 8.9.1). I'd like to double check

thanks!

vansante commented 4 years ago

@urjitbhatia I basically had a large set (~50) of JPEG images of high resolution (something like 60 megapixel) which I unfortunately cannot share because they're from a client and thus confidential. I then had bimg generate 300x100px size thumbnails in something like 4 goroutines, after which in that set of 50 images there were always 1-5 images with corruption.

I solved it by now running on my own fork which does as @jcupitt suggested, which is to only execute vips_thumbnail_buffer() on an image. This unfortunately disables most of the rest of bimg's featureset, making it unlikely my MR will be merged, so I hope someone can mold it into something usable. Otherwise I might end up creating a smaller library for libvips that only does the thumbnailing with vips_thumbnail_buffer().

urjitbhatia commented 4 years ago

@vansante thanks! that helps me test it better. I'll take a look at your MR and see if I can help thread it through the rest of the pipeline.