processwire / processwire-requests

ProcessWire feature requests.
40 stars 0 forks source link

Don't resize image if file does not exist. #269

Open ghost opened 5 years ago

ghost commented 5 years ago

@CaelanStewart commented on Jan 17, 2019, 10:08 AM UTC:

Short description of the issue

I've just advised a colleague not to load the page before the images are present for reasons's below, so working from memory.

If an image file does not exist, but is in the database, and the image needs to be resized, ProcessWire tries to resize it any, creates an invalid image file, and then won't resize further.

In work we have three environments, development, staging and production. We don't keep files in the repo because they become very large and unmanageable. And as such, this adds another dimension of annoyance when deploying between environments. We've been meaning to build a simple tool for transferring the files automatically between environments (a bit like that plugin for WordPress that allows you to push/pull between environments and it'll handle database entries and files, but only for files).

And there have been a countless number of times where we've just need to make a quick fix or change to a site, and so work is assigned to a dev, they pull, but then we also have to log into Plesk, archive the files directory, let it download at a measly pace, and then start.

There have been times where I've forgotten not to load up the site in case it resizes any images, because then I have to dump the database, re-import and then load up the page with the files present.

It's just one of those annoying things that happens every now and then.

If it's just a quick CSS change, I don't even need the imagery to be present most of the time.

Expected behavior

Check if the file exists, and if it does, resize it, otherwise, return and try again on next page load.

Actual behavior

ProcessWire resizes a non-existent image, produces an invalid file, does not error out, and logs in the database the resized variation of the image that is actually invalid. On next load, it will not try to resize the image again.

Optional: Screenshots/Links that demonstrate the issue

Your screenshots/links go here.

Optional: Suggestion for a possible fix

<?php

// method
function resize() {
    if ( ! is_file($path)) {
        return;
    }
}

This is just one of those things that's a nice easy fix with a high reward for those with a similar workflow.

Steps to reproduce the issue

  1. Upload an image to a page
  2. Add code to resize the image
  3. Delete the image file
  4. Load the page from step 1
  5. Put the image back
  6. The image won't load because the variation is invalid

Setup/Environment

This issue was moved by netcarver from processwire/processwire-issues#790.

ghost commented 5 years ago

@ethanbeyer commented on Jan 17, 2019, 5:16 PM UTC:

I like the solution for your fix.

Additionally, I have been wanting to talk to anyone about how they handle their database and uploads from Dev -> Staging -> Production and back. I'd be interested in talking with you about the "simple tool" you had in mind.

ghost commented 5 years ago

@CaelanStewart commented on Jan 18, 2019, 9:25 AM UTC:

ethanbeyer, well, our idea was some script that can be run locally, that can push to staging, that will essentially contact the server, request a tree of the files folder, then locally find the differences and then upload the missing/changed files. And then the same could be done from staging to production.

But, if you wanted to go in as deep as exporting/importing pages/templates/fields etc, that's touching on quite a lot of PW's internals, and could grow into a complex beast. I guess you could abstract that out and simply synchronise the table entries.

But that is another dimension of complexity, so to avoid that, we're happy moving databases around, as that's pretty quick, it's just the zipping up and uploading of the entire files folder that's a bloody pain because of the size it can grow to on image heavy sites.

ghost commented 5 years ago

@horst-n commented on Jan 18, 2019, 9:34 AM UTC:

ryancramerdesign If it hasn't any downsides I'm currently not aware, I would second this request, not to save a variation (text) file with an error message but left it without a variation file. (?)

ghost commented 5 years ago

@CaelanStewart commented on Jan 18, 2019, 9:42 AM UTC:

horst-n, well, if the image is not there, then it means the developer has interfered in the natural workings of ProcessWire, and as such, would never happen under normal circumstances. So it should have zero side effects.

And anyway, it doesn't make sense to just try and resize and image without checking if the file exists or not.

And I'm surprised no error is thrown by whatever code is resizing the image, so perhaps error surpression with @ or with a try with an empty catch was used.

In that case, I question the logic behind that decision.

ghost commented 5 years ago

@horst-n commented on Jan 18, 2019, 10:18 PM UTC:

Hi CaelanStewart ,

just think, a single online page rendering 20 paragraphs of text, maybe other stuf like forms and so on and 20 images. If only one image is missing, an error would break the whole page rendering! (Online!)

That's why a small textfile is created, containing the reason of the error. And a missing parent image is not the only reason for unrenderable variation files. Not enough memory is one too, or corrupted images, or missing file access, and so on. This all is covered by design, by writing a small textfile, - without suppressing errors the dirty way. :)

If you now may think not throwing an error prevent a dev to get informed about it: When debugging is enabled in site/config.php, everything gets logged into imagesizer and error or warning log(s). And when using TracyDebugger, you see it immedeatly, if I remember right.

So, for me it is no question behind that logic.

ghost commented 5 years ago

@CaelanStewart commented on Jan 18, 2019, 11:26 PM UTC:

It doesn't make sense to leave a text file if the file is not there and then fail silently (unless you check logs).

The image would be there had the developer not moved it. As such, there has been outside interference. If debug mode is enabled, throwing an exception seems the logical thing, and if debug mode is disabled, just don't create the file or save the variation and then a 404 error will appear, which is far less obscure in my opinion.

404, file not found, oh, maybe the image is missing? It makes far more sense, and is consistent with how the rest of the internet works.

When this happened before (I had forgotten about the text file, and not gotten your mention of it in your previous comment), I opened the image in a new tab, and I just saw "invalid image data" or something like that, I was confused by that since it implies the image is there but corrupted or something. I didn't realize immediately it was even ProcessWire doing that.

What I'm saying is it's an obscure, non-intuitive way of dealing with it.

It doesn't even tell you if you edit the page that the image is missing if I remember correctly, it just shows the default placeholder for a broken image in your browser of choice.

And what has happened here is actually error suppression in an abstract sense, you've just made it non-obvious that an error has occurred.

Let's say the image is hidden behind some modal popup, or is at the end of some carousel, and you forget to upload files to production and only one image in some random place and nobody notices?

Failing silently is a recipe for disaster in my opinion.

https://softwareengineering.stackexchange.com/questions/190528/error-handling-should-a-program-fail-on-errors-or-silently-ignore-them

This is a minor case, of course, but it's important nonetheless. Especially since this error would not occur had there not been outside interference.

And it is logical to assume the developer will put back any missing images and want them to resize automatically.

If something's broken, and you're forced to fix it, you will fix it.

ghost commented 5 years ago

@teppokoivula commented on Jan 19, 2019, 8:42 AM UTC:

... and I just saw "invalid image data" or something like that, I was confused by that since it implies the image is there but corrupted or something. I didn't realize immediately it was even ProcessWire doing that.

That's what I did as well. In fact until I read this issue I had no idea what exactly was happening there, and I must agree that it's a bit odd, to say the least. Now that I'm looking at the code, it's there alright:

// write an invalid image so it's clear something failed

On the hindsight this also explains why on one site I had to jump through a lot of hoops to figure out if and when a variation images was actually a proper image, and when it would be something broken. In this case the problem escalated because those variations were automatically passed to a separate module, and when it got a broken image it brought the entire page (front page, no less) down.

Let's just say that I've spent a lot of time finding and removing broken variation files manually 😃

Note that this wasn't occurring because a developer has removed the files, but rather because resize was failing for other reasons. So no, this kind of failure isn't always a result of a strange action from the developer.

...

Horst absolutely is right that we don't want to break rendering if an image is missing (or broken), but I don't agree with the current solution either. It's really obscure, and only makes debugging real issues so much more difficult. Thus I agree that the best thing in this case would indeed be to not create a variation file, though admittedly I'm not entirely sure what else that would/could change/break.


A bit OT, but in the past I've handled the dev - production setup with a shared database and syncing assets between environments using rsync. Manually downloading files from Plesk sounds like a lot of unnecessary effort, so I'd definitely look into rsync or other similar tools first. That being said, storing assets in buckets (AWS, Google) would be a better idea: not only would it eliminate the need to sync assets altogether, but it would also provide you with other benefits – such as not burdening your web server with static asset requests.

ghost commented 5 years ago

@CaelanStewart commented on Jan 19, 2019, 10:12 AM UTC:

teppokoivula

I disagree with a couple of paragraphs:

Note that this wasn't occurring because a developer has removed the files, but rather because resize was failing for other reasons. So no, this kind of failure isn't always a result of a strange action from the developer.

Well, if you even got to the stage of resizing, the original image must have been there in the first place, unless you do some sort of resize on upload and dump the original. But then that's not default behaviour. Or unless you mean the upload was failing? In that case the file upload module should be doing a final sanity check to see if the file is there, and not leave it to the resizer to indirectly verify.

But still, there should be a clear separation of concerns, the image resizer should be verifying if the image resize was successful by checking if the file exists after the resize is complete. It should not be the responsibility of code that runs on a subsequent request to end up indirectly verifying the resize.

That's a mixing of concerns between conceptually different aspects of the system.

Horst absolutely is right that we don't want to break rendering if an image is missing (or broken), but I don't agree with the current solution either.

This obsession with backwards compatibility is why WordPress's code is such a shit show. It really is quite abhorrent. So fragmented and inconsistent that it deserves it's own "fractal of bad design" article like PHP has been readily given.

I'm sure there are methods that can be engineered to minimise impact. Most ProcessWire installations wouldn't even be affected. Since in a stock installation, it would be the fault of the developer that an image is missing, or some exceptional circumstance, or some fault of code in an earlier process that did not verify it's own actions.

ProcessWire's image handling system is pretty damn good, so it's only going to be exceptional circumstances where this could break functionality. But still, fixing whatever is broken would be very simple. It's not like a fundamental change in API structure has happened, just that at one particular point, there's another possibility to consider.

Also, in regards to the AWS/shared DB suggestions, that sounds interesting, but I don't want there to be real time sync between dev/staging/production, as that would just result in the client ringing up and asking why there's lorem ipsum and broken code (due to schema changes but code hasn't been pushed).

It is my plan to see if there is a library that can handle diffing file trees and syncing a given directory between different environments. I don't want to reinvent the wheel here.

Thank for the suggestions.

ghost commented 5 years ago

@teppokoivula commented on Jan 20, 2019, 7:46 AM UTC:

CaelanStewart, I'm not entirely sure what you're disagreeing with on the first paragraph. Yes, the image was there – if it wasn't, there wouldn't be a resize action in the first place. Like I said, the resize was failing, just not due to the image being removed from the disk.

It's been a while and sadly I can't recall the specifics, but I believe there were various reasons for this, as well: running out of memory, something strange in the image data, and probably others as well. The point is that developer removing the file manually is not the only case where one can run into resize issues. I wasn't disagreeing with what you said, but rather pointing out that it might be even more common issue than you realise :)

This obsession with backwards compatibility is why WordPress's code is such a shit show. It really is quite abhorrent. So fragmented and inconsistent that it deserves it's own "fractal of bad design" article like PHP has been readily given.

I'm afraid you might've misunderstood what I was saying there. I was not talking about backwards compatibility, even though that is indeed important as well. I literally said that we shouldn't break rendering. The point, as Horst mentioned earlier, is that even if resizing a single image fails, we don't want the whole page rendering process to fail fatally.

This is about fault tolerance, not backwards compatibility. It is a common use case for ProcessWire site to, say, render a list of items – and in many, probably most, cases a minor issue with one item isn't the end of the world: if one thumbnail is left out, it's usually better to keep things running and notify the developer rather than break the entire thing.

It is my plan to see if there is a library that can handle diffing file trees and syncing a given directory between different environments. I don't want to reinvent the wheel here.

Clearly not the right place to discuss this, but just for the record: rsync is exactly that. It's not a super sophisticated diff tool, but rather compares files and update times. Typically this is what one would want for static assets. The shared database approach was an easy way out; I'd rather recommend setting up a simple (and preferably one-way) sync for that, so that one can easily sync production database into development, and not the other way around.

Database-altering local changes would be easier to handle programmatically using a tool such as Migrations. Migration files have the major benefit of being easy to track and share between environments using a version control system.

ghost commented 5 years ago

@CaelanStewart commented on Jan 20, 2019, 10:10 AM UTC:

teppokoivula,

Apologies, sometimes my thoughts are fragmented and I lose coherence a little, let me try to be a little clearer.

there were various reasons for this, as well: running out of memory, something strange in the image data, and probably others as well.

Running out of memory is a pain (thank you ImageMagick for basically eradicating it!), though that will kill the page for sure.

My specific view is: If debug mode is enabled, throwing an exception seems the logical thing, and if debug mode is disabled, and the image is missing, just don't create the file and don't save the variation and then a 404 error will appear for the image URL when the page is viewed.

If we're at the stage of resizing and the image is there - but it's invalid or corrupt, then yes I agree, outputting an invalid image does seem more fitting. But I still think that an exception should be thrown if debug mode is enabled, just like other errors are shown on debug mode.

It's just that if the image is not there, i.e, the file is missing, then I feel an text file is not the way to go.

And yes, I guess upload can fail, but the upload process should do a final sanity check to see if the upload was successful, and maybe it does already - then that's good. If the upload stage somehow runs out of memory? Well, that's unlikely, and that's just one of them things you can't do much about.

I know there are other extenuating circumstances that may cause the file to disappear, but those are very rare, and out of the 15 PW sites I can remember off hand building, the only cause of this error has been moving between environments.

I'm afraid you might've misunderstood what I was saying there. I was not talking about backwards compatibility, even though that is indeed important as well. I literally said that we shouldn't break rendering. The point, as Horst mentioned earlier, is that even if resizing a single image fails, we don't want the whole page rendering process to fail fatally.

This is about fault tolerance, not backwards compatibility. It is a common use case for ProcessWire site to, say, render a list of items – and in many, probably most, cases a minor issue with one item isn't the end of the world: if one thumbnail is left out, it's usually better to keep things running and notify the developer rather than break the entire thing.

I see, I did indeed misunderstand that. In that case, I agree.

Which is exactly why I said: "If debug mode is enabled, throwing an exception seems the logical thing" because I understand the logic behind that argument.

And if the change was made to not create the variation if the file does not exist, then a 404 error would show in the console on the page, which makes the error a little more audible.

Clearly not the right place to discuss this

Agreed. Though I must thank you for detailing a little more about that. I'll look into that!

Sorry if I get a bit much.

ghost commented 5 years ago

@teppokoivula commented on Jan 20, 2019, 2:46 PM UTC:

My specific view is: If debug mode is enabled, throwing an exception seems the logical thing, and if debug mode is disabled, and the image is missing, just don't create the file and don't save the variation and then a 404 error will appear for the image URL when the page is viewed.

If we're at the stage of resizing and the image is there - but it's invalid or corrupt, then yes I agree, outputting an invalid image does seem more fitting. But I still think that an exception should be thrown if debug mode is enabled, just like other errors are shown on debug mode.

Giving this a bit more thought, I'd suggest the following:

I'm not entirely sure if we're on the same page regarding the second one, but my point stands: I get that current behaviour was put in place so that the developer would notice the broken file and check the text in it, but I still think that this is really obscure behaviour and most devs probably won't guess to check if an image might contain text.

Long story short, in my opinion it would be better to not create a variation file at all in this situation. I don't really see any value in it, except perhaps that ProcessWire won't try to create the variation on subsequent renders – and personally I see that more as an issue than a good thing.

horst-n, what do you think, are there any gotchas I'm missing here?

Testing this locally, simply uncommenting those two lines in the code that create the broken variation file did the trick. So far I'm not seeing any side effects either, in part because a blank(ish) Pageimage object is returned either way.

ghost commented 5 years ago

@CaelanStewart commented on Jan 20, 2019, 3:45 PM UTC:

teppokoivula, from that, we are completely on the same page. In the second point, relaxed my opinion on the creation of an invalid file on a corrupt image because I wrongly got the impression that the consensus was to allow it in that circumstance - and because it's rarely an issue and I'm not too worried either way.

Glad to hear you have found no downsides so far.

ghost commented 5 years ago

@horst-n commented on Jan 21, 2019, 10:45 AM UTC:

teppokoivula I'm with what I said above in my first post: if possible without sideeffects, I don't want that an error-image is created anymore. Let it be missing and resulting in a 404. And if debug is on, write a log entry and (what you and Caelan suggested), throw an error. (But ATM I don't have time for thouroughful testing, that's why I pinged Ryan.)

ghost commented 5 years ago

@ryancramerdesign commented on Feb 4, 2019, 7:55 PM UTC:

Thanks, I have pushed a fix for this issue, though there's a lot here and not positive I understood it all so please let me know if the fix is missing anything.

ghost commented 5 years ago

@teppokoivula commented on Feb 5, 2019, 3:49 PM UTC:

Looks good to me, thanks ryancramerdesign!

ghost commented 5 years ago

@netcarver commented on Feb 6, 2019, 8:58 PM UTC:

CaelanStewart Hi Caelan. Could you see if Ryan's commit fixes this for you. If so, please go ahead and close the issue. Many thanks.

ghost commented 5 years ago

@CaelanStewart commented on Feb 7, 2019, 2:47 PM UTC:

netcarver, Hi.

Given it a quick test, it doesn't produce an invalid image file anymore, and as such a 404 error shows in the browser console. So that aspect seems to be all good.

However, another suggestion that was buried in the above responses is to throw an exception when debug mode is enabled.

It doesn't seem fitting to fail silently (entry in log file without prominent error is pretty silent) if debug mode is enabled.

Of course, an indirect exception of sorts is the 404 in the browser console, which most devs would have a hard time missing as usually devtools are up permanently in any context where debug mode would be on. So it's not a massive deal, but it does feel like an exception should be thrown by ProcessWire if debug mode is enabled and the image file doesn't exist.

An invalid image file it feels should be handled on upload, and an error should be shown to the user when the image is uploaded. That may already be so, but I'm not sure from memory.

ghost commented 5 years ago

@netcarver commented on Feb 7, 2019, 5:55 PM UTC:

CaelanStewart Hello Caelan, and thank you for taking another look at this.

I think I'll leave this issue open for a week so you can post any updates, or close it if you wish. If it's still open in a weeks time with no further feedback or argument as to why this is a bug, I intend to close it and invite you to add a new issue on processwire-requests for any improved functionality regarding handling of invalid image uploads.

Let me know if that's a sensible way forward.

ghost commented 5 years ago

@netcarver commented on Feb 8, 2019, 9:20 PM UTC:

/remind me to transfer to requests if needed in 1 week

ghost commented 5 years ago

reminders[bot] commented on Feb 8, 2019, 9:21 PM UTC:

netcarver set a reminder for Feb 15th 2019

ghost commented 5 years ago

reminders[bot] commented on Feb 15, 2019, 9:18 AM UTC:

👋 [netcarver](https://github.com/netcarver), transfer to requests if needed
reminders[bot] commented 5 years ago

@move[bot] set a reminder for Feb 22nd 2019

teppokoivula commented 5 years ago

This thread is already unbearably long, so to summarise:

The issue mentioned in the topic has been resolved, but what remains is the question whether ProcessWire should throw an exception if resizing an image fails when debug mode is enabled.