thephpleague / glide

Wonderfully easy on-demand image manipulation library with an HTTP based API.
http://glide.thephpleague.com
MIT License
2.55k stars 199 forks source link

Brainstorming on Glide 2.0 #170

Open reinink opened 7 years ago

reinink commented 7 years ago

I've started thinking about Glide 2.0. The purpose of this issue is to share those ideas and get some discussion going about it.

Serve cached images via web server

My number one goal for Glide 2.0 is allowing cached images to be served by the web server, instead of Glide. Meaning the first request to a manipulated image would hit PHP (and Glide), but all subsequent requests will be handled by the web server automatically. Further, beyond the standard front-controller rewrite rule (routing all requests through index.php), I don't want there to be any extra web server configuration required.

The simplest way to accomplish this is to save the cached images in the same place, and with the exact same URL as the requested images. This way, when the first request comes in, the cached image won't exist, so the request will be routed to PHP and handled by Glide. However, all subsequent requests will hit the cached image directly, without hitting PHP. This should greatly improve the performance of loading cached images.

New URL format

This will not be possible with GET parameters since they are not treated as part of the filename. Further, the proper file format extension (jpg, gif, png) is also required so that the web server will output it properly. Finally, this will also require saving images in a public folder. Here is a proposed format that I've been testing with that works well:

http://example.com/img/kayaks.jpg/w-800-h-500-fit-crop-s-5ac7f6163057bf1931fd5b513f797644939ece7724ee3846a27e9915f3ea5eda.jpg

It's obviously not as nice as the current GET based URLs, but maybe that's okay. By grouping the cached images into source image folders it's easy to delete all the cached images for a specific image.

URL generation

I really feel like URL generation using a helper method is going to solve a lot of issues. Right now Glide ships with a URL generator, but it's sort of buried in the security documentation. I think this has been a bit of an issue with Glide 1.0. It makes it look like you can easily just edit URLs, but in reality if you sign your images (and you absolutely should be) you have to use some type of URL generator.

By making a URL generation helper more of a first-class feature, it will help minimize the pain caused by the less desirable URL format. Plus it will make HTTP signatures easier to work with. It will look something like this:

<img src="<?=glide('landscapes/lake.jpg', ['w' => 800])?>">
<img src="{{ glide('landscapes/lake.jpg', ['w' => 800]) }}">

HTTP signature simplification

HTTP signatures would work in the same way as they currently do, except will also be part of the filename. I want to make HTTP signatures easier to work with in 2.0, since they are so important. Part of me is even tempted to require them. I want to make it possible to set the sign key right on the Glide object, and then let Glide check this value automatically when generating images, as opposed to right now where you must manually check the signature. This isn't a big step, but I think it's a hurdle for people new to Glide, and it's easy to skip setting them up.

// Setting the key:
$glide = ServerFactory::create([
    'sign_key' => 'your-signing-key',
    'source' => __DIR__.'/images',
    'cache' => __DIR__.'/public',
    'base_url' => 'public',
]);

// Verifying the key (happens automatically):
try {
    $glide->outputImage($path);
} catch (HttpSignatureException $e) {
    // handle failed signatures
}

Securing image URLs

As already noted, to have the cached images loaded by your web server you must place the cached images in a public folder. However, we could still keep the current functionality, where the cached images are loaded through PHP. This can be helpful in situations where you want to do authorization checks on the cached images before displaying them. To make this work you would simply make your cache folder non-public.

From my experience, this isn't a super likely requirement. Typically HTTP signatures are a sufficient enough security since these URLs are next to impossible to guess. Just something to consider.

Supporting the old functionality

Finally, I hope to still support the old functionality. This may look like this, although this is still open for discussion:

// Will assume the new path format
$glide->outputImage($path);

// Old functionality, where you give it the source image path
// and an array of manipulation params
$glide->outputImage($path, $_GET);
$glide->outputImage($path, ['w' => 100]);

Many people use Glide without using GET params, so being able to manually pass in an array of manipulation params is also important.

reinink commented 7 years ago

@jasonvarga Hey Jay, I'd love to get your input on these changes. I know you've run into some of these exact issues with Statamic.

reinink commented 7 years ago

@freekmurze I'd also love to get your eyes on this, since it will probably impact your Laravel Glide library as well.

jasonvarga commented 7 years ago

Loving the sound of this. 👍

Preventing bootstrapping of your entire app is a great idea and something we've struggled with. #152 was created to battle that.

My only (super small) issue with the new filename structure is that some of our users love to be able to have some control over the actual filename. So if you try to save the image, you get the filename in it. In Glide v1 we'd allow that by having an additional URL segment that is simply ignored by Glide.

Maybe you could include it similar to this.

http://example.com/img/kayaks.jpg/kayaks-w-800-h-500-fit-crop-s-5ac7f6163057bf1931fd5b513f797644939ece7724ee3846a27e9915f3ea5eda.jpg

That way, you at least see kayaks in there if you save it.

The URL generator and signature stuff is great. I don't think people would (or should) ever really just be editing URL params anyway.

Finally, being able to manually pass params into Glide is something we use heavily. ($server->makeImage($path, ['w' => 100]); etc) We're also using Glide to pre-generate images, without URLs at all.

cdowdy commented 7 years ago

Like this! Being able to serve a cached image from the server/disk is something I really look forward too!

To piggyback on @jasonvarga filename suggestion for downloading a URL structure like:

http://example.com/img/kayaks/{{ modifications }}/kayaks.jpg

could "solve" that. You still group the images in source folder - the extension-less kayaks part, then the modifications and the filename.

Another aspect of this is we could also use the modification part of the URL as the signature. Sign that. ie:

http://example.com/img/kayaks/12nx0cgb8u08954/kayaks.jpg

ideally a short hash could be used there.. something like http://hashids.org/php/ but I may be putting the cart before the horse here haha. That really may not be feasible nor secure ?

reinink commented 7 years ago

Awesome, thanks for the feedback @jasonvarga. A couple more questions, if I may.

We're also using Glide to pre-generate images, without URLs at all.

Honest question: what is the benefit for you to using Glide if you're not using the HTTP functionality? Do you simply prefer the API that it provides, over say using Intervention Image directly?

So if you try to save the image, you get the filename in it. In Glide v1 we'd allow that by having an additional URL segment that is simply ignored by Glide.

Interesting. I think we could probably do that. In fact, you could technically put whatever you wanted in the cached filenames. However, one other idea. What if we simply put set the source filename in the header, with the manipulation parameters, but without the signature?

header('Content-Disposition: inline; filename="kayaks-w-800-h-500-fit-crop.jpg"');

The URL would still look like what I proposed in the browser, but if you saved it you'd get a nicer name. What do you think?

jasonvarga commented 7 years ago

What is the benefit for you to using Glide if you're not using the HTTP functionality?

In this case, the API is easier to use than Intervention. The pre-generated stuff is just an option if the user needs the performance boost.

We do use the HTTP method, and we built in a URL generator into our template language essentially the same as the Blade example you mentioned.

What if we simply put set the source filename in the header

Maybe I'm misunderstanding, but if the web server is going to be serving the images, doesn't that mean you won't have control over the headers anymore?

reinink commented 7 years ago

@jasonvarga I obviously need more coffee. That is correct. ☕️

reinink commented 7 years ago

@cdowdy Thanks for the input! I wonder if we can combine some ideas here:

http://example.com/img/kayaks.jpg/5ac7f6163057bf1931fd5b513f797644939ece7724ee3846a27e9915f3ea5eda/kayaks-w-800-h-500-fit-crop.jpg

With this URL you get the signature, but it won't be part of the filename. You get the original source filename in there, plus the image manipulations.

I don't think hashing the image manipulations work work as a replace for signatures, since the HTTP signatures really need to be a one-way encryption, where a hash would have to be decrypted.

jasonvarga commented 7 years ago

The other-other option is to put the manipulations in a folder and leave the filename as-is.

http://example.com/img/kayaks.jpg/5ac7f6/800-h-500-fit-crop/kayaks.jpg

Although, maybe it's nice to have the manipulations in the filename.

reinink commented 7 years ago

Although, maybe its nice to have the manipulations in the filename.

That's kind of what I was thinking.

reinink commented 7 years ago

So, here is one more curve ball to all this. And this is really where the can of worms opens up. This whole system that we're discussing will work locally because of how web servers are able to fallback to PHP. However, this will not work for any other storage systems, like Amazon S3. If Glide 2.0 could come up with a way to solve those as well...then we'd really be winning.

I've been thinking more about the URL generator, and how that could help. Here is an idea:

  1. You request an image in your application using glide($path, $params)
  2. This function checks if the cached image exists.
  3. If yes, it displays the URL to the cached image (could be local, could be S3, whatever)
  4. If no, it displays the Glide URL and the cached image will be generated

There are two problems here:

  1. Checking to see if the cached image exists will be slow (ie. requires a request to Amazon S3)
  2. The URLs will change after the first request, which means another image download later on if you happen to be the first user to load the image

The second problem here isn't that big of a deal. If you're worried about the first user you can always warm up your cache by eager generating your images.

The first problem is much more challenging. I'm not really sure how to overcome this one. The only idea I can think of is a local cache file that keeps track of all the images that have been cached. This could be a PHP file, or even saved in a database. It feels messy though.

I'd like to keep thinking on this one.

cdowdy commented 7 years ago

Checking to see if the cached image exists will be slow (ie. requires a request to Amazon S3)

what I'm suggesting here would add complexity that may not be needed :) haha

What if you were to save any image using the S3 adapter ( or any other non local one) to memory then on return visits provide the S3 url - either though x-accel/x-sendfile? The problem with the x-sendfile variants is you still need to make that call to the externally hosted images and I'm not so sure how that degrades in instances where the image still isn't uploaded to S3. Or a cookie/session/token needs to be set which .. yeah not always ideal

Aside from your latest can-o-worms haha.. after thinking on it, as of right now I prefer your's and Jason's suggestion for the filename /modifications and how the url is structured.

reinink commented 7 years ago

What if you were to save any image using the S3 adapter ( or any other non local one) to memory then on return visits provide the S3 url

When you say to memory, what do you mean? Local cache file or database?

While x-accel/x-sendfile are really cool, I think using these features is just too much for the Glide project. I'd like Glide to work without them, but still allow "power users" to take advantage of these kind of web server features if they want to. I just don't want them to be requirements.

reinink commented 7 years ago

I'm wondering if Flysystem's caching layer can help here. In particular the "Persistent Caching". I think this does exactly what I was thinking. It looks like it supports multiple caching methods as well, including the local disk, Redis, Memcached and more.

@frankdejonge, are you able to jump in here for a sec and let me know if this would work? Basically we want to be able to check if a cache file exists on Amazon S3 (or elsewhere) over and over. I'm worried about the performance of this, and was hoping caching could help.

reinink commented 7 years ago

I think I know how I'm going to do this! 💡💡💡

First off, if you use local storage for caching, then the above technique will work out of the box. The web server will try find the image, and if it cannot it will fall back to PHP. That works perfect.

Next, if you use a non-local storage for caching, that is fine, but by default your images are still served using Glide via PHP. This is exactly how Glide currently works. This is especially helpful in situations where you have multiple servers, since you'll want to host your Glide cache on something like Amazon S3, otherwise each server will need to process the image.

However, if you don't want your images served via PHP you simply need to use a CDN or storage that supports fallbacks. In this setup you always request your images from the CDN, and in the event that the cached versions don't exist, they fall back to PHP to be generated. When they are generated they are saved to the CDN, so this will only happen once.

I've actually got this working with Amazon S3, and it's pretty straightforward:

  1. Set the Glide URL generator to use the Amazon S3 URL instead of your PHP URL. For example, http://glide-testing.s3-website-us-east-1.amazonaws.com/. This is literally the only change you need to make to your PHP configuration!
  2. Setup Amazon S3 to fallback to PHP if the image is not found. I learned about this here. Basically you just enable web hosting on your bucket, and then use these routing rules:
    <RoutingRules>
        <RoutingRule>
            <Condition>
                <HttpErrorCodeReturnedEquals>404</HttpErrorCodeReturnedEquals>
            </Condition>
            <Redirect>
                <!-- Your domain name: -->
                <Protocol>https</Protocol>
                <HostName>yourdomain.com</HostName>
                <!-- Your local glide endpoint: -->
                <ReplaceKeyPrefixWith>img/</ReplaceKeyPrefixWith>
            </Redirect>
        </RoutingRule>
    </RoutingRules>

I honestly feel like this gives the best of all worlds. For those caching locally, it just works. For those looking to host the cache on a CDN or remote storage, there is a little more setup, but it's still quite doable.

More information on Amazon S3 bucket routing can be found here.

frankdejonge commented 7 years ago

@reinink I read the email before and what you came up with was exactly what I wanted to suggest. Another alternative is to put varnish in front of glide, which is pretty much sort of a "role your own" solution for these kinds of situations. I actually use that on some projects and it works really well. Alternatively you could even use nginx's cache functionality for this, which is super fast and super easy to setup. Chances are you've already got nginx in your stack, in which case it's a quick win without the need of a big investment in time, infrastructure, and maintenance.

frankdejonge commented 7 years ago

TL;DR: since glide's use case is mostly on http, why not use http tooling?

reinink commented 7 years ago

@frankdejonge Yes, totally. The thing is people use Glide because it has a really nice API, not necessarily because it's HTTP based. While that's certainly part of it, I'm trying to find a nice balance of sensible defaults, while still allowing for CDNs and other caching mechanisms like Nginx, Varnish, etc.

reinink commented 7 years ago

@frankdejonge Or put differently, Glide is still a PHP tool. I want it to work in standard PHP configurations without needing any custom web server configuration.

frankdejonge commented 7 years ago

@reinink I get that. Then again, tackling scaling issues often requires a developer to not just use one tool.

reinink commented 7 years ago

@frankdejonge Again, totally agree. However, I suspect most people using Glide aren't dealing with scaling issues. I want the library to work really well out-of-the-box for "typical" PHP projects while still being compatible with caching/CDNs layers when you need it.

freekmurze commented 7 years ago

@reinink Loving the sound of all this. The S3 fallback url is very cool (didn't know that existed either).

The latest version of laravel-glide doesn't support the entire feature set of Glide. v2 only offers the manipulation of an image using an array with parameters. It's mainly being used in our laravel-medialibrary package.

When Glide 2.0 is released I'll be sure to create a new major version of laravel-glide that 'll support all features that Glide 2.0 will offer.

Have fun creating Glide 2.0, I'll keep an eye on this thread.

emilv commented 7 years ago

We use the Content-Disposition header to give our images a nicer filename for download. Example:

https://smoothcomp.com/event/126/image/orlando-international-pro-jiu-jitsu-championship-gi-open-to-all-nationalities-2017-cover-3c07aa2d7fd8.jpeg?w=845&h=315&fit=crop&s=1269e7d2934246bd7eeea1d29768c25b

We add this header:

Content-Disposition: "inline; filename=orlando-international-pro-jiu-jitsu-championship-gi-open-to-all-nationalities-2017-cover_845x315.jpg"

We get all the caching benefits (we don't use the Glide cache at all, only the Nginx cache), but still have a nice filename if someone saves the image.

We do not want image manipulations in the pathname. If, however, one chooses to use the Glide cache it's great to have all variations somewhere under the filename for easy cache invalidation. But isn't that orthogonal to passing of manipulation parameters?

reinink commented 7 years ago

@emilv Yeah, with Glide 2.0 you'll totally still be able to run with this setup. The proposed changes above are simply for those who don't get this sophisticated with their setups. I'm curious, would you be willing to share how you did your nginx caching?

emilv commented 7 years ago

We use fastcgi_cache:

fastcgi_cache_path /var/lib/nginx/cache levels=1:1 keys_zone=fastcgicache:2m inactive=60d;
fastcgi_cache_lock on;
fastcgi_cache_key $request_method.$scheme://$host:$server_port$request_uri;
fastcgi_cache fastcgicache;
fastcgi_cache_use_stale error updating;

Anything with Expires headers gets cached. (Careful not to include Set-Cookie headers on these responses or else they won't get cached; we use fastcgi_ignore_headers and fastcgi_hide_headers to remove any cookie headers on these responses)

reinink commented 7 years ago

The other thing I want to review for Glide 2.0 is how the responses work. I have a couple issues with them:

Issue 1: They are complicated

I like that Glide supports HttpFoundation, PSR-7 and CakePHP, but I find the configuration quite complicated. I worry that many users of Glide end up using the outputImage method instead of using proper response objects. This has led to issues of images not displaying properly. You have no idea how often people accidentally output whitespace or comments after outputting an image and things break.

Using proper response objects that can be handled by a frameworks standard flow is much better and I'd like Glide 2.0 to make this easier to setup.

Issue 2: Too many packages

I've tired to solve how difficult responses are to setup by creating support for common vendors. However, this has come with it's own pain. We now have six response factory packages that need to be maintained. We currently have these packages:

Each of these packages only include one file: a simple response factory. Slim and Zend both use PSR-7. Symfony and Laravel both use HttpFoundation. Cake has it's own request/response objects (although I believe they are moving toward PSR-7).

Literally the only reason I have them split into separate repos is for testing. In order to test the Cake response factory I need to install the entire Cake framework. Same for Symfony, Slim, and the rest. Putting these all as dev dependencies of Glide just isn't practical. However, managing six packages isn't practical for me long term either.

Goals:

Moving forward, I'd like to:

  1. Simplify the configuration of responses
  2. Remove these extra libraries and move everything into the main Glide library

I've got two ideas on how to do this:

Solution 1: HttpFoundation & PSR-7

The first option is to remove all the vendor specific implementations (and packages) and only offer the following two response factories right in the main package:

If one of those two doesn't work, you easily to add a custom response factory by implementing the ResponseFactoryInterface.

To test this we would include http-foundation as a dev dependencies as well as one PSR-7 implementation. I'm thinking zendframework/zend-diactoros would work fine here.

Solution 2: HttpFoundation & PSR-7 Bridge

The second option is a bit more intense, but also simpler. We go back to an HttpFoundation only setup. This would require no response configuration, since Glide would always return an HttpFoundation response.

However, understanding that PSR-7 is used by a number of popular frameworks, I still want to support this somehow. I think the the PSR-7 bridge is the perfect solution here. The only question would be whether we build in support for this bridge directly, or just explain to users how they can implement this themselves.

twistor commented 7 years ago

Wanted to chime in here regarding serving generated image files from external caches. I've done a fair bit of work with this and the Drupal Flysystem module.

The currently proposed solution for S3 with the fallback is pretty nifty, I didn't know that existed. The only problem I see is that it's S3 specific. I have no idea if other systems support that.

There are 2 generic ways to handle this that I'm familiar with:

  1. When outputting the image URL, check if the file exists remotely (using a Flysystem cache). If it does not exist, generate the image, save it to remote, and output the URL. This works ok, and is simple enough, but it can be very slow on a page with lots of images.
  2. When outputting the image URL, check if the file exists remotely (using a Flysystem cache). If it does not exist, generate a temporary image and serve that. Then copy the image to the remote server later on in some sort of background task, or in the same process after the response is served.

Option 1 is simple enough, but has scalability issues that we've run into with the Drupal.

Options 2 is what we're converting to, but probably requires too much framework integration to do generically. You need some sort of background task, and a lock to avoid copying the file multiple times.

Also, what about putting the signature in a GET param? It doesn't need to be in the filesystem, and would only be verified when PHP is doing generation.

reinink commented 7 years ago

@twistor Hey! Thanks for participating in this discussion. Some thoughts/responses:

The currently proposed solution for S3 with the fallback is pretty nifty, I didn't know that existed. The only problem I see is that it's S3 specific. I have no idea if other systems support that.

Agreed, this is a very specific S3 feature. How many other storage providers support this type of functionality, I simply don't know. However, the concept is a pretty common one with CDNs. You reference an image on your CDN and if it does not exist (a cache miss) it will fallback to the main server (Glide) to get the image.

When outputting the image URL, check if the file exists remotely (using a Flysystem cache).

Both of your solutions require this check. This was actually my original idea (see above), but I don't think it will work very well. On a page with a lot of images this could take up a ton of time, since each "exists check" requires an HTTP request. At that point it really begs the question if it's even worth showing the remote URL or just hitting PHP directly. Do you slow down the rendering of the HTML page just to get a speed boost on the loading of your images? That just seems wrong.

I too thought Flysystem caching could work here, but I think this is going to cause all sorts of other issues. It's going to require extra configuration to use Glide. There could be issues with the cache getting too large on sites with thousands of images. How frequently does this cache get updated? How does the cache get updated when a new image is generated? And who knows how many other tricky things we could run into here.

As far as I can tell right now, the way to get the best possible performance is to:

  1. Output the cached image URL without having to do any HTTP requests to figure out if the cached image already exists.
  2. Use fallbacks if the cache storage supports it. We already know this is possible with:
    • Local storage, using web server rewrite rules
    • Amazon S3, using the technique shown above
    • Any type of standard CDN setup

That said, there is absolutely no reason why you couldn't do a cached image "exists check" on your own project and output a different URL based on that check. I just don't think it's a good default for Glide.

Also, what about putting the signature in a GET param? It doesn't need to be in the filesystem, and would only be verified when PHP is doing generation.

That's an interesting question. I guess because I've been trying to avoid query strings in this release since I've heard that they can cause browser level caching issues (see this and this). Since we don't know if the cached file exists at the point of URL generation (as discussed above), we'd have to always output the signature, and so we'd always have a query string. But maybe that's okay? It would nicely simplify the URLs.

Do you (or anyone else) know if the query string caching issues are real?

twistor commented 7 years ago

Both of your solutions require this check. This was actually my original idea (see above), but I don't think it will work very well. On a page with a lot of images this could take up a ton of time, since each "exists check" requires an HTTP request.

Flysystem's cache will handle that avoiding extra network calls. In Drupal's case it's not as important, since there's block level caching as well, so most HTML doesn't get rebuilt every request.

There could be issues with the cache getting too large on sites with thousands of images.

Yep, we wrote a separate cache adapter to support this. Flysystem's default of storing the whole cache in one object does create problems with large filesystems.

How frequently does this cache get updated? How does the cache get updated when a new image is generated? The cache is is updated transparently on reads/writes/deletes. As long as the application is the only thing updating the images, this shouldn't be a problem.

I think you're right though, more sophisticated setups would require application/framework level support. So, it makes sense to make the defaults simple to use.

I guess because I've been trying to avoid query strings in this release since I've heard that they can cause browser level caching issues. Since we don't know if the cached file exists at the point of URL generation (as discussed above), we'd have to always output the signature, and so we'd always have a query string. But maybe that's okay? It would nicely simplify the URLs.

I know that Drupal uses an itok query parameter to avoid arbitrary image generation. I've never heard/seen issues with browser caching with this. It's handy for caching busting though, since when you change the parameters of the generated image, the token changes as well. I suppose that's not a problem if you include the transformations in the file path.

reinink commented 7 years ago

@twistor Thanks again Chris, it's been awesome all the feedback I've been getting. I tell ya, if setting up a cache like this with Flysystem was a simple thing to do I'd be all over this setup. Since it isn't I'll probably just add page to the docs which explains how to do this for those determined to do so.

And from my research on query strings affecting caching, this doesn't look like a major concern at all unless you explicitly tell your CDN to not cache resources with query strings.

tmirks commented 7 years ago

Looking forward to this!

I agree with @cdowdy's comments -- I don't like the double extension purely from an aesthetics perspective, if that matters at all. :) It may be necessary if you have multiple image formats though, and also might break some higher RFC standard.

I also like the idea of creating a hash from the parameters themselves, it would make for (more) elegant URLs.

After playing around with mcrypt a bit I'm not sure if there's a real advantage.

Original URL:

http://example.com/img/kayaks.jpg/w-800-h-500-fit-crop-s-5ac7f6163057bf1931fd5b513f797644939ece7724ee3846a27e9915f3ea5eda.jpg

Just the w, h, and fit parameters encrypted, base64/url encoded:

http://example.com/img/kayaks/PF7opHAQ0ice%2BFJU61r3plbzSP%2BWc4aqYMLshcjrxlw%3D/kayaks.jpg

The second URL is arguably uglier with the URL encoded values (it's decryptable tho).

reinink commented 7 years ago

@tmirks Hey thanks for sharing those thoughts!

The reason why you need the first extension is because you could in theory have both source images kayaks.jpg and kayaks.png. I agree that it would look better without it, but I don't think we have a choice really.

I'm actually going with @twistor's ideas to remove the signature and only have that as a query string. That leaves us with:

[domain]/[route]/[source-name].[source-extension]/[source-name]-[manipulation-parameters].[output-extension]?s=[signature]

Example:

http://example.com/img/kayaks.jpg/kayaks-w-800.jpg?s=71a618b8cbad3b0f9131a4574bbbe85ff7a0e16b36b5dde18ea59f3e249cc5b8

I actually think that's pretty nice!

cdowdy commented 7 years ago

I didn't take into consideration an image with the same name being two different types haha.

With that said what you've come up with @reinink works pretty good. Having the signature as a query string makes more sense than having the ugly modifications hashed.

Also by coincidence it coincides with many responsive images tutorials markup pattern with a width or height added to the SRC candidate

Those typically being something like

<img sizes="100"   
        srcset="kayaks-w200.jpg 200w, kayaks-w600.jpg 600w, kayaks-w800.jpg 800w" 
        src="kayaks-w200.jpg"

So new comers to glide will feel a little more comfortable so that's a nice side effect.

Do you plan on moving the signature factory to use hash_hmac or continue using md5?

reinink commented 7 years ago

@cdowdy Awesome, glad you like the direction!

And yes, I'm moving away from MD5 and using hash_hmac() instead. This is what I'm planning on:

return hash_hmac('sha256', ltrim($this->path, '/').'?'.http_build_query($attributes), $this->server->getSignKey());

This is per @paragonie-scott's recommendations here: https://github.com/thephpleague/glide/issues/153#issuecomment-242756953

emilv commented 7 years ago

One thing I have noticed is that some crawlers remove the signature parameter. They probably believe it to be a session key, probably because it matches:

Suggested solution:

paragonie-scott commented 7 years ago

If you're worried about / and want to use a URL-safe variant, look no further than https://github.com/paragonie/constant_time_encoding

emilv commented 7 years ago

@paragonie-scott or since we don't need constant-time encoding of the key we can just take the implementation from your test case: https://github.com/paragonie/constant_time_encoding/blob/aa342bff0f39b72da79b62b72d95a9bb86aa80b9/tests/Base64UrlSafeTest.php#L22 :-)

reinink commented 7 years ago

So while working on these changes last night I realized that this filename plan is actually going to be problematic.

http://example.com/img/kayaks.jpg/kayaks-w-800.jpg?s=71a61...

I realized this when implementing the watermarks functionality. Check this out:

http://example.com/img/kayaks.jpg/kayaks-w-800-mark-logos/company-logo.png.jpg?s=71a61...

That logos/company-logo.png in there causes all sorts of problems. There is a directory separator. There could be an unknown number of dashes, which makes parsing the whole thing impossible.

So I'm back to square one. The issue here really is windows systems. Windows allows basically no special characters in their filenames, so this needs to be done with dashes, underscores and slashes. The issue there is all three of these are valid watermark file values.

I think it's helpful to review the goals here:

And so, I propose the following filename and URL format.

http://example.com/img/kayaks.png/[signature]/kayaks.jpg?w=800

This is sort of what @cdowdy was suggesting earlier, with the addition of the manipulation attributes as query strings. That allows us to just use the normal sha256 hash for the signature, instead of trying to encode/decode this value.

This format should work perfect since the signature will be filename safe, and is then right away available for signature verification. The manipulation attributes are provided as query strings since you can include a lot more characters in query strings than filenames, and really these are only require when processing the image for the first time. 🎉

We could simplify it further:

http://example.com/img/kayaks.png/[signature].jpg?w=800

However I think trying to maintain a nice filename, as @jasonvarga was suggesting, is a good enough reason to have a slightly longer filename.

What do you all think? I feel like this just super simplifies things. Am I missing anything?

jasonvarga commented 7 years ago

Putting the manipulation params back into the query string - won't we lose the ability to have the server serve the images directly after they've been generated?

reinink commented 7 years ago

@jasonvarga I don't think so! We're still generating a valid filename that server can then server immediately after the first request. The only difference in what I'm suggesting is that the query strings will provide the manipulation attributes, but the filename will be a unique hash of those manipulation attributes, and the filename is all the web server is concerned about, since it ignores the query strings when serving image assets. It's like this:

The first request hits PHP, and PHP uses the signature to validate the request, and the query strings to generate the image:

http://example.com/img/kayaks.png/481d6624f9c.../kayaks.jpg?w=800

The second request hits the web server, and all it cares about is the filename, not the query strings, so pretend like they are not there:

http://example.com/img/kayaks.png/481d6624f9c.../kayaks.jpg

Since the signature is a hash of the manipulation attributes the web server will serve the correct cached image.

Make sense? :)

jasonvarga commented 7 years ago

Since the signature is a hash of the manipulation attributes the web server will serve the correct cached image.

This is what slipped my mind. The signature would be unique to the manipulations. Sounds good to me!

tmirks commented 7 years ago

Further thoughts about this, purely my own opinions. :)

Don't reinvent the wheel

  1. Has anyone looked at what imgix is doing? Their API is extremely robust and they have a wildly popular service. They've obviously put a ton of thought into this and have solved all of the problems Glide is trying to solve and more. A couple of examples:
    1. Expiring URLs allows the front-end developer to decide on the cache expiry on-the-fly. This is an elegant solution. The asset's cache headers are then set by the service and pushed back to the client. The expiry can't be changed by the end user because it's hashed into the secure key.
    2. Changing the format of an image is done by using the fm parameter, the original filename does not change.
  2. Inventing new ways of forming URIs and passing parameters should be suspect. The standards for this have been around for decades so maybe we should be looking at how we can adhere to the standards? See w3.org:

    The query string represents some operation applied to the object...

    1. We should be able to cache and load URIs with query strings at the web server level, I'd be happy to look into the solution here. Apache and nginx BOTH have caching modules that can be leveraged for this purpose and Glide doesn't even need to know about it. See examples for Apache and Nginx.
    2. We're using Glide in a production environment today and caching URIs with query strings using Amazon CloudFront. Works great!

@reinink what do you think about making Glide's API a clone of the imgix API? This could really generate a lot of interest in the broader community.

If you really don't like the idea, then at least...

Allow URIs to be decoupled from the backend Glide service

Glide seeks to solve a bunch of different problems, which is fine, but if it could be designed to be more flexible and modular we can make everyone happy. :)

At the end of the day, as long as Glide knows 1) the original object requested and 2) the manipulations required, it shouldn't really care what the URI looks like. That would make it compatible with any web server and caching system as implemented by the developer.

Maybe Glide would have sensible defaults that would work without any web server configuration.

Thoughts?

reinink commented 7 years ago

@tmirks Thanks for joining in the discussion. Some thoughts:

I totally love Imgix, and I use this service on a few larger projects of my own. Much of the Glide API has been based on Imgix. That said, I'm not sure that I'm prepared to make it a clone. For one there are tons of things that Imgix can do that Glide will never be able to do. And second, Glide is it's own project and I'm not sure I want to tie it that closely to Imgix. I'd like to have the freedom to do what is best for this project.

Glide actually doesn't try to solve that many things, and overall the codebase is remarkably small and simple. The goal with Glide has always been two fold:

  1. Provide users with a beautifully simple API for manipulating images
  2. Allow images to be generated on demand (via HTTP)

I think it does those two things very well, and it's become quite popular as a result. That said, the one concern that continues to pop up is that all Glide images must be served via PHP, which is exactly what I hope to solve in version 2.0. By being smart with how we generate the cached image filenames we can make it possible for web servers to serve the images after the first request.

At the end of the day this is still a PHP library. It's very important to me that it works in most standard PHP configurations with minimal setup. Yes, there are many things you can do with custom Apache and Nginx configurations, but I don't want them to be required. That will create too high of a barrier to entry, plus for many simple projects it's just totally not necessary. Further, I see absolutely no reason why Glide cannot work in both situations. As you've noted, I've already seen lots of users putting caching layers in front of Glide, which is awesome!

I think one of the coolest things about Glide is how much customization you can have over it. You can use the default URL generation and format, or you can absolutely create your own if you prefer. You can use local storage, or Amazon S3, or a mix. What you choose to do really depends on your particular project and needs. However, I do think it makes sense to have sensible filename and URI defaults, especially considering Glide markets itself as "HTTP based image manipulations".

How far we take Glide is always going to be a bit of a challenge. Consider expiring URLs. This is a neat idea, and I've considered it. However, to make this work each request would have to hit PHP in order to check this expiration value. That would mean serving cache images by the web server would not be possible. Sure, you could probably get the web server to help in this regard, but it would take custom configuration to do. I'd rather keep Glide simpler, and if your project requires expiring URLs, then Glide can easily be customized and your web server adjusted to support this. However, I don't think this is functionality that should be built-in by default.

Also, one last note. Glide does use the fm value to set the image format. And that works great right now, in the current version. You don't need to change the extension of the source image...just like Imgix. However, to make it possible to serve cached images by the web server the new file extension must be set, since web servers use this extension to determine the file type. That said, Glide will still use the fm value to set the file at the point of generation, since this can be anything and doesn't necessarily match extensions exactly. For example, we use pjpg for progressive jpegs, but those files actually get saved with the .jpg extension.

So I've rambled on for quite a while now. I'd like to just say that I'm 100% not trying to reinvent the wheel. However, I need to be real about what the value is that Glide provides. Folks use it because it's written in PHP. It works on almost any PHP based web server. It's easy to configure, and configuration is done in PHP, which developers are familiar with. It requires no custom web server configuration, and I believe that this is very important. The big goal with 2.0 is to simply make Glide work in a more performant way in the standard PHP web server stack.

cdowdy commented 7 years ago

Inventing new ways of forming URIs and passing parameters should be suspect. The standards for this have been around for decades so maybe we should be looking at how we can adhere to the standards? See w3.org: The query string represents some operation applied to the object... We should be able to cache and load URIs with query strings at the web server level, I'd be happy to look into the solution here. Apache and nginx BOTH have caching modules that can be leveraged for this purpose and Glide doesn't even need to know about it. See examples for Apache and Nginx.

Edit: Im not to versed in apache cache's so I may be talking out of my bum here below haha :)

@tmirks in Apaches instance (with mod_cache_disk) no different than using x-sendfile, with exception to using an actual disk or memory cache, since it uses well.. sendfile haha. Setting it up on the other hand seems a little more involved https://www.digitalocean.com/community/tutorials/how-to-configure-apache-content-caching-on-ubuntu-14-04

If you were to use mod_cache since query string would be used they would also act as a "cachebuster" which might necessitate https://httpd.apache.org/docs/2.4/mod/mod_cache.html#cacheignoreurlsessionidentifiers

Nginx from what I gather uses the nginx var or $request_uri so all arguments should be kept. I don't know why I didn't think of this earlier. So spitballing here... a regular try_files directive using $request_uri instead of $uri should work.

and I just found a post on using fastcgi cache and query strings: https://www.dougcodes.com/web-development/use-nginx-to-use-query-strings-while-caching

All of that seems like an inordinately amount of work just to serve a request to example.com/img/kayaks.png/481d6624f9c.../kayaks.jpg and a file saved to:

img (or whatever the cache is named)  
  |_ kayaks.png  
    |_ 481d6624f9c...  
      |_ kayaks.jpg  

Every web server if given the path of /img/kayaks.png/481d6624f9c.../kayaks.jpg can find that without any other configuration since it exists at that path.

Isn't the current Glide api similar to Imgix? Imgix uses [domain]/[path]?[query]

tmirks commented 7 years ago

@reinink you've done a great job with Glide and I hope I'm not coming across as critical! We rolled our own image server and subsequently scrapped it in favour of Glide because it solved all kinds of problems that we also wanted to solve. Also I completely understand the desire to have an out-of-the-box working platform especially for people who may not have access to web server configs.

One thing I do want to point out is that Expiring URLs would not require a hit to PHP. They would tell Glide what to set the Cache-Control: max-age and Expires headers to and those files would be cached just like any other file. Although if you had filesystem caching enabled you'd have to write a script to do cleanup on those files, right?

reinink commented 7 years ago

@tmirks Thanks Tim! And don't get me wrong, I very much appreciated your comments. It's good to step back and have a high level view at times like this. My long response was simply an indication that I've put a lot of thought into what I believe Glide should and should not be. It's great to hear that you've been able to use it with success!

As for the expiring URLs, I'm still not sure how that would work. Yes, the first hit to an image (when it's actually generated) would be done in PHP, and so those headers could be set then for that user, but all subsequent requests by any other users wouldn't hit PHP and the web server would send whatever the default headers are.

reinink commented 7 years ago

So I've decided that I am going support PSR-7 in version 2.0, but I'm going to do it a little different. Instead of all the response factory nonsense, I'm going to simply include a concrete PSR-7 implementation. And apparently Intervention/Image already has one in it's dependencies, so it looks like I'll be using guzzlehttp/psr7.

Configuration will be dead simple:

$glide = League\Glide\Server::create([
    'source' => __DIR__.'/storage/photos',
    'cache' => __DIR__.'/public/img',
    'sign_key' => 'my-test-key',
    'response_type' => 'httpfoundation|psr7' // default: httpfoundation
]);

And then to use, you can do this:

$glide->fromRequest()->response(); // as configured
$glide->fromRequest()->httpFoundation();
$glide->fromRequest()->psr7();

And what's nice is the fromRequest() method will take either a PSR-7 request, an HttpFoundation request, or null, in which case it just creates a request from globals.

$glide->fromRequest($httpFoundationRequest)->response();
$glide->fromRequest($psr7Request)->response();
$glide->fromRequest()->response();

I think this strikes a nice balance between supporting multiple request/response types while still keeping configuration to a total minimum.

Alexandra12312312 commented 7 years ago

@reinink I think Glide can be more better if it works like below: when I request 400x300 image it should check real image sizes and if this one is bigger than real ones , it should return real images.It happens on Google images.And developer should have option to do someting like this &force_resize=true, if this has been accepted by server and server shouldn't check image size and should just resize image.Or should have option to check it

reinink commented 7 years ago

@Alexandra12312312 Hey! You can actually do this already with Glide using the fit=max option. You can read about this in the documentation, but here is the description:

max: Resizes the image to fit within the width and height boundaries without cropping, distorting or altering the aspect ratio, and will also not increase the size of the image if it is smaller than the output size.

Alexandra12312312 commented 7 years ago

@reinink ok I have test this with fit=max option but it doesn't worked for example, when I send request with w=104&h=54 it returns image with 84x54 which is wierd because image itself 736x489.So feature that I said before is that it check image size if image width or image height is less than requested width or height it return image itself