Install and configure dynamic image resize and deprecate image resizing processes

brendanheywood commented 9 years ago

High level:

[x] original images should be stored, but never served directly
[x] all images served over the web should go through a dynamic resize pipeline and cached, allowing us to save a lot of disk wasted by pre-emptive resizing, and allow us to use new sizes on the fly without any processing at all
[ ] all image serving is done with general best practice, no cookies, proper cache headers, well tuned compression, dynamic content negotiation etc
[x] rotation is done dynamically as serving time, not upload time. This fixes the root cause issue we currently have where we can't far future expires an image as it may be rotated.

Current tool of choice is:

http://thumbor.org/

Rough todo list:

[x] install thumbor, served off static
[x] change upload process for topos and photos to just dump a photo into a storage directory and that's it.
[x] the process then blindly refers to the thumbnail url using the thumbor url api, which won't exist to thumbor goes off, makes it, then serves it
[x] if the user thinks the url needs rotating, they click clockwise or anticlockwise, we tore this just as metadata and don't touch the image, then swp the url to the new url
[x] the new rotated url doesn't so thumbor does everything again, but it's a different url so both are cached
[x] This also means that we can safely rotate the image at any time in the future, and all versions, sizes and rotatedness will all be cached both on disk and in the browser
[x] the data that gets sent to the templates should be the rotated dimensions so the templates don't need to worry about rotation
[x] make a webhelper function to handle the url creation. Needs to handle cases like smart cropping of a rectangular image down to a square thumbnail

scd commented 9 years ago

I think this is a priority. I want to move everybody onto virtual servers on the Internet backbone rather my LAN, and I want to have the image issues resolved before I do this.

Couple more task items:

[x] Need to store the max width and max height and pass this to the templates so templates know the image maximum sizes.
[x] Create config for standard sizes and make this available through the template global variable rather than a template parameter.

brendanheywood commented 9 years ago

I've very conscious that sorting our image serving is by far the # 1 performance bottle neck and eclipses anything else we could do anywhere else in the stack. That said we have a very large amount of code not yet shipped which I think we should release first. When you say 'everyone' are you just talking about us dev's or is there anything production which has a dependency on your lan?

scd commented 9 years ago

When I say priority, I mean after the current release. So I align with your opinion.

No production relies on dev. Timing dependency is when Nicky needs his own dev then I will build him a Internet backbone dev rather then my Lan dev. The image stuff will have to be in place by then so dev can work off an independent image storage. Maybe we can use static with a prod image store and a dev image storage so we only need one large disk for all our devs and prod.

scd commented 9 years ago

Thinking out aloud of an easy implementation strategy

[x] install thumbor on dev & static
[x] allow direct access to original images via web
[x] change template code to insert thumbor static address
[x] note that I am pretty sure rotation info is stored in meta data anyway, so nothing to do here
[ ] set up load distribution for thumbor image serving, if necessary - see thumbor doco

We should now be accessing some performance benefits. But work could continue as follows:

[x] Investigate the PDF process, which uses images directly as files rather than via the web. Port all PDF use of images to be via web/thumbor.
[ ] Remove generation of pre canned images.
[ ] Remove pre canned images so we now only have an originals directory
[x] Enhance upload process to store images directly on static, thus removing the requirement of images on prod & dev all together. This means we need a dev and prod image repository on static.
[x] Consider using this opportunity to investigate a different way of storing the images. Are there still any directory limits for files. If not then maybe we should collapse all files to one directory.
[ ] Consider porting the image serving component of static to a server with better access to the Internet backbone.

brendanheywood commented 9 years ago

Something to consider is swapping to a content addressable way of storing files. This is how git, dropbox, moodle and almost every other big store / cdn works

https://en.wikipedia.org/wiki/Content-addressable_storage

ie this is what the moodles internal file store looks like:

brendan@silverbirch 09:21:40 /var/lib/sitedata/vanilla-moodle/filedir $ find |less
.
./d0
./d0/2d
./d0/2d/d02dc787583d8b59bdd6fb045c344708ccefaba4
./d0/be
./d0/be/d0be22d2cb1240b0142d5692b992a28c2dec1a04
./d0/3d
./d0/3d/d03d73a6237f9e8082500d9ebe59de01e9a76b4c
./7d
./7d/3a
./7d/3a/7d3ac1f20e669d5dc1c60704dbcbbb6a65a013e8
./44
./44/56
./44/56/4456a4614ac765b4135d339e055f97b2f7230b2f

and the db model (just a small subset):

 id |               contenthash                | filepath |    filename    |  mimetype  
----+------------------------------------------+----------+----------------+------------
  1 | 41cfeee5884a43a4650a851f4f85e7b28316fcc9 | /        | background.jpg | image/jpeg
  3 | fb262df98d67c4e2a5c9802403821e00cf2992af | /        | smile.png      | image/png
  5 | a4f146f120e7e00d21291b924e26aaabe9f4297a | /        | sad.png        | image/png
  6 | 33957e31ba9c763a74638b825f0a9154acf475e1 | /        | tick.png       | image/png
  7 | d613d55f37bb76d38d4ffb4b7b83e6c694778c30 | /        | cross.png      | image/png

Every file is stored as the sha1 hash of it's contents. This has many benefits like automattic de-duplicating at the store level. And it makes it much easier to shard the file system across multiple servers or mounts later, or re-shard as the content grows without touching the file metadata (eg we don't need the 'volume' concept any more)

scd commented 9 years ago

FYI, I have installed thumbor on static

http://static.thecrag.com:8888/unsafe/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

out of the box install without any config. Note I had to install about three undocumented obscure dependencies.

apt-get install python-pip python-dev build-essential pip install --upgrade pip # note that this removed pip from path so I needed to do sh to continue pip install --upgrade virtualenv apt-get install libcurl4-openssl-dev pip install --upgrade setuptools pip install thumbor thumbor&

Notes:

there may be some extra compression stuff to install (pngcrush and gifsicle ???)
review config file and setup properly (/usr/local/bin/thumbor-config > /etc/thumbor.conf, then run with config file opiton)
keep running and run on startup - install supervisord (????)
load balance (Nginx???)

brendanheywood commented 9 years ago

Ok some rough general todos:

[x] use original not modified: http://static.thecrag.com:8888/unsafe/300x200/http://www.thecrag.com/image/photo/00003/original/724238361
[x] configure the filesystem loaded instead of the http loader, and graft the route onto static (via /t/ ?) https://github.com/thumbor/thumbor/wiki/Configuration#file_loader_root_path)

http://static.thecrag.com/t/300x200/photo/00003/original/724238361

[x] test it still all works via https

https://static.thecrag.com/t/300x200/photo/00003/original/724238361

[ ] get rid of safe mode and get client running on dev to sign the urls https://github.com/thumbor/thumbor/wiki/Configuration#security_key

https://static.thecrag.com/t//cryptohash300x200/photo/00003/original/724238361

[ ] get AUTO_WEBP = true serving turned on

https://static.thecrag.com/t/blahblahblah/300x200/photo/00003/modified/724238361.jpg

[ ] get cahe and expire headers configured right, at the moment isn't far futures and has caching off, assume this is due to the unsafe mode stuff so may just start working right after that's fixed

[ ] I don't mind face detection on for avatars, I'm on the fence for the general climbing pics (probably off), but they should definitely be off for the topos. May be better to just turn them all off for the first rollout

[ ] check MAX_WIDTH and MAX_HEIGHT, we will be sending some big images some times (find a big image to test)

[x] given that rotate is just another filter, and it's desirable to give access to the templates to use other filters (blur, colorize, rounded corners, there are a bunch I had immediate ways to use in different places), it seems that the addition of the rotate() has to be done in the templates rather than the backend sending a preformed url

scd commented 9 years ago

and graft the route onto static (via /t/ ?)

I am a little confused at how we do this. In apache land and redirect to the thumbor port, or use something like Ngnix as front end, or just use the python Tornado webserver for all static web traffic. Or have I missed something alltogether?

it seems that the addition of the rotate() has to be done in the templates rather than the backend sending a preformed url

I am a little confused with what you are saying.

We needed to rotate because EXIF data did not always align with what looked like the natural orientation. In this case the user did the re-orientation, which was stored as an EXIF override meta data setting.

IMO we should not adjust the original for any reason, this includes correcting bad EXIF data. Instead we should store additional rotate meta data in the database model and pass this to the templates. Is this what you meant by 'done in the templates'?

There is design choice to make here. Given that we have to store rotate meta data in the db model, do we disable EXIF on the server (RESPECT_ORIENTATION) and store all EXIF in the db model, or do we enable EXIF on the server and just store the EXIF delta? (I vote for the former.)

scd commented 9 years ago

I am not sure if this will be useful, but what if we had a subdomain image.thecrag.com (which pointed to static) and then redirected everything on image.thecrag.com to static.thecrag.com:8888

Here is an example of apache config:

http://serverfault.com/questions/195611/how-do-i-redirect-subdomains-to-a-different-port-on-the-same-server

brendanheywood commented 9 years ago

Comments below

and graft the route onto static (via /t/ ?) I am a little confused at how we do this. In apache land and redirect to the thumbor port, or use something like Ngnix as front end, or just use the python Tornado webserver for all static web traffic. Or have I missed something alltogether?

It's trivial, google apache mod_proxy. We never want redirect only internal forwarding

it seems that the addition of the rotate() has to be done in the templates rather than the backend sending a preformed url

I am a little confused with what you are saying.

We needed to rotate because EXIF data did not always align with what looked like the natural orientation. In this case the user did the re-orientation, which was stored as an EXIF override meta data setting.

IMO we should not adjust the original for any reason, this includes correcting bad EXIF data. Instead we should store additional rotate meta data in the database model and pass this to the templates. Is this what you meant by 'done in the templates'?

Yup, pass the data to the templates so it can rotate and optionally apply other filters as well

There is design choice to make here. Given that we have to store rotate meta data in the db model, do we disable EXIF on the server (RESPECT_ORIENTATION) and store all EXIF in the db model, or do we enable EXIF on the server and just store the EXIF delta? (I vote for the former.)

Don't really have an opinion. My only requirement is that the exit data in the file is honoured as the default which we can then swivel

brendanheywood commented 9 years ago

No redirects they are a performance killer.

We should be going the opposite direction, turn on spdy and http2 and then turn off static completely.

Sent from my iPhone

On 04/10/2015, at 2:56 PM, Simon Dale notifications@github.com wrote:

I am not sure if this will be useful, but what if we had a subdomain image.thecrag.com (which pointed to static) and then redirected everything on image.thecrag.com to static.thecrag.com:8888

Here is an example of apache config:

http://serverfault.com/questions/195611/how-do-i-redirect-subdomains-to-a-different-port-on-the-same-server

— Reply to this email directly or view it on GitHub https://github.com/theCrag/website/issues/2029#issuecomment-145315678.

scd commented 9 years ago

From Wikipedia: "As of February 2015, Google has announced that following the recent final ratification of the HTTP/2 standard, support for SPDY would be deprecated, and that support for SPDY would be withdrawn completely in 2016"

Are you suggesting that http2 will remove one of the requirements that got us to use static. There are other reasons to have a static server, but these will probably go over time as well.

brendanheywood commented 9 years ago

Yup SPDY was the basis for the start of the http2 spec which replaces it. Between them both support is great at around 70% (using our GA stats with caniuse). Every evergreen browser supports it.

http://caniuse.com/#feat=spdy

http://caniuse.com/#feat=http2

And yes the primary reason why you want a static server is cookieless domains, and domain sharding. Both of these are solved by the new specs and once we turn them on became an anti feature. The other things you get from a full cdn are early ssl termination, edge serving, etc which would be nice too but aren't a big deal for us until we grow a bunch more

And I am completely agnostic about to underlying server architecture, the only thing I care about here is the client facing services. We could still have thumber being served by static under the hood and reverse proxied from www.thecrag.com, but honestly given it's current bandwidth constraints I think thats not ideal.

scd commented 9 years ago

Whenever we want, don't we just turn on http2 on in apache in prod. I presume our version of apache supports this mod. Let's just get thumbor image serving working first, I don't want to many distractions.

scd commented 9 years ago

@brendanheywood do we want a separate repository for dev and prod images, or have one for both. We need to solve images issue in dev as well as prod. If we just have one image repository for both dev and prod then this will save a lot of system resources.

Also in reference to your response for EXIF rotation.

Don't really have an opinion. My only requirement is that the exit data in the file is honoured as the default which we can then swivel

If we do it this way then we must store the rotation change between EXIF and what the user selected. Currently we store the absolute as an EXIF override. We will have to test this thoroughly to make sure thumbor behaves as I would expect it too.

brendanheywood commented 9 years ago

I think we could easily implement a custom loader in dev so it looks for images locally, an then if it doesn't find it then it goes over the network or whatever and grabs the image from prod. So we'd have a dev and prod thumbor so we can play with dev config safely but store almost nothing.

https://github.com/thumbor/thumbor/wiki/Image-loader

scd commented 9 years ago

@brendanheywood I have come across a blocking issue with something that I thought should be a trivial piece of work. I am going crazy trying to work out why I cannot set https for thumbor. There is nothing in the documentation. Am I missing something really obvious????

The way around this is to just do this in the proxy https -> http. This works, but I am guessing could be a security hole.

https://static.thecrag.com/img/unsafe/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

http://static.thecrag.com/img/unsafe/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

apache config: /etc/apache2/sites-enabled/003-cids-static thumbor config: /etc/thumbor.conf

scd commented 9 years ago

I think we could easily implement a custom loader in dev so it looks for images locally, an then if it doesn't find it then it goes over the network or whatever and grabs the image from prod. So we'd have a dev and prod thumbor so we can play with dev config safely but store almost nothing. Yes but for the sake of a couple of spurious images do we need to write and maintain a custom image loader?

We are going to be loading through the filesystem, so I was thinking we could have something like /opt/image-store/thecrag /opt/image-store/thecrag/dev One way of doing this is a custom loader would try and load /opt/image-store/thecrag/dev first then if not there load /opt/image-store/thecrag.

Another way is to have our own local thumbors which tried local filesystem, if failed went to prod thumbor.

Alternatively we could generate a hash file key of something like d02dc787583d8b59bdd6fb045c344708ccefaba4-dev for images uploaded from dev whic is stored in a file ./d0/2d/d02dc787583d8b59bdd6fb045c344708ccefaba4-dev. This way dev and prod would look exactly the same, which will keep the code base small. I don' have to write and maintain a custom loader. We can still go through the image store and remove dev images if we wanted to. I am leaning to this method because it has the smallest code base. Downside is that dev need access to Internet to get images.

brendanheywood commented 9 years ago

Re ssl, I would have done ssl termination in apache and proxy to thumbor exactly as you have done. This is standard, no security issues

Re custom loader, not too fussed but the "tried local filesystem, if failed went to prod thumbor" method I think is the simplest. The custom loader will be 20 lines of python so I don't see that as a big deal, just munge the file system and http loaders together

https://github.com/thumbor/thumbor/blob/master/thumbor/loaders/file_loader.py

scd commented 9 years ago

How important is it to sign the urls. As far as I can see there are a couple of reasons why we would want to do this.

Stop URL tampering, which may lead to denial of service type attacks.
Protect access to sensitive information (eg personal photos that should not be publically viewable without the URL being generated from us).
Level of protection for topo images. If you don't have the security key you cannot get a topo image.
Sounds like good practice.

https://github.com/thumbor/thumbor/wiki/Security

I ask because it potentially has implications on the client side. The whole url plus arguments must be encoded in the hash. This means the template must be doing this on the fly or javascript must be doing this on the fly where images are being brought in by javascript. Actually the javascript is a problem because it will expose the security key, so we need to pass in the whole encoded url from the template, which means we can only use pre-defined arguments for images in javascript, or write a service to encode arguments.

Is it worth the effort for the initial port to thumbor?

brendanheywood commented 9 years ago

1) is the main driver. If its not an issue then probably doesn't matter

However long term I think we should sign so we could go live without and migrate all urls to images to server side stuff and then turn on later

scd commented 9 years ago

I think 1) is potentially an issue, but there are a lot of other use cases on the server with this issue. So I think we plan to release without signing. If I get the urge I will put on signing, otherwise we can do it for a future release.

Given that I have taken the word 'unsafe' out of the static proxy url. We do not want the word unsafe being seen by our customers. So access to images will now look something like:

https://static.thecrag.com/img/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

FYI, I have decided to bite the bullet and learn how to branch with git. This work is on the thumbor-int branch. The config file changes have been pushed to repo.

Also, just thinking out aloud, if somebody is embedding a theCrag photo in a discussion then I presume we just call the service with no arguments and let thumbor worry about the size???

https://static.thecrag.com/img/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

scd commented 9 years ago

Found this:

http://blogs.bethel.edu/web-services/2014/04/the-responsive-image-challenge-part-2/

Have you ever used imager.js or something like http://placehold.it/ ?

scd commented 9 years ago

In this comment I am thinking out aloud, but feel free to put in your views.

The db model currently has a few tables which hold meta data about images. Actually the meta data currently the width and height of the various dimensions (NULL if not available). The image id comes from the photo record ID (or topo record id, etc) and is what is used in the filesystem.

Note that there use to be a 'Resource' table which maps fairly well to the concept of an asset.

I think we now want a single original image table that is referenced by each of the Photo, Topo, etc tables. This Image table stores the

Original Width
Original Height
Rotation
Hash

I don't want to go back to the Resource table because it does not store the width and height.

We need to know the width and height of the original so that we do not request images from thumbor which are bigger than the original and/or force cropping unless we want it cropped.

We then need some helper functions which take the original height and width and create standard sizes. For example if we want images which are always 100 pixels wide but not cropped then we have to work out the target height for thumbor without cropping. The helper functions should possible just replicate the standard dimensions initially. This means we can cutover functionality.

scd commented 9 years ago

I updated Rotation. We need to know any final user rotation required after upload.

brendanheywood commented 9 years ago

Also, just thinking out aloud, if somebody is embedding a theCrag photo in a discussion then I presume we just call the service with no arguments and let thumbor worry about the size???

https://static.thecrag.com/img/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

We should parse it and rewrite it back to the size we want, the size we want should be determined by the markdown output and not by the author, ie in some context would want small, in others big etc.

Also can you rewrite out the second www part so instead of this:

https://static.thecrag.com/img/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

it's just this:

https://static.thecrag.com/img/300x200/image/photo/00003/modified/724238361.jpg

or possibly even this: (.jpg may be wrong if retyped to webp on the fly)

https://static.thecrag.com/img/300x200/photo/00003/original/724238361

(we should only ever be dealing with scaled versions of 'original' and never 'modified' etc, the small versions will all be removed??)

"The db model currently has a few tables which hold meta data about images."

Yeah the internals is all up to you. Don't know it well enough to comment. I didn't think it needed to change that much really, we could still have the 'sizes' array but it would only ever contain an 'original' size and all the others would get removed. The only extra data model field would be the rotation.

We then need some helper functions which take the original height and width and create standard sizes. For example if we want images which are always 100 pixels wide but not cropped then we have to work out the target height for thumbor without cropping. The helper functions should possible just replicate the standard dimensions initially. This means we can cutover functionality.

I think you get that all out of the box with thumbor, eg:

https://static.thecrag.com/img/300x200/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

vs

https://static.thecrag.com/img/300x/http://www.thecrag.com/image/photo/00003/modified/724238361.jpg

scd commented 9 years ago

Agreed.

Thanks for the heads up with the non-cropping resize for thumbor. This will definitely make the helper function somewhat lighter.

brendanheywood commented 9 years ago

Found this: http://blogs.bethel.edu/web-services/2014/04/the-responsive-image-challenge-part-2/

I missed this comment before: there are heaps of different script that do stuff like this, I think this is more or less a duplicate of #1761 (lazy defer load)

Have you ever used imager.js or something like http://placehold.it/ ?

Re imager.js yes, also native srcset is coming along nicely too (at 42% http://caniuse.com/#feat=srcset)

Re placehold.it yes but not sure what you have in mind for this? Typically these types of services are used for mockups by design agencies before real content has arrived. (I like https://placekitten.com/)

If you are thinking of having something load in the interim until the real image loads then I wouldn't use yet another image, I'd do it as very lightweight inline html / css, like just a grey box with the alt text visible which would then get replaced by the image when you scrolled down andor it loaded etc.

scd commented 9 years ago

I have added some extra stuff to the DB Model.

+---------------------------+---------------------+------+-----+---------+-------+
| Field                     | Type                | Null | Key | Default | Extra |
+---------------------------+---------------------+------+-----+---------+-------+
| CreateDate                | int(11)             | NO   | MUL | NULL    |       |
| ID                        | bigint(20)          | NO   | PRI | NULL    |       |
| RecordOrder               | bigint(20) unsigned | NO   |     | NULL    |       |
| Height                    | double              | NO   |     | NULL    |       |
| Width                     | double              | NO   |     | NULL    |       |
| CreatedBy                 | bigint(20)          | NO   | MUL | NULL    |       |
| HashID                    | varchar(255)        | NO   | MUL | NULL    |       |
| Rotate                    | double              | YES  |     | NULL    |       |
| RecordStatus              | tinyint(1)          | NO   |     | NULL    |       |
| LastUpdated               | int(11)             | YES  |     | NULL    |       |
| MimeType                  | varchar(255)        | YES  |     | NULL    |       |
| Filename                  | varchar(255)        | YES  |     | NULL    |       |
| Path                      | varchar(255)        | YES  |     | NULL    |       |
| FieldOfView               | varchar(255)        | YES  |     | NULL    |       |
| GPSDirection              | varchar(255)        | YES  |     | NULL    |       |
| GPSPoint                  | varchar(255)        | YES  |     | NULL    |       |
| DepthOfField              | varchar(255)        | YES  |     | NULL    |       |
| TimeTaken                 | int(11)             | YES  | MUL | NULL    |       |
| EXIFOrientationCorrection | double              | YES  |     | NULL    |       |
| CameraMake                | varchar(255)        | YES  | MUL | NULL    |       |
| CameraModel               | varchar(255)        | YES  | MUL | NULL    |       |
| EXIFOrientation           | double              | YES  |     | NULL    |       |
+---------------------------+---------------------+------+-----+---------+-------+

MimeType as detected. Filename as uploaded. Path as you suggested. I am not sure if we will end up using these as the images will be served by Thumbor which will not have access to the db model, just the filesystem. Anyway they are here for completeness.

EXIFOrientation is the Orientation stored in the EXIF data. From time to time the user rotates the images when they are uploaded, in which case this is stored in the EXIFOrientationCorrection field. Note that Thumbor has been set up to honor the EXIF Orientation data, however it does not know about the EXIFOrientationCorrection value. This is where the Rotate field comes in, which tells the template to set a rotate value (90,180 or 270) so that the image is rotated from the EXIFOrientation to the EXIFOrientationCorrection. (Note that this is a delta rotation from the already encoded EXIF data in the original image). This must be tested thoroughly.

The GPSPoint, GPSDirection, FieldOfView, DepthOfField, TimeTaken, CameraMake and CameraModel fields are all deduced from EXIF data. We had previously discussed and implemented this directly onto the Photo and Topo tables. I am moving them to the new image db model, so it collects it for all image uploads.

brendanheywood commented 9 years ago

GPSPoint, GPSDirection, FieldOfView, DepthOfField, TimeTaken, CameraMake

hey that's a nice bonus :) will this get retro extracted from the existing body of photos? I have some quick thoughts on that which I'll knock up in a new issue...

scd commented 9 years ago

Yup, will be retro populated. Actually it was already in the photo and topo fields. It is way better in a single common table. On Oct 7, 2015 2:22 PM, "Brendan Heywood" notifications@github.com wrote:

GPSPoint, GPSDirection, FieldOfView, DepthOfField, TimeTaken, CameraMake

hey that's a nice bonus :) will this get retro extracted from the existing body of photos? I have some quick thoughts on that which I'll knock up in a new issue...

— Reply to this email directly or view it on GitHub https://github.com/theCrag/website/issues/2029#issuecomment-146068338.

scd commented 9 years ago

Some test use cases for rotation

http://www.thecrag.com/photo/183901959 # EXIF 6, corrected 1, Rotate 270 http://www.thecrag.com/photo/186275373 # EXIF 1, corrected 6, Rotate 90 http://www.thecrag.com/photo/275428440 # No EXIF, corrected 6, Rotate 90 - note that this is an avatar that has been intentionally rotated. http://www.thecrag.com/photo/25060522 # EXIF 6, no correction, no rotation

scd commented 9 years ago

Ahh, I assumed Thumbor rotates clockwise, it is anti-clockwise. Also there are some older images which we did not rotate in accordance with the EXIF. For example the following image has an EXIF Orientation of 6, but is stored as if it had no Orientation:

http://www.thecrag.com/photo/103356465

When I regenerated the image it came good. This implies there was an historical bug in my code somewhere. The new process will automatically correct a lot of examples, but the inconsistency worries me a little. We will have to see how the end result looks.

New set of test examples and results:

http://www.thecrag.com/photo/183901959 # EXIF 6, corrected 1, Rotate 90 http://www.thecrag.com/photo/186275373 # EXIF 1, corrected 6, Rotate 270 http://www.thecrag.com/photo/275428440 # No EXIF, corrected 6, Rotate 270 - note that this is an avatar that has been intentionally rotated. http://www.thecrag.com/photo/427087752 # EXIF 6, no correction, no rotation

scd commented 9 years ago

tested the above and am happy that the rotation algorithm works as I expect it to.

I think we will have to allow users to rotate images after they have been uploaded.

scd commented 9 years ago

Next phase involves integration. Here is a rough task list:

Config

[x] create a config variable which allows us to run in stored, duel and thumbor mode. This way the site can run with both image stores simultaneously during a transition period. At a minimum we will need this while we are setting up.
[x] Helper functions and imageUrl template

Upload

[x] x6 adjustments to code around storeUnprocessedOriginal. 5 in the process functions and one in the API. grep to find.
[x] The changes must work in various modes.
[x] save to static as well as prod.
[x] Topo & Photos can be rotated on upload. Note that there may be some deeper issues to check when selecting an existing photo as a topo (the existing photo may already have a rotation).
[x] Replace check-image.js on upload so it uses the thumbor image (no checking required). Rotation still needs to work, so we will need some features of this file.
[x] API/Image.pm transform needs to update the rotation field. Should work in both modes.
[ ] The storeUnprocessedOrginal and post-process-image-upload should not be run in thumbor mode.
[ ] regenerate_image and post-process-image-upload should have fail status and exit on error in thumbor mode
[ ] document any changes required to ppst-process-image-upload cron after going to thumbor mode

Prod Loader

[ ] if http://www.thecrag.com/ then get http (eg scale static topo). White list to just thecrag prod.
[ ] there may be a problem using thumbor to scale static topos - caching needs to be turned off. Investigate this before starting any work on scaling static topos using thumbor.
[x] else load from static filesystem

Dev Loader

[ ] if http then get http (as per prod, but allow any http)
[x] if in local filesystem then get locally
[x] go to static to load via http.

PDF

[x] Change the PDF generation from using filesystem images to web images. THIS SHOULD BE DONE FIRST IN PREPARATION TO THUMBOR INTEGRATION
[x] Change the PDF generation to go via tumbor when thumbor mode is enabled
[x] Review any changes to static topos via phantom.js

Data

[x] extra data (rotation,hash) should be integrated into the template data for images associated with photos, topos, publications, business directory and sponsor
[x] proxy dev thumbor as well as prod so we can terminate https
[x] avatar endpoint needs conversion
[ ] topo template need extra fields passed into it which go to the imageUrl (max dimensions, rotation). REVIEW all template calls of phototopo
[x] photo facet
[x] topo facet
[x] streams error with missing image template data
[x] topo page
[x] static topo ???
[ ] assign your avatar image process

Web templates

[x] Create a web template helper function for generating image urls (all static and thumbor modes)
[x] There are about 30 changes required in the templates to use the web helper function - grep image * | grep volume in various template directories should get them all

Delete

[x] soft delete - verify no change
[ ] hard delete - only allow if there are no other internal references to the image (for example the image may have been uploaded twice or uploaded as a photo then linked as a topo). All internal references need to be soft deleted first.
[x] remote hard delete - use app server to do this real time.

Thumbor config

[x] dev thumbor needs increased timeouts as the defaults are not long enough to get many images through my ISP (eg facet search).

scd commented 9 years ago

Specific PDF stuff:

[x] sponsor and business directory images to be changed to helper function
[x] our in-house dynamic image for photos to be changed to helper function (not that we need this to seemlessly swicth over from in-house to custom with a flag change).
[x] topo stuff needs a bit of tidying up. We only use the the static topo images for PDFs now which calls the TopoSVGCache::applicationTopoXMLWrapper3 function. Comment out old code in the template and make sure the fuction calls the new helper function for generating urls.

brendanheywood commented 9 years ago

I'll mention this now as I'm not sure whether it's a distraction or would be more convenient to implement now, but with the topos it would be much better if the topo tool rendered exclusively as some large size, probably modified or it's equivalent, and then the output topo image is then fed into thumbor to be resized as needed. This may mean thumbor gets used twice, but will mean that the topos will always come out looking consistent when scaled. (ie no really fat looking lines on small topos)

scd commented 9 years ago

Later, distraction. Will topo lines be ok being reduced a second time given they will not be vector graphics? On Oct 9, 2015 7:01 PM, "Brendan Heywood" notifications@github.com wrote:

I'll mention this now as I'm not sure whether it's a distraction or would be more convenient to implement now, but with the topos it would be much better if the topo tool rendered exclusively as some larger know size, probably modified or it's equivalent, and then the output topo image is then fed into thumbor to be resized as needed. This may mean thumbor gets used twice, but will mean that the topos will always come out looking consistent when scaled. (ie no really fat looking lines on small topos)

— Reply to this email directly or view it on GitHub https://github.com/theCrag/website/issues/2029#issuecomment-146790396.

scd commented 9 years ago

Also, we can just make it bigger, but use the same width in cm. This will do the same thing. On Oct 9, 2015 8:59 PM, "Simon Dale" scd@thecrag.com wrote:

Later, distraction. Will topo lines be ok being reduced a second time given they will not be vector graphics? On Oct 9, 2015 7:01 PM, "Brendan Heywood" notifications@github.com wrote:

I'll mention this now as I'm not sure whether it's a distraction or would be more convenient to implement now, but with the topos it would be much better if the topo tool rendered exclusively as some larger know size, probably modified or it's equivalent, and then the output topo image is then fed into thumbor to be resized as needed. This may mean thumbor gets used twice, but will mean that the topos will always come out looking consistent when scaled. (ie no really fat looking lines on small topos)

— Reply to this email directly or view it on GitHub https://github.com/theCrag/website/issues/2029#issuecomment-146790396.

brendanheywood commented 9 years ago

Should be, needs testing and tuning

scd commented 9 years ago

If we want to pass our static topos through thumbor we will have to write a custom loader for dev and for prod.

In prod:

[ ] load images from file system
[ ] if url specified (as for topos) load from http

In dev

[ ] load from file system if file system hash not present then get from prod thumbor
[ ] if url specified then get from url

scd commented 9 years ago

It is way more convenient if I do this release in two stages. First stage will just build the hash filesytem in parallel to the existing filesystem. This will allow me then to test the cutover on dev without having to find more disk space somehow. ie images which are not here locally go to static.

I will try and kick off stage one early this week.

scd commented 9 years ago

Interesting minor complication with hard deletes of images. Because the new image filesystem storage will store the same image at the same address (ie the content is the same the filename will be the same) this means that if somebody uploads a photo twice there will be one image stored, but two photo records pointing to the same image record/file. Also if somebody makes a topo from an existing photo, then there will be both a photo and topo record pointing to the same image record/file.

Soft deleting should work in exactly the same way by changing the status of the master Photo/Topo record.

Because of the possibility of multiple references to one image record/file, hard deleting needs to take into an additional consideration. Hard deleting should soft delete plus remove the link from the photo and if there are no more references to the image record/file then remove that as well.

theCrag / website

Install and configure dynamic image resize and deprecate image resizing processes #2029