Closed konklone closed 10 years ago
Specifically, I'm proposing 4 directories:
congress/original
congress/200x250
congress/100x125
congress/40x50
I moved the resize_photos.sh script from congress-legislators into this repo, which does this really well if you run it from a directory with a bunch of photos in it. It can be adapted pretty easily to work for this.
It'd be nice if the main Python script kicked this off (or did it itself) after checking for new photos, so that people don't have to remember to run two scripts to keep the photos up to date.
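As a rough illustration of the "check for new photos, then resize" idea, here is a minimal Python sketch. It assumes the proposed layout (an original directory plus one directory per size) and only reports which resized files are missing; the function name missing_resizes and the SIZES list are hypothetical, not taken from the actual script.

```python
from pathlib import Path

# Sizes proposed in this thread; adjust if the final list changes.
SIZES = ["200x250", "100x125", "40x50"]


def missing_resizes(base_dir, sizes=SIZES):
    """Return (size, filename) pairs for originals that have no resized copy yet."""
    base = Path(base_dir)
    missing = []
    for original in sorted((base / "original").glob("*.jpg")):
        for size in sizes:
            if not (base / size / original.name).exists():
                missing.append((size, original.name))
    return missing
```

The main script could call something like this after downloading and only invoke the resizer when the returned list is non-empty.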
Originals added in https://github.com/unitedstates/images/commit/d293cac8db12ba43540d626987680ec44f518cbb
For example: http://theunitedstates.io/images/congress/originals/A000055.jpg
Resized to follow.
Oh, just spotted the directory is called originals and you proposed original. Shall I rename it?
Nice! :) Er, sure, if it's not much trouble, might as well rename it, since the other dirs are adjectives and not plural nouns.
All right, I've beefed up the docs, and updated the script to git clone the YAML rather than download it through the raw.github.com URL.
I've also updated our old contribution page to link here, and added a section to the README on non-GPO contributions (which I hope are rare).
Before we update our Congress API docs to link to this instead of our old zips, I'd like to make sure that we have original, 200x250, 100x125, and 40x50 versions for each MoC, and we retain the same breadth of photos.
We currently have photos in our Sunlight collection for members back to at least 2007 (the 110th Congress). So we should collect photos for the 112th, 111th, and 110th first. Presumably, the cache means it will only bother to download photos for members we don't already have.
Also - if the originals are consistently in the 500+x700+ range, I'd be up for revisiting our supported sizes, and putting some burden on clients to downsize where needed in-browser or in-app.
For example, maybe just:
original
400x500 or 500x625
100x125
Why are the directories the sizes and not the bioguide IDs?
It would seem to me that this would be more useful: http://theunitedstates.io/images/congress/A000055/original.jpg http://theunitedstates.io/images/congress/A000055/400x500.jpg http://theunitedstates.io/images/congress/A000055/100x125.jpg
Unless of course you're worried about having descriptive filenames...
Also, since the original images are not always guaranteed to be the same size, maybe the size should only refer to a single axis (e.g. width) so that you're not distorting (or misrepresenting) the aspect ratio of the image?
https://github.com/unitedstates/images/commit/aa5b395a1e5f8b1c039d2e0512868552150b9f16 kicks off resizing from the script and adds the resized photos, using the initial 200x250, 100x125, and 40x50 sizes for now.
@GPHemsley:
About directories, I think original/A000055.jpg is better than A000055/original.jpg. For one, the downloaded image has a more meaningful and unique name -- I can check the size from the image itself, but not the ID. If I download lots, I don't need to rename hundreds of original.jpg files as I go along. The other image APIs I can think of off the top of my head use size/id.jpg: Flickr (who actually use size/id_size.jpg) and Last.fm. Also it's a bit easier to implement this way :)
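To make the size/id.jpg layout concrete, here is a small Python sketch that builds a photo URL from a bioguide ID. The base URL follows the example links in this thread; the helper name photo_url is hypothetical.

```python
# Base URL as it appears in the example links in this thread.
BASE_URL = "http://theunitedstates.io/images/congress"


def photo_url(bioguide_id, size="original"):
    """Build a URL in the size/id.jpg layout, e.g. .../200x250/A000055.jpg."""
    return f"{BASE_URL}/{size}/{bioguide_id}.jpg"
```

With this layout a downloaded file keeps its bioguide ID as the filename, so a batch download needs no renaming.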
About sizes, Flickr only refers to the longest side (unless they're square sizes). I think Last.fm do something similar, and both have labels for image sizes.
Anyway, the ImageMagick resize command is using the ^ fill area flag along with -extent to remove excess padding from the resized image:
convert $BASEDIR/original/$f -resize $SIZE^ -gravity center -extent $SIZE $BASEDIR/$SIZE/$f;
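To see what the fill-then-crop combination does to the pixel dimensions, here is a hedged Python sketch of the arithmetic only. It scales the image so it fills the target box, then computes the centered crop offset. The function name fill_geometry is hypothetical, and ImageMagick's own rounding may differ by a pixel from the round-to-nearest used here.

```python
from fractions import Fraction


def fill_geometry(width, height, target_w, target_h):
    """Return ((resized_w, resized_h), (crop_x, crop_y)) for a fill-area resize
    followed by a centered crop to target_w x target_h."""
    # Fill area: use the larger of the two scale factors so both sides cover the box.
    scale = max(Fraction(target_w, width), Fraction(target_h, height))
    resized_w = round(width * scale)
    resized_h = round(height * scale)
    # Centered crop: trim the overflow evenly from both sides.
    crop_x = (resized_w - target_w) // 2
    crop_y = (resized_h - target_h) // 2
    return (resized_w, resized_h), (crop_x, crop_y)


# A 675x825 original resized to fill 200x250 overflows a few pixels in width.
print(fill_geometry(675, 825, 200, 250))
```

For a 675x825 original and a 200x250 target the height lands exactly on 250 and only a few columns of pixels are trimmed from the sides.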
@konklone:
About sizes, the originals are currently sized:
So most are indeed 500+x700+. It's not terribly hard to resize images on our side, so I'm for having a good selection. Let me know if any sizes should be added/removed.
About going back to 2007: memberguide goes back to the 110th and we have a switch to choose the number. Already downloaded images won't be overwritten. Cached member pages won't be replaced or reloaded from the web, but different congress sessions have different pages and even photos:
So for example, we'll download the 113 page as cache/113_RP_Aderholt. We'll later download the 110 page as cache/110_RP_Aderholt because we don't know if it's the same Aderholt. But we won't replace 113's A000055.jpg with one from 110. So we should download in reverse: 113, 112, 111, 110.
Currently we're matching against legislators-current.yaml which of course only has the current members. Some members are still there from 110, but some will have left. We could use legislators-historical.yaml but don't want to mis-resolve to 18th century people. Sorting first in reverse would match with better ones first, but could still mismatch down the long list. The congress session number isn't in the YAML, but perhaps we could resolve against some dates in the terms? Or perhaps resolve other data from member pages against the YAML. I'm not sure of the future benefit of doing this so I think I'll just manually, temporarily crop legislators-historical.yaml so it only goes back to 2007 (and reverse sort the list first).
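The "resolve against some dates in the terms" idea could look roughly like the Python sketch below. It assumes each legislator record has a terms list whose entries carry an ISO-formatted end date, which is how the congress-legislators YAML appears to be structured; the cutoff date (the start of the 110th Congress), the function name served_since, and the sample records are all assumptions for illustration.

```python
from datetime import date

# Assumed cutoff: the 110th Congress began in early January 2007.
CUTOFF = date(2007, 1, 3)


def last_term_end(legislator):
    """Latest term end date for one legislator record."""
    return max(date.fromisoformat(term["end"]) for term in legislator["terms"])


def served_since(legislators, cutoff=CUTOFF):
    """Keep legislators whose last term ended on or after the cutoff,
    most recent first, so newer members are matched before older ones."""
    recent = [leg for leg in legislators if last_term_end(leg) >= cutoff]
    return sorted(recent, key=last_term_end, reverse=True)


# Hypothetical records, shaped like entries loaded from the YAML.
sample = [
    {"id": {"bioguide": "X000001"}, "terms": [{"start": "1789-03-04", "end": "1795-03-03"}]},
    {"id": {"bioguide": "X000002"}, "terms": [{"start": "2007-01-04", "end": "2009-01-03"}]},
]
print([leg["id"]["bioguide"] for leg in served_since(sample)])
```

This does the same job as manually cropping legislators-historical.yaml, just without editing the file by hand.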
@hugovk, looking at your commits, it might be worth reporting the mistakes you found (Curzon, at least) to GPO to see if they can fix it.
@GPHemsley, the dir structure is modeled on how Sunlight's managed our photo archive in the past, which included offering zips of individual directories (all 200x250's for example) and counting on each photo being easily tie-able to individual MoCs. So the URLs are more optimized for files than for an ideal web service, but I think that's fine for what this is.
For resizing, we can use imagemagick to ensure a consistent width and height without distorting the aspect ratio (by cropping a bit where needed). That's been what we've done so far and it's never looked bad.
Also: this is coming together SO WELL! :smile_cat:
Looking at the image stats you compiled, @hugovk, it looks like 400x500 might be the best new size to promise? We can promise it for all but 40 of that batch. Then 200x250 covers 39 of those.
It's much easier to ask people to have their browser/app/etc auto-scale an image down than to auto-scale it up. So, proposed sizes:
original
400x500
200x250
100x125
And if anyone wants smaller, they can auto-scale (or batch process them on their own). Any objections? /cc @dcloud @drinks @jcarbaugh
Given that the majority of the images are 675x825, or an aspect ratio of 9:11, I would think you'd promise a size that was actually in that aspect ratio.
This would mean something like 400x489 or 409x500 or, most likely, 450x550.
And, actually, a lot of the other images are also in a 9:11 ratio with smaller sizes.
So I'd propose these sizes:
original
450x550
225x275
90x110 or 99x121 or 108x132 or 117x143
45x55
Good point, we don't have to lose data just to keep with the sizes of yore. I'd propose 100x120 for that lower tier (to keep round-ish numbers), and then not bother distributing 45x55s.
And those images that aren't in a 9:11 ratio are in a 33:37 ratio (or roughly 3:4), so they would prefer the following sizes:
original
396x444 or 495x555
198x222
99x111
The full data table is here:
1.2222222222 0.8181818182 1.1212121212 0.8918918919
Count Width Height W/H H/W Width 9:11 H Height 9:11 W Width 33:37 H Height 33:37 W
1 504 617 0.82 1.22 504 616 617 505 504 565 617 550
94 452 553 0.82 1.22 452 552 553 452 452 507 553 493
364 675 825 0.82 1.22 675 825 825 675 * 675 757 825 736
1 1009 1233 0.82 1.22 1009 1233 1233 1009 1009 1131 1233 1100
39 230 281 0.82 1.22 230 281 281 230 230 258 281 251
2 589 719 0.82 1.22 589 720 719 588 589 660 719 641
13 675 757 0.89 1.12 675 825 757 619 675 757 757 675
4 495 555 0.89 1.12 495 605 555 454 495 555 555 495 *
1 198 222 0.89 1.12 198 242 222 182 198 222 222 198 *
18 736 825 0.89 1.12 736 900 825 675 736 825 825 736
You'll note that the stars indicate images that are already in an idealized aspect ratio (i.e. one without rounding): in particular, 675x825 (9:11), 495x555 (33:37), and 198x222 (33:37).
I would recommend against attempting to use "roundish" numbers, as that can significantly change the aspect ratio, particularly at smaller sizes. For example, 100x120 is in the 5:6 aspect ratio; the desired 9:11 height for a 100px width is roughly 122px and the desired 9:11 width for a 120px height is roughly 98px.
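The arithmetic behind those numbers can be checked with a short Python sketch using exact fractions. The helper names are hypothetical; the 9:11 ratio is the one discussed above.

```python
from fractions import Fraction

RATIO = Fraction(9, 11)  # width : height of the most common originals


def height_for_width(width, ratio=RATIO):
    """Height (rounded to whole pixels) that keeps the ratio for a given width."""
    return round(width / ratio)


def width_for_height(height, ratio=RATIO):
    """Width (rounded to whole pixels) that keeps the ratio for a given height."""
    return round(height * ratio)


print(Fraction(675, 825))     # reduces to 9/11
print(Fraction(100, 120))     # reduces to 5/6
print(height_for_width(100))  # 9:11 height for a 100px width
print(width_for_height(120))  # 9:11 width for a 120px height
```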
Damn, thanks for bringing some rigor to it. And you're right that the aspect ratio should stay consistent between all the sizes, and that it's nice to stay close to the original aspect ratio.
But I do think there's a value in round numbers here, and these aren't photographs of detailed scenery -- cropping out some flag or solid-color background fluff around the rim loses no meaningful information, and imagemagick makes it trivial to do. I'm more concerned about ensuring people who make apps see this as trivially easy to integrate.
So altogether I'm still inclined to go back to the original proposal:
original
400x500
200x250
100x125
It's quick to understand, it's easy to fit it into one's existing layout -- in other words, it uses a simpler aspect ratio with more round common denominators people can auto-scale them down to if necessary. That's more important to me than preserving original aspect ratio.
Any competent image resizing application can maintain aspect ratio—there's no need to worry about having more round common denominators, IMO.
Given that the majority of the images are in a single aspect ratio, and the rest are in a fairly similar one, I'd recommend the following sizes:
original
450x550
225x275
90x110
45x55 (if necessary)
That way, they're all at least divisible by 5, which makes them round enough. And then you can crop the rest based on their width (removing whatever is left over on the top and bottom).
By which I mean: resize to the highest standard width that is less than the original width, and then crop off the top and bottom to match the standard height ((original height - standard height) / 2).
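A minimal Python sketch of that width-first procedure follows. It assumes a standard width equal to the original width also qualifies, and that the crop is taken from the height after scaling; both are interpretations, and the name plan_crop is hypothetical. For originals that are relatively wider than 9:11 (the 33:37 ones), scaling by width leaves the image shorter than the standard height, and the sketch reports that as a negative crop rather than guessing at a policy.

```python
# Standard sizes proposed above, as (width, height).
STANDARD_SIZES = [(450, 550), (225, 275), (90, 110), (45, 55)]


def plan_crop(orig_w, orig_h, sizes=STANDARD_SIZES):
    """Pick the largest standard width that fits, scale by width, and return
    ((std_w, std_h), scaled_height, rows_to_crop_from_top_and_bottom)."""
    candidates = [size for size in sizes if size[0] <= orig_w]
    if not candidates:
        return None  # original is narrower than every standard size
    std_w, std_h = max(candidates)
    scaled_h = round(orig_h * std_w / orig_w)
    crop_each_edge = (scaled_h - std_h) // 2  # negative if the image is too short
    return (std_w, std_h), scaled_h, crop_each_edge
```

For example, a hypothetical 500x700 original would scale to 450x630 and lose 40 rows from the top and 40 from the bottom.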
OK, so, I did more thinking on this.
Since we want relatively "round" numbers, I determined all the possible sizes that we could have, using multiples of 5:
Multiplier Width Height
5 45 55
10 90 110
15 135 165
20 180 220
25 225 275
30 270 330
35 315 385
40 360 440
45 405 495
50 450 550
55 495 605
60 540 660
65 585 715
70 630 770
75 675 825
80 720 880
85 765 935
90 810 990
95 855 1045
100 900 1100
105 945 1155
110 990 1210
Then I determined which of these sizes we can get out of each original image size:
W H original 55 165 220 275 330 385 440 495 550 605 660 715 770 825 880 935 990 1045 1100 1155 1210
1009 1233 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
736 825 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 0 0 0 0 0 0 0
675 825 364 364 364 364 364 364 364 364 364 364 364 364 364 364 364 0 0 0 0 0 0 0
675 757 13 13 13 13 13 13 13 13 13 13 13 13 13 0 0 0 0 0 0 0 0 0
589 719 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0
504 617 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
495 555 4 4 4 4 4 4 4 4 4 4 0 0 0 0 0 0 0 0 0 0 0 0
452 553 94 94 94 94 94 94 94 94 94 94 0 0 0 0 0 0 0 0 0 0 0 0
230 281 39 39 39 39 39 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
198 222 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
537 537 537 537 536 497 497 497 497 497 399 398 398 383 383 1 1 1 1 1 1 1
100.0% 100.0% 100.0% 100.0% 99.8% 92.6% 92.6% 92.6% 92.6% 92.6% 74.3% 74.1% 74.1% 71.3% 71.3% 0.2% 0.2% 0.2% 0.2% 0.2% 0.2% 0.2%
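The totals in the last two rows can be reproduced with a short Python sketch. It takes the counts of original sizes from the table and treats a standard size as available when the original is at least as wide and at least as tall; the names ORIGINALS and coverage are hypothetical.

```python
# (width, height) -> number of originals with that size, from the table above.
ORIGINALS = {
    (1009, 1233): 1, (736, 825): 18, (675, 825): 364, (675, 757): 13,
    (589, 719): 2, (504, 617): 1, (495, 555): 4, (452, 553): 94,
    (230, 281): 39, (198, 222): 1,
}


def coverage(target_w, target_h, originals=ORIGINALS):
    """Return (count, percent) of originals large enough to yield the target size."""
    total = sum(originals.values())
    covered = sum(
        count for (w, h), count in originals.items()
        if w >= target_w and h >= target_h
    )
    return covered, round(100 * covered / total, 1)


# Walk the 9:11 sizes in multiples of 5, as in the multiplier table.
for k in range(5, 115, 5):
    w, h = 9 * k, 11 * k
    print(f"{w}x{h}", coverage(w, h))
```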
According to these calculations, these are the largest sizes we can use that will cover a certain percentage of images:
original (100%)
675x825 (71.3%)
585x715 (74.1%)
495x605 (74.3%)
450x550 (92.6%)
225x275 (99.8%)
180x220 (100%)
Given that 71.3% of images can be covered by the original size of two thirds of the images, I recommend we offer that size by default. Then, we can offer a bunch of sizes that apply to most of the images.
As such, I propose we offer the following sizes:
original
675x825
450x550
225x275
180x220
In general, they line up with what has already been proposed. But now they're backed by data!
@konklone, I've emailed GPO to ask them to correct Byrne Bradley and David Curzon.
@hugovk, thank you!
Does anyone else think 450x550 and 225x275 are better choices? I'm open to it if I'm the only one who thinks round numbers are important.
The reason I think round numbers are important is partly for automatic downscaling, so that someone can still fit these images into their 150px-wide space through hotlinking and have the dimensions stay integers, without having to run the images through a batch process and host them everywhere. For example, in our Congress app for Android, we take the 200x250 version and stretch it up or down to fit the density of the screen. And then partly it's cognitive aesthetics - people can more quickly grasp what we offer, how wide things are, and don't have to think hard about it.
I hate taking so much time to talk about this! But since we're generating permalinks, it's hard to take them back. Anyone have an opinion to help settle this?
Either way, I think we shouldn't promise a size that only ~70% of the photos have. ~90% is more reasonable, so I'd put 450x550 as the upper limit. And in your list, the 180-width one is close enough to the 225-width one, I don't think it's worthwhile.
OK, so someone, anyone, ring in with a preference for:
original, 400x500, and 200x250
- or -
original, 450x550, and 225x275
So I'm already rescaling myself from the original, and I've totally lost track of the data here.... Which of the pairs (400x500 vs 450x550; 200x250 vs 225x275) is the closest to the most common aspect ratio? (i.e. minimize data loss during resizing)
The bigger ones (450x550 and 225x275) minimize data loss and are closest to the most common aspect ratio.
So I'd say go with that, but take that as only a +0.1 since I probably won't be using the scaled versions in the near future anyway. (Also, resolutions keep getting higher so 100px is probably not going to be very useful for much longer.)
All right, then barring objection let's go with the bigger sizes.
@konklone You don't seem to think that ~70% is a high number, and I'm not sure why. It seems to be the standard size going forward, and it's already the same as the "original" size for most of the ones that are currently available. If we restrict our sizes to the least common denominator, we're going to lose a lot of information for no reason, I think.
But perhaps I didn't mention that clearly before: I chose 675x825 specifically because it's already the original size for a good majority of the images, and it also captures an intermediate size for the handful of images whose original size is larger than that. It seems to me that newer images will be posted, at a minimum, in this size (though I'm just guessing based on the number of images that already have this size).
@konklone @JoshData Also, the larger sizes (450x550 and 225x275) are not just close to being in the most common aspect ratio, they are in the most common aspect ratio. And I don't think it's too much to ask for developers to deal with numbers divisible by 5—it's not just any old arbitrary number. In fact, aside from the smaller one which you've rejected because of its closeness to another (though I included it because it's the largest size that fits all images), all of the sizes in question are also divisible by 25, which is still super round.
Okay.... I think everyone's happy with the 450 and 225 sizes, and since no one actually is coming with a use case/need for 675x825 I don't think there's any reason to keep talking about it @GPHemsley.
There may be a use case for a small, thumbnail and possibly a square avatar-size, but we can add smaller sizes as and when a need arises.
Definitely. And thank you once again for the follow-through, @hugovk, I see the images and docs have all been updated. :)
http://theunitedstates.io/images/congress/450x550/L000551.jpg
I'm ready to start pointing people here and telling them to use the URLs.
The congress directory is empty!