Torkiliuz closed this issue 8 years ago
In some ways it already does. Most of the tools used by imgult blaze right through files they have already processed/optimized.
Try running it on the same file twice. The second run should be considerably faster.
At least in my testing with a library consisting of thumbnails and posters for 26 TB worth of movies/TV shows, my server has a big problem surviving with all the processes spawning, even when I change the nice level to 19 :wink: That's the reason it would be nice if it created an imgult-processed.txt file, so that it could diff imgult-files.txt against it and pick up where it left off when the server crashes. Just a thought though :wink:
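A rough sketch of that idea, for anyone following along. This is not imgult's actual code: the file names are taken from the comment above, and `optimize_one` is a hypothetical stand-in for the chain of optimizers. Each file is appended to the processed list only after everything has run on it, so a crash in the middle of a file means that file simply gets redone on the next run.

```shell
#!/bin/sh
# Assumed file names, following the ones mentioned in this thread.
IMGULT_FILES_LIST="imgult-files.txt"
IMGULT_PROCESSED_FILES_LIST="imgult-processed.txt"

# Hypothetical placeholder for the real tool chain (jpegoptim, optipng, ...).
optimize_one() {
  : # no-op in this sketch
}

process_all() {
  # Make sure the processed list exists before the first lookup.
  touch "${IMGULT_PROCESSED_FILES_LIST}"
  while IFS= read -r file; do
    # Skip files that an earlier (possibly interrupted) run already finished.
    if grep -qxF -e "${file}" "${IMGULT_PROCESSED_FILES_LIST}"; then
      continue
    fi
    # If the tools fail on this file, leave it unrecorded so it is retried.
    optimize_one "${file}" || continue
    # Record the file only after every tool has run on it.
    printf '%s\n' "${file}" >> "${IMGULT_PROCESSED_FILES_LIST}"
  done < "${IMGULT_FILES_LIST}"
}
```

Looking up every file with its own `grep` call is slow on a big library, so this only illustrates the bookkeeping, not a fast way to do the comparison.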
I like what you're saying. It will be a bit tricky because several tools process the file (not just one), but I may have an idea.
I am curious though, what kind of server and what version of everything are you running?
jpegoptim
mozjpeg
optipng
pngquant
gifsicle
exiv2
svgo
Linux 3.19.0-56-generic #62~14.04.1-Ubuntu x86_64 GNU/Linux
jpegoptim v1.3.0 x86_64-pc-linux-gnu
mozjpeg version 3.1 (build 20150904)
OptiPNG 0.6.4: Advanced PNG optimizer.
pngquant 2.0.1 (September 2013)
LCDF Gifsicle 1.78
exiv2 0.23 001700 (64 bit build)
svgo 0.6.2
Will you give this a run?
WARNING: THIS VERSION IS UNTESTED, IT MAY EXPLODE.
https://github.com/ryanpcmcquen/image-ultimator/blob/diffProcessedFiles/imgult
It runs and completes, but the second run still seems to run through all of them again.
You're doing the grep, but not sending its output anywhere; maybe that is the problem? I think you need a third file that you write the grep output to, something like imgult-notprocessedfiles.txt?
The grep takes a while, but it works :+1: I think an echo with "matching already processed files" or something could be a nice addition, but I'm just really happy you took the time to make this work!
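For reference, the "third file" approach could look roughly like the sketch below. The variable names are the ones that show up later in this thread; the file names and the `filter_unprocessed` function are assumptions, and the real script may do this differently. The status message is the one suggested above.

```shell
#!/bin/sh
# Assumed file names; the variable names match the ones quoted in this thread.
IMGULT_FILES_LIST="imgult-files.txt"
IMGULT_PROCESSED_FILES_LIST="imgult-processed.txt"
IMGULT_TEMP_FILES_LIST="imgult-notprocessedfiles.txt"

filter_unprocessed() {
  echo "Matching already processed files..."
  if [ -s "${IMGULT_PROCESSED_FILES_LIST}" ]; then
    # -F: fixed strings, -x: whole-line match, -v: invert the match,
    # -f: read the patterns (already processed paths) from a file.
    # grep exits with 1 when nothing is left to process, hence "|| true".
    grep -vxFf "${IMGULT_PROCESSED_FILES_LIST}" "${IMGULT_FILES_LIST}" \
      > "${IMGULT_TEMP_FILES_LIST}" || true
  else
    # First run: nothing has been processed yet, so every file is pending.
    cp "${IMGULT_FILES_LIST}" "${IMGULT_TEMP_FILES_LIST}"
  fi
}
```

The processing loop would then read from the temp list instead of the full list.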
Great idea! Thanks so much for testing and the suggestion. :^)
Would you mind if I mentioned your use case in the README for the 4.0.00 release? 26TB is quite the testing ground.
Also, will you give it one final run?
https://github.com/ryanpcmcquen/image-ultimator/blob/diffProcessedFiles/imgult
I did a little cleanup, with your go-ahead I will release this as the new version. :smiley_cat:
I'd be honored :smile: To clarify, the image-files are far smaller than 26 TB in size, but it's posters, albumart etc. for 26 TB of media. The size of the posters and albumart is roughly 140 GB :wink:
Still that's quite the testing ground. I am surprised imgult actually parses through all that. And 140GB is a huge amount of images, how many times do you have to run imgult to successfully complete all that?
Tested this in the same folder as earlier, but noticed this now:
OpenSans-BoldItalic-webfont.svg: The file contains data of an unknown image type
I have svgo installed, and it should work in theory, but I'm not certain, as it is a "font-svg". PNGs go through normally as before though.
I also get this:
spinner.gif: Writing to GIF images is not supported
This is probably just because it's an animated gif?
It seems good to go right now.
I have been running imgult for about 10 times now with the old version, but it was kind of pointless, since it would start at the beginning each time. With this version I should at least be able to get all images optimized after 10 runs or so :smile:
That's amazing! The GIF message is actually from exiv2, it doesn't currently support gifs but I bet it will in the future, so I just process them for now (the warning is innocuous).
You may want to bring up that specific file with the svgo people, it would probably be helpful for them, or they may have an idea what is going on. I would like to know as well. :smiley:
Let me know if there is anything else I can do here, and thanks again for the report!
The release is live!
Figured out a faster way to do the comparison, by using comm.
From my testing it seemed to work, although it's getting hard to tell by now :stuck_out_tongue_winking_eye:
nice -n15 comm -13 --nocheck-order ${IMGULT_PROCESSED_FILES_LIST} ${IMGULT_FILES_LIST} > ${IMGULT_TEMP_FILES_LIST}
How much faster is it?
If it still works as it should, which is now hard to determine, it runs through the list so fast it doesn't even show up in htop. I might have made a typo, so I'll try to get a better test setup, so that I can be sure that it works. I get this at the end, but it still seems to run through the files:
/usr/local/bin/imgultnew: 0: /usr/local/bin/imgultnew: Cannot fork
Tested it now, it does not work, it even stopped working on the first run :disappointed: I didn't see it work correctly earlier either, it just looked like it worked, really sorry about that. Tested now with the Kodak png images, but they are still at the same size...
The new version works though, correct?
No, it seems to not work, now that I finally got a stable test-folder :cry:
The only output I get now:
File 1/1: ./kodim20.png
File 1/1: ./kodim23.png
File 1/1: ./kodim15.png
File 1/1: ./kodim17.png
File 1/1: ./kodim16.png
File 1/1: ./kodim01.png
File 1/1: ./kodim14.png
File 1/1: ./kodim02.png
File 1/1: ./kodim05.png
File 1/1: ./kodim22.png
File 1/1: ./kodim07.png
File 1/1: ./kodim21.png
File 1/1: ./kodim13.png
File 1/1: ./kodim24.png
File 1/1: ./kodim19.png
File 1/1: ./kodim03.png
File 1/1: ./kodim12.png
File 1/1: ./kodim06.png
File 1/1: ./kodim11.png
File 1/1: ./kodim08.png
File 1/1: ./kodim09.png
File 1/1: ./kodim18.png
File 1/1: ./kodim10.png
File 1/1: ./kodim04.png
******************************************************************************
___ ___ ___ ___ ___
___ /\__\ /\ \ /\__\ /\__\ /\ \
/\ \ /::| | /::\ \ /:/ / /:/ / \:\ \
\:\ \ /:|:| | /:/\:\ \ /:/ / /:/ / \:\ \
/::\__\ /:/|:|__|__ /:/ \:\ \ /:/ / ___ /:/ / /::\ \
__/:/\/__/ /:/ |::::\__\ /:/__/_\:\__\ /:/__/ /\__\ /:/__/ /:/\:\__\
/\/:/ / \/__/--/:/ / \:\ /\ \/__/ \:\ \ /:/ / \:\ \ /:/ \/__/
\::/__/ /:/ / \:\ \:\__\ \:\ /:/ / \:\ \ /:/ /
\:\__\ /:/ / \:\/:/ / \:\/:/ / \:\ \ \/__/
\/__/ /:/ / \::/ / \::/ / \:\__\
\/__/ \/__/ \/__/ \/__/
******************************************************************************
* Execute parametric cleaning sequence: *
removed ‘imgult-files.txt’
* The imgult has completed. Take care. *
******************************************************************************
Would you try the master? I borked something, should work now. Also, if this does work we can test comm again. :+1:
The hotfix worked, I tried to change grep to comm, but that did not work though :smile:
How about this version? (with comm):
I have tested comm by creating two textfiles and listed some similar and different elements in them, and comm shows the result correctly.
But for some reason it doesn't work with this though :disappointed:
I'll do a manual test and write paths in the textfiles, I'm pretty sure that's where the problem occurs
When I do the manual steps it works, so for some reason it stops working when run from a script. Weird :confused:
Weird, it runs, but on the second run it doesn't skip any of the files :confused:
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
I think there might be a sort-function that needs to run first for it to work
Strange. It skips all the files here. What arguments are you sending to comm? Keep in mind that on non-Linux systems comm does not have the --nocheck-order option, which may make using comm a showstopper.
It's just running comm -13 ${IMGULT_PROCESSED_FILES_LIST} ${IMGULT_FILES_LIST} > ${IMGULT_TEMP_FILES_LIST}
But it ends up with this:
comm: file 1 is not in sorted order
comm: file 2 is not in sorted order
File 1/1: ./kodim15.png
File 1/1: ./kodim11.png
File 1/1: ./kodim10.png
File 1/1: ./kodim13.png
File 1/1: ./kodim05.png
File 1/1: ./kodim19.png
File 1/1: ./kodim14.png
File 1/1: ./kodim07.png
File 1/1: ./kodim08.png
File 1/1: ./kodim04.png
File 1/1: ./kodim12.png
File 1/1: ./kodim20.png
File 1/1: ./kodim06.png
File 1/1: ./kodim09.png
./kodim14.png:
./kodim08.png:
./kodim13.png:
./kodim05.png:
./kodim12.png:
./kodim07.png:
./kodim06.png:
./kodim20.png:
./kodim15.png:
./kodim11.png:
./kodim19.png:
./kodim09.png:
./kodim04.png:
./kodim10.png:
There might be a way around it by running a sort-function, so that would solve --nocheck-order not being available on other systems. Not sure how much more I can test today, it's getting late, thanks so much for what you've done so far at least :smiley:
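A possible shape for the sort-first approach, as a sketch only. The variable names are the ones used in the commands above; the file names, the `.sorted` suffix and the `filter_unprocessed_comm` function are made up for illustration. Sorting both lists before the comparison means `comm` never sees unsorted input, so `--nocheck-order` is not needed. Forcing `LC_ALL=C` on both `sort` and `comm` keeps the two tools agreeing on what "sorted" means.

```shell
#!/bin/sh
# Assumed file names; the variable names match the ones quoted in this thread.
IMGULT_FILES_LIST="imgult-files.txt"
IMGULT_PROCESSED_FILES_LIST="imgult-processed.txt"
IMGULT_TEMP_FILES_LIST="imgult-notprocessedfiles.txt"

filter_unprocessed_comm() {
  # Make sure the processed list exists on the very first run.
  touch "${IMGULT_PROCESSED_FILES_LIST}"
  # Sort both lists with the same collation that comm will use.
  LC_ALL=C sort -u "${IMGULT_PROCESSED_FILES_LIST}" \
    > "${IMGULT_PROCESSED_FILES_LIST}.sorted"
  LC_ALL=C sort -u "${IMGULT_FILES_LIST}" > "${IMGULT_FILES_LIST}.sorted"
  # -1 drops lines unique to the first file, -3 drops lines common to both,
  # leaving only the paths that are not in the processed list yet.
  LC_ALL=C comm -13 "${IMGULT_PROCESSED_FILES_LIST}.sorted" \
    "${IMGULT_FILES_LIST}.sorted" > "${IMGULT_TEMP_FILES_LIST}"
  rm -f "${IMGULT_PROCESSED_FILES_LIST}.sorted" "${IMGULT_FILES_LIST}.sorted"
}
```

Whether the two extra sorts end up cheaper than the grep one-liner on a very large list is something that would have to be measured.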
By the time we write a sort function, we probably will not save any time over just using the grep one-liner we have now. If you do find it is still faster, I would be happy to incorporate the change.
I will keep the commTest branch open for now.
Ubuntu 14.04 set up like the following runs 4.0.01 on the whole drive without making the server crash:
jpegoptim v1.3.0 x86_64-pc-linux-gnu (from normal repositories)
mozjpeg version 3.1 (build 20150904) (from here)
OptiPNG 0.6.4: Advanced PNG optimizer. (from normal repositories)
pngquant 2.3.0 (July 2014) (from here)
LCDF Gifsicle 1.78 (from normal repositories)
exiv2 0.23 001700 (64 bit build) (from normal repositories)
svgo 0.6.2 (sudo npm install -g svgo)
That's amazing!!! Thank you so much for your help. Does that mean we can close this issue?
Yes, the issue can be closed. It might be a good idea to write the versions you need as requirements though :smile: As it is now it's kind of confusing because the default versions in at least Ubuntu are not the best ones :stuck_out_tongue_winking_eye:
That is a good idea! I use Slackware so I have more current versions of all the tools ... luckily when Ubuntu 16.04 gets released people will have much newer versions of everything by default.
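Until the requirements are written down, a small pre-flight check along these lines might help. It is only a sketch: `check_tools` is a hypothetical helper, not something imgult ships. It only checks that each tool is on the PATH, not which version it is, because every tool prints its version in a different format.

```shell
#!/bin/sh
# Hypothetical helper: report every tool from the argument list that is not
# found on the PATH, and return non-zero if at least one is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "${tool}" > /dev/null 2>&1; then
      echo "Missing required tool: ${tool}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Example call (tool names from the list in this thread):
#   check_tools jpegoptim optipng pngquant gifsicle exiv2 svgo || exit 1
```

mozjpeg is left out of the example call because the name of its binary depends on how it was installed.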
Could you make it so that imgult remembers processed files, similarly to how rsync does? Mainly this is useful so that if the computer crashes during processing imgult can skip already processed files.