google / guetzli

Perceptual JPEG encoder
Apache License 2.0

Save when compressed on image itself #141

Open fzoccara opened 7 years ago

fzoccara commented 7 years ago

Hello, I'd like to know when an image has already been compressed, so that I can avoid re-processing it.

To do that I went with the image's EXIF data, saving the information in the "User Comment" EXIF tag like this:

exiftool -UserComment="compressed with guetzli" image_test.jpg

and reading the value back with this command:

exiftool image_test.jpg | grep "User Comment"

Is there a way to have guetzli add this tag by itself? Or maybe there is a better way to do this. :)

Best. Francesco.

mgrhm commented 7 years ago

This would be extremely useful, especially if it also tagged the image with the guetzli version number (#130).

gerfra commented 7 years ago

"Is there a way to have guetzli add this tag by itself?"

No, but I think I could make an automatic procedure to restore your EXIF data, even recursively. See here: https://www.patreon.com/guetzli

khavishbhundoo commented 7 years ago

I may have a better way to do what you want.

Compute the SHA-256 checksum of the image after guetzli compression and store it in your database. Then, to check whether an image has already been optimized, compute its checksum and check whether it exists in the database.

This approach is better in the following ways:

1. The technique is independent of the tool used, be it guetzli or any other image optimization software.
2. Storing EXIF data in the image is a bad idea because it increases the file size of the image.
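The checksum approach described above can be sketched roughly as follows. This is a minimal illustration in Python rather than the PHP used elsewhere in this thread, and it assumes a small JSON file stands in for the database; the function names `sha256_of`, `mark_optimized` and `is_optimized` and the store path are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_known(store):
    """Load the set of known checksums (the JSON file is a stand-in for a database)."""
    store = Path(store)
    if store.exists():
        return set(json.loads(store.read_text()))
    return set()


def mark_optimized(path, store):
    """Record the checksum of an image right after it has been compressed."""
    known = load_known(store)
    known.add(sha256_of(path))
    Path(store).write_text(json.dumps(sorted(known)))


def is_optimized(path, store):
    """Return True if the image's current checksum was recorded earlier."""
    return sha256_of(path) in load_known(store)
```

Call `mark_optimized(image, store)` once guetzli has finished, and `is_optimized(image, store)` before queueing an image. Any later change to the file, including rewriting its metadata, changes the checksum, so the image would be treated as not yet optimized.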

fzoccara commented 7 years ago

Hi @khavishbhundoo, it is a good idea, but I want to avoid using any external structure. That's why I've done it in PHP using EXIF data. Here is my (not thoroughly tested) code.

The script first filters the images on their EXIF data to find the ones that have not been compressed yet, and then processes them. I've added a maximum image count in case there is a large set of images to process.

<?php

// Folder that is scanned (recursively) for JPEG files.
$jpgPathToWatchIn = 'PATH_TO_IMAGES';
// Maximum number of images to compress per run.
$imageCountToParseMax = 1000;
$logEnabled = true;

$guetzliCommand = 'PATH_TO_GUETZLI_FOLDER/bin/Release/guetzli';
$guetzliOptions = ' --quality 95 --verbose ';
$compressionCommand = $guetzliCommand . $guetzliOptions;

$logFilePath = 'PATH_TO_LOGS';

// Marker written into the JPEG comment of every compressed image.
$marker = 'guetzli';

// Get all images (recursively) with their absolute paths.
logMessage('Listing images', null, 'compress.log');
$images = _listImages($jpgPathToWatchIn);

// Collect the images that do not carry the marker yet.
logMessage('Parsing images', null, 'compress.log');
$unParsedYetImages = array();
foreach ($images as $image) {
    // Read the current comment; -s3 makes exiftool print only the tag value.
    // escapeshellarg() protects paths containing spaces, parentheses, etc.
    $cmd = 'exiftool -s3 -comment ' . escapeshellarg($image);
    $originalImageComment = trim((string) shell_exec($cmd));

    // Skip images that have already been compressed.
    if (strpos($originalImageComment, $marker) !== false) {
        continue;
    }
    // Keep any existing comment so it can be appended after the marker.
    if ($originalImageComment !== '') {
        $originalImageComment = ' ' . $originalImageComment;
    }

    $unParsedYetImages[] = array(
        'path' => $image,
        'filesize' => filesize($image),
        'timestamp' => filemtime($image),
        'original_comment' => $originalImageComment,
    );
}

logMessage('Images not compressed yet: ' . count($unParsedYetImages), null, 'compress.log');
logMessage('Images max to compress this turn: ' . $imageCountToParseMax, null, 'compress.log');

$originalImageSizes = 0;
$processedImageSizes = 0;
$parsedImages = array();
$i = 0;

logMessage('Compress images', null, 'compress.log');
foreach ($unParsedYetImages as $unParsedYetImage) {
    // Stop once the per-run limit has been reached.
    if ($i++ >= $imageCountToParseMax) {
        break;
    }
    $path = $unParsedYetImage['path'];
    if (!file_exists($path)) {
        logMessage('Image does not exist: ' . $path, null, 'compress.log');
        continue;
    }

    logMessage('Compressing image: ' . $path, null, 'compress.log');
    $originalImageSizes += $unParsedYetImage['filesize'];

    // Compress the image in place (input and output are the same file).
    $cmd = $compressionCommand . ' ' . escapeshellarg($path) . ' ' . escapeshellarg($path);
    logMessage(shell_exec($cmd), null, 'compress.log');

    // Mark the image as processed by writing the marker into its comment.
    $comment = $marker . $unParsedYetImage['original_comment'];
    $cmd = 'exiftool -comment=' . escapeshellarg($comment) . ' ' . escapeshellarg($path);
    logMessage(shell_exec($cmd), null, 'compress.log');

    // Check that the marker was written, then delete the "_original"
    // backup copy that exiftool leaves next to the file.
    $cmd = 'exiftool -s3 -comment ' . escapeshellarg($path);
    $parsedImageComment = (string) shell_exec($cmd);
    if (strpos($parsedImageComment, $marker) !== false && file_exists($path . '_original')) {
        unlink($path . '_original');
        logMessage('Removed backup: ' . $path . '_original', null, 'compress.log');
    }

    // filesize() results are cached, so clear the cache before re-reading.
    clearstatcache(true, $path);
    $processedImageSizes += filesize($path);
    $parsedImages[] = $unParsedYetImage;
}

logMessage('Images compressed this turn: ' . count($parsedImages), null, 'compress.log');
logMessage('Original total images size: ' . $originalImageSizes, null, 'compress.log');
logMessage('Compressed total images size: ' . $processedImageSizes, null, 'compress.log');
logMessage('Space saved: ' . ($originalImageSizes - $processedImageSizes), null, 'compress.log');

// Return the absolute paths of all .jpg/.jpeg files below $dir.
function _listImages($dir) {
    $list = array();
    $files = scandir($dir);
    if ($files === false) {
        return $list;
    }
    foreach ($files as $file) {
        if ($file == '.' || $file == '..') {
            continue;
        }
        $fullPath = $dir . '/' . $file;
        if (is_dir($fullPath)) {
            $list = array_merge($list, _listImages($fullPath));
        } elseif (in_array(strtolower(pathinfo($file, PATHINFO_EXTENSION)), array('jpg', 'jpeg'))) {
            $list[] = realpath($fullPath);
        }
    }
    return $list;
}

// Named logMessage() because log() is a built-in PHP math function
// and cannot be redeclared.
function logMessage($message, $type, $logFile) {
    // whatever
}
?>

Please feel free to comment and share ideas.

Best. Francesco.

khavishbhundoo commented 7 years ago

Since you don't want to use external structures like a database, adding the info as EXIF data is the only option in my book. However, I do hope you are aware that guetzli uses a HUGE amount of RAM and CPU, so I think batch processing with guetzli isn't a good idea at this point.

https://github.com/google/guetzli#using

khavishbhundoo commented 7 years ago

I suggest you add a check to ensure that you have the required RAM. To calculate the MPix of an image, use the formula below:

number_of_MPix = (width * height) / 1000000

You should provide 300MB of memory per 1MPix of the input image.
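A rough sketch of such a check, in Python for brevity, might look like this. It assumes 1 MPix means 1,000,000 pixels and uses the 300MB per MPix figure quoted above; the function names and the `available_mb` parameter are hypothetical, and how the available memory is obtained is left to the caller.

```python
# Memory guetzli is said to need per megapixel of input, in MB
# (figure quoted in this thread, treated here as an assumption).
MB_PER_MEGAPIXEL = 300


def megapixels(width, height):
    """Return the image size in megapixels (1 MPix = 1,000,000 pixels)."""
    return (width * height) / 1_000_000


def required_memory_mb(width, height):
    """Estimate the memory guetzli needs for an image of this size, in MB."""
    return megapixels(width, height) * MB_PER_MEGAPIXEL


def fits_in_memory(width, height, available_mb):
    """Return True if the estimated requirement fits in the available memory."""
    return required_memory_mb(width, height) <= available_mb


# A 4000x3000 image is 12 MPix, so the estimate is 12 * 300 MB.
print(required_memory_mb(4000, 3000))  # 3600.0
```

In a batch script, images for which `fits_in_memory` returns False could be skipped or logged instead of being passed to guetzli.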