nawarian / The-PHP-Website

Yet another php website
https://thephp.website

Will PHP 8 finally fix the EXIF bugs? #32

Open rmatzinger opened 4 years ago

rmatzinger commented 4 years ago

Ever since PHP 7, some of the EXIF functions for image metadata have failed. Functions like exif_read_data() and exif_thumbnail(), which work correctly in PHP 5.x, have stopped working properly: exif_thumbnail() fails completely, and exif_read_data() fails to read all the EXIF headers from images. The bug in exif_read_data() prevents extraction of GPS data from image metadata.

People propose a work-around -- executing external programs like exiftool or imagemagick, but those "solutions" are much slower and make it painful to present a page of thumbnails or use GPS data, for instance placing pushpins in a Google Maps map for a set of photographs.

They have fixed some of the bugs but all versions of PHP 7 still have the above problems. Since PHP 7.4 is the end of the line until version 8, we need to know if they will finally fix these problems. Some of us have had to stay with PHP 5.6 because we can't move to 7.x.

nawarian commented 4 years ago

Sorry about my delay; I had crazy weeks since May (I believe we all had) :((

Anyhow, I find the exif_*() situation really interesting. As it is not part of my daily work, I had no clue such issues appeared between PHP 5.x and 7.x.

I quickly fetched this list of EXIF-related bug tickets from the php.net website: https://bugs.php.net/search.php?cmd=display&direction=DESC&limit=30&package_name[]=EXIF+related

Is this list representative or are there more bugs you don't see in this list?

Short answer on whether they are fixed in PHP 8: it doesn't seem so. If you check the NEWS file in php's master branch, there's unfortunately nothing about EXIF: https://github.com/php/php-src/blob/master/NEWS

I'll be pleased to write about this, of course, but I'd need some more information. Do you think you could help me out by providing it? :)

rmatzinger commented 4 years ago

I stopped paying attention to the many reports of EXIF problems with PHP 7.x because it never seemed to get fixed. I tried contacting the PHP team member that was supposedly responsible for fixing these things but never got a response. All I know is that it fails on every version of PHP 7 I have tried, including 7.1, 7.2, 7.3, and 7.4, the latest being 7.4.7. PHP function exif_thumbnail() fails on all of them and having to exec exiftool for each thumbnail is just much too slow. So I am stuck using PHP 5.6.40.

I have a system that receives and processes thousands upon thousands of GPS-encoded JPG image files from many different cameras. I need to display many thumbnails, dozens and dozens per web page, AND extract the GPS data from each image to create a Google Maps link. With PHP 5.6.40 a page of thumbnails is displayed very quickly; however, when PHP 7.x is used, the same code can't display thumbnails or extract GPS data, so I have to use exiftool for both. When exif_thumbnail() works, it is very fast, in part because you can send the image data directly to the browser without writing it to a file. When it doesn't work, it means an exec() of exiftool, which has to write the thumbnail to a file, and then the browser has to request that file to display it.
Extracting one thumbnail with exiftool is fine but when you have to create a page of thumbnails executing exiftool for each one really slows things down.
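
For illustration, here is a minimal sketch of the "send the image data directly to the browser" approach described above, inlining the embedded thumbnail as a base64 data URI. The helper names `buildImgTag()` and `thumbnailImgTag()` are hypothetical and not taken from the system discussed here; only `exif_thumbnail()`, `image_type_to_mime_type()` and `base64_encode()` are PHP built-ins. No type declarations are used, so the sketch should also run on PHP 5.6.

```php
<?php
// Build an <img> tag that embeds raw image bytes as a base64 data URI.
function buildImgTag($data, $width, $height, $mime)
{
    return sprintf(
        '<img width="%d" height="%d" src="data:%s;base64,%s">',
        $width,
        $height,
        $mime,
        base64_encode($data)
    );
}

// Hypothetical helper: returns the tag for a file's embedded EXIF thumbnail,
// or null when exif_thumbnail() cannot extract one.
function thumbnailImgTag($file)
{
    // exif_thumbnail() fills $width, $height and $type by reference.
    $data = @exif_thumbnail($file, $width, $height, $type);
    if ($data === false) {
        return null;
    }
    return buildImgTag($data, $width, $height, image_type_to_mime_type($type));
}
```

A thumbnail page could then loop over its files and echo `thumbnailImgTag($file)` for each one, falling back to exiftool only when the helper returns null.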

The same goes for extracting GPS data, which requires another exec() of exiftool plus capturing and parsing the result, instead of a fast call to PHP's exif_read_data() function.
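
As a rough sketch of what the fast path looks like when exif_read_data() does return the GPS section: the coordinates arrive as degree/minute/second rationals (strings such as "4530/100") and need converting to decimal degrees before they can be used in a map link. The function names below are hypothetical; the array keys (`GPSLatitude`, `GPSLatitudeRef`, `GPSLongitude`, `GPSLongitudeRef`) are the standard EXIF tag names that exif_read_data() reports.

```php
<?php
// Convert one EXIF rational such as "4530/100" to a float.
function exifRationalToFloat($rational)
{
    $parts = explode('/', (string) $rational, 2);
    $num = (float) $parts[0];
    $den = isset($parts[1]) ? (float) $parts[1] : 1.0;
    return $den == 0.0 ? 0.0 : $num / $den;
}

// Convert [degrees, minutes, seconds] rationals plus a hemisphere
// reference ("N", "S", "E", "W") to signed decimal degrees.
function gpsToDecimal(array $dms, $ref)
{
    $deg = isset($dms[0]) ? exifRationalToFloat($dms[0]) : 0.0;
    $min = isset($dms[1]) ? exifRationalToFloat($dms[1]) : 0.0;
    $sec = isset($dms[2]) ? exifRationalToFloat($dms[2]) : 0.0;
    $decimal = $deg + $min / 60 + $sec / 3600;
    return ($ref === 'S' || $ref === 'W') ? -$decimal : $decimal;
}

// Hypothetical helper: returns [latitude, longitude] or null when
// the file has no readable GPS section.
function readLatLon($file)
{
    $exif = @exif_read_data($file, 'GPS', true);
    if ($exif === false
        || !isset($exif['GPS']['GPSLatitude'], $exif['GPS']['GPSLongitude'])) {
        return null;
    }
    $gps = $exif['GPS'];
    $latRef = isset($gps['GPSLatitudeRef']) ? $gps['GPSLatitudeRef'] : 'N';
    $lonRef = isset($gps['GPSLongitudeRef']) ? $gps['GPSLongitudeRef'] : 'E';
    return array(
        gpsToDecimal($gps['GPSLatitude'], $latRef),
        gpsToDecimal($gps['GPSLongitude'], $lonRef),
    );
}
```

If `readLatLon()` returns null on PHP 7.x for a file that exiftool can read, that file is a good candidate to attach to a bug report.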

Here is a simple thumbnail extraction test (note this github text editor seems to interpret HTML line breaks (BR) even when using the Insert-Code option) :

    $fn = "P1010015.JPG";
    echo("PHP Version: ".phpversion()."<br><br>");
    if (!file_exists($fn)) {
        die("File: $fn NOT FOUND");
    }
    echo("Calling PHP function: exif_thumbnail('$fn', width, height, type)<br>");
    $image = @exif_thumbnail($fn, $width, $height, $type);
    if ($image !== false) {
        echo("php exif_thumbnail() successful<br><br>");
        echo("");
    } else {
        echo("<br>php exif_thumbnail() FAILED<br><br>");
    }

    // use exiftool to extract the thumbnail
    $thumbfilename = "thumb.jpg";
    $exec = "exiftool -b -ThumbnailImage $fn > $thumbfilename";
    echo("<br><br><br>Calling exiftool: $exec<br>");
    $img = exec($exec);
    if (file_exists($thumbfilename)) {
        echo("Thumbnail extracted by exiftool as $thumbfilename<br>");
        echo("<br>");
    } else {
        echo("exiftool failed to extract thumbnail<br>");
    }

rmatzinger commented 4 years ago

My email program (Thunderbird) couldn't handle the Reply-To address in your message. So I had to copy and paste it in. Very confusing about how to reply to your message.

I replied on GitHub (please see https://github.com/nawarian/The-PHP-Website/issues/32) but thought I would send you a more complete test program that demonstrates both problems. You need to be able to run it on PHP 5.6 AND on any 7.x version to see the difference. In order to demonstrate how exiftool CAN read the image files that PHP cannot, you would need to have exiftool installed.

Note that PHP 7.x can still retrieve some EXIF data from files but stumbles on something and stops reading and doesn't return EXIF fields beyond some point and thus doesn't include the GPS fields. Calling exif_thumbnail() or exif_read_data() without the @ prefix allows you to see the error message about what it stumbles on.
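
One way to collect that error message programmatically, rather than reading it off the page, is to install a temporary error handler around the call. This is a minimal sketch; `captureWarnings()` is a hypothetical helper name, while `set_error_handler()` and `restore_error_handler()` are PHP built-ins.

```php
<?php
// Run a callable and collect the warnings/notices it raises instead of
// letting PHP print them. Returns array($result, $messages).
function captureWarnings($fn)
{
    $messages = array();
    set_error_handler(function ($errno, $errstr) use (&$messages) {
        $messages[] = $errstr;
        return true; // mark as handled so PHP's default handler is skipped
    });
    try {
        $result = $fn();
    } finally {
        // Always restore the previous handler, even if $fn throws.
        restore_error_handler();
    }
    return array($result, $messages);
}

// Usage (file name taken from the test snippet earlier in this thread):
// list($exif, $warnings) = captureWarnings(function () {
//     return exif_read_data('P1010015.JPG', null, true);
// });
// print_r($warnings);
```

The collected messages can then be pasted into a bug report verbatim, together with the PHP version that produced them.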

I have attached my PHP test code as:  testexifthumb.php

Let me know if you have any questions.

--Richard


    // [...]
        ";
        }
        for ($i = 2; $i < count($exifdata); $i++) {
            $location .= $exifdata[$i]."";
        }
        echo("GPS Location via exiftool: $location ");
    }

    function getimagemagickexif($filename) // uses imagemagick identify program to extract EXIF data
    {
        global $path; //, $tmppath;
        $exec_output = array();
        // escapeshellarg() keeps file names with spaces or quotes from breaking the command
        $exec = "/usr/bin/identify -format '%[EXIF:*]' ".escapeshellarg($filename);
        //echo($exec);
        echo("ImageMagick Identify EXIF output:");
        exec($exec, $exec_output, $exec_retval);
        $count = count($exec_output);
        for ($i = 0; $i < $count; $i++) {
            echo("$exec_output[$i] ");
        }
    }

    function getfullexif($filename) // calls both PHP and Exiftool so you can compare the difference
    {
        global $exiftool;
        $f = $filename;
        if (!file_exists($f)) {
            echo("$f not found.");
            exit;
        }

        // display output from exiftool:
        $exec = "$exiftool ".escapeshellarg($f);
        $result = exec($exec, $exec_output, $exec_retval);
        if ($result > 0) {
            echo("ExifTool Result: $result, $exec_retval");
        }
        echo("All EXIF meta data as reported by exiftool:");
        foreach ($exec_output as $key => $section) {
            if ($key == 0) {
                echo("$section");
            } else {
                echo("$section ");
            }
        }

        // display exif from PHP:
        ini_set('exif.encode_unicode', 'UTF-8');
        // the section name is IFD0 (digit zero), not IFDO
        $exif = @exif_read_data($f, "IFD0, EXIF", true); // 'IFD0, EXIF, COMMENT, THUMBNAIL, GPS, ANY_TAG'
        if ($exif === false) {
            echo("File empty or missing EXIF data.");
            return; // nothing to iterate over
        }
        echo("All EXIF metadata as reported by PHP exif_read_data(), broken in PHP 7.x no GPS displayed [this is PHP version: ".phpversion()."]):");
        foreach ($exif as $key => $section) {
            foreach ($section as $name => $val) {
                $s = "";
                if (strtoupper($key) != "MAKERNOTE") { // PHP can't interpret makernote values
                    if (is_array($val)) {
                        if ($key == 'GPS') { // some numeric values are expressed as expressions: 123456/100
                            $s = print_r($val, true)." ";
                            $elements = count($val);
                            for ($i = 0; $i < $elements; $i++) {
                                $values = explode('/', $val[$i]);
                                // guard against values without a denominator
                                if (isset($values[1]) && $values[1] > 0) {
                                    $s .= "= ".($values[0] / $values[1]).", ";
                                }
                            }
                            $s = substr($s, 0, -2);
                        } else {
                            $s = print_r($val, true);
                        }
                        echo("$key.$name: $s\n");
                    } else { // not an array
                        if ($key == 'GPS') { // some numeric values are expressed as expressions: 123456/100
                            if (strpos($val, '/') > 0) { // evaluate numeric expression
                                $s = "$val ";
                                $values = explode('/', $val);
                                if ($values[1] > 0) {
                                    $s .= "= ".($values[0] / $values[1])." ";
                                }
                            } else {
                                $s = "$val ";
                            }
                        } else { // not gps
                            $s = $val;
                        }
                        echo("$key.$name: $s\n");
                    }
                }
            }
        }
    } // getfullexif()

    echo("PHP Version: ".phpversion()."");

    if (!file_exists($fn)) {
        die("File: $fn NOT FOUND");
    }

    echo("Calling PHP function: exif_thumbnail('$fn', width, height, type) ");
    $image = @exif_thumbnail($fn, $width, $height, $type);
    if ($image !== false) {
        echo("php exif_thumbnail() extractor successful thumbnail extracted by PHP exif_thumbnail()");
        echo("");
    } else {
        echo("php exif_thumbnail() FAILED");
    }

    // try using exiftool to extract the thumbnail
    $thumbfilename = "thumb.jpg";
    $exec = "$exiftool -b -ThumbnailImage ".escapeshellarg($fn)." > ".escapeshellarg($thumbfilename);
    echo("Calling exiftool: $exec");
    $img = exec($exec);
    // the shell redirect creates the file even when exiftool fails, so also check its size
    if (file_exists($thumbfilename) && filesize($thumbfilename) > 0) {
        echo("Thumbnail extracted by exiftool as $thumbfilename");
        echo("");
    } else {
        echo("Thumbnail NOT successfully extracted by exiftool");
    }

    getlatlon($fn);
    getfullexif($fn);
    getimagemagickexif($fn);
    ?>

nawarian commented 4 years ago

Hmm... I tried to reproduce the issue you mentioned but I had no success at all. The exif_read_data() and exif_thumbnail() functions work just fine.

I did this using Docker builds of PHP. I tried php:7.0-cli and php:7.4-cli and both worked fine: I was able to extract metadata from my images.

I used the following images for testing: https://github.com/ianare/exif-samples/tree/master/jpg/gps

The Dockerfile looks like this:

FROM php:7.0-cli

RUN docker-php-ext-install exif

The build command I used:

$ docker build -t php:7.0-cli_exif .

The run command:

$ docker run -it --rm -v "$PWD":/app -w /app php:7.0-cli_exif php exif.php
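
The exif.php script itself is not shown in this thread. A minimal sketch of what such a check could look like is below; it is an assumption, not the script that was actually run, and `checkImage()` and `formatResult()` are hypothetical names. It reports, per file, whether exif_read_data() and exif_thumbnail() succeed and whether a GPS section came back, so the same output can be compared across PHP versions.

```php
<?php
// Hypothetical check: record which EXIF calls succeed for one file.
function checkImage($file)
{
    $exif = @exif_read_data($file, null, true);
    $thumb = @exif_thumbnail($file);
    return array(
        'file' => $file,
        'read_ok' => $exif !== false,
        'has_gps' => is_array($exif) && isset($exif['GPS']),
        'thumbnail_ok' => $thumb !== false,
    );
}

// Render one result as a single comparable line of text.
function formatResult(array $r)
{
    return sprintf(
        '%s: read=%s gps=%s thumbnail=%s',
        $r['file'],
        $r['read_ok'] ? 'ok' : 'FAILED',
        $r['has_gps'] ? 'present' : 'missing',
        $r['thumbnail_ok'] ? 'ok' : 'FAILED'
    );
}

// Usage: check every JPEG next to this script.
// foreach (glob(__DIR__ . '/*.jpg') as $file) {
//     echo formatResult(checkImage($file)), PHP_EOL;
// }
```

Running it once per image tag (for example php:7.0-cli_exif and a 5.6 build) and diffing the two outputs would show exactly which files behave differently.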

Could it be that it only works fine with the Docker images? If so, you possibly have conflicting C libraries on your system. What do you think?

Or could it be that the images I'm using just happen to work? If so, your images might have a different encoding that PHP 7 somehow dropped support for. If that's the case, would it be possible for you to send one sample image as an attachment to nawarian@gmail.com so I can test it locally?

nawarian commented 4 years ago

Ah yes, it can also relate to the SAPI. So maybe it works with php cli but won't work on FPM, for example.
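
To compare environments, it may help to print the same few facts under each SAPI (CLI, FPM, and so on). A minimal sketch, where `exifEnvironment()` is a hypothetical helper name and everything it calls is a PHP built-in:

```php
<?php
// Collect the facts most likely to differ between two PHP setups.
function exifEnvironment()
{
    return array(
        'php_version' => PHP_VERSION,
        'sapi' => PHP_SAPI,                       // e.g. "cli" or "fpm-fcgi"
        'exif_loaded' => extension_loaded('exif'),
        'exif_version' => phpversion('exif'),     // false when not loaded
    );
}

// Usage: run once from the command line and once through the web server.
// var_export(exifEnvironment());
```

If the CLI and web outputs differ in `exif_loaded` or in the PHP version, the two SAPIs are not using the same build or configuration.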

I'll prepare a test tomorrow, as it is already too late for me today. I strongly recommend that you run it and let me know what results you get.

Cheers!

nawarian commented 4 years ago

Thanks a lot for your message!

I spotted some differences between PHP 5.6.40's logic and PHP 7.x's when it comes to fetching the Image File Directory (IFD). I'll write them down here so you can pick this up if you have time before I do.

In 5.6 a variable offset_base is used to test the IFD size, and in php 7.x this was refactored to use a struct exif_offset_info and the condition is actually hidden inside the function exif_offset_info_contains which does a different check using a start/end range: https://github.com/php/php-src/blob/master/ext/exif/exif.c#L2080-L2091

And if you check the struct itself, you'll see that the variables used for the check may not necessarily use the offset_base value: https://github.com/php/php-src/blob/master/ext/exif/exif.c#L2037.
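
To make the idea of a start/end range check concrete, here is a rough PHP analogue. It is illustrative only and is not the extension's C code: integer offsets stand in for the C pointers, and the function and parameter names are hypothetical. The point is that a read is accepted only when the whole span lies inside the region considered valid, so a wrongly initialised valid region rejects data that is actually present.

```php
<?php
// Illustrative range check: is [start, start + length) inside
// [validStart, validEnd]? Offsets are plain integers here.
function rangeContains($validStart, $validEnd, $start, $length)
{
    if ($length < 0 || $start > PHP_INT_MAX - $length) {
        return false; // start + length would overflow
    }
    $end = $start + $length;
    return $start >= $validStart && $end <= $validEnd;
}

// A 20-byte entry at offset 10 fits in a 100-byte region:
// rangeContains(0, 100, 10, 20)  -> true
// The same entry is rejected if the region is assumed to start at 50:
// rangeContains(50, 100, 10, 20) -> false
```

Under this reading, a file whose IFD offsets fall outside the computed region would stop being parsed at that point, which would match the "stops reading beyond some point" symptom. That is a hypothesis to verify against exif.c, not a confirmed cause.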

This might relate to the 7.x upgrades in which the authors decided to add support for different digital cameras. So maybe the initialisation of the exif_offset_info struct was broken in the process.

I'll try to get some time after work today to play around with the C extension.

The good thing is that the PHP supported versions page lists PHP 7.3 as actively supported until December 2020. So if we fix it within 5 months, you may still see the fix land in PHP 7.3+ versions :))

If you're curious about all those names, I found a good documentation on the binary format here: https://www.media.mit.edu/pia/Research/deepview/exif.html

Cheers!