xperseguers / t3ext-extractor

TYPO3 Extension extractor
https://extensions.typo3.org/extension/extractor
GNU General Public License v2.0

Exception because of calling DateTime::timestamp with array #75

Open sypets opened 7 months ago

sypets commented 7 months ago

DateTime::timestamp() expects a string or null, but AbstractExtractionService may call it with an array:

Impact:

Cause of the problem

The entry is returned as an array because the creation date in the file's metadata (as returned by exiftool) has date and time separated by a comma, e.g. "7/18/01, 12:51 PM". With exiftool -j and the subsequent json_decode, this entry becomes an array, not a string (see next comment).

Source Code

DateTime::timestamp:

public static function timestamp(?string $str = null): ?int

AbstractExtractionService:

if (isset($processor)) {
    if (preg_match('/^([^(]+)(\((.*)\))?$/', $processor, $matches)) {
        $processor = $matches[1];
        $parameters = [$value];
        if (isset($matches[3])) {
            if (substr($matches[3], 0, 1) === '\'') {
                $parameters[] = substr($matches[3], 1, -1);
            } else {
                $fields = GeneralUtility::trimExplode(',', $matches[3]);
                foreach ($fields as $field) {
                    if (array_key_exists($field, $parentValue)) {
                        $parameters[] = $parentValue[$field];
                    }
                }
            }
        }
        try {
            $value = call_user_func_array($processor, $parameters);
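
The failure can be reproduced outside TYPO3 with a few lines. This is only a sketch: timestampStandIn is a hypothetical stand-in that copies the quoted signature, not the extension's method, and its body is irrelevant here. It shows that call_user_func_array raises a TypeError as soon as the first parameter is an array.

```php
<?php
// Hypothetical stand-in: only the signature matches the quoted
// DateTime::timestamp(); the body is a placeholder.
function timestampStandIn(?string $str = null): ?int
{
    return $str === null ? null : 0;
}

// Value shaped like the decoded exiftool entry described in this issue.
$value = ['7/18/01', '12:51 PM'];

try {
    // Same call style as in AbstractExtractionService.
    call_user_func_array('timestampStandIn', [$value]);
    $result = 'no error';
} catch (TypeError $e) {
    // An array cannot be coerced to ?string, with or without strict_types.
    $result = 'TypeError';
}

echo $result, PHP_EOL; // prints "TypeError"
```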

Log

Fri, 08 Dec 2023 09:10:08 +0100 [CRITICAL] request="bbfce934872fc" component="TYPO3.CMS.Core.Error.ProductionExceptionHandler": Core: Exception handler (CLI: BE): TypeError, code #0, 
file /var/www/mysite/public/typo3conf/ext/extractor/Classes/Utility/DateTime.php
, line 31: 
Causal\Extractor\Utility\DateTime::timestamp(): 
  Argument #1 ($str) must be of type ?string, array given- TypeError: Causal\Extractor\Utility\DateTime::timestamp(): 
....

Stack trace:

#0 [internal function]: Causal\\Extractor\\Utility\\DateTime::timestamp()
#1 /var/www/mysite/public/typo3conf/ext/extractor/Classes/Service/Extraction/AbstractExtractionService.php(363): call_user_func_array()
#2 /var/www/mysite/public/typo3conf/ext/extractor/Classes/Service/Extraction/ExifToolMetadataExtraction.php(85): Causal\\Extractor\\Service\\Extraction\\AbstractExtractionService->remapServiceOutput()
#3 /var/www/mysite/public/typo3/sysext/core/Classes/Resource/Service/ExtractorService.php(47): Causal\\Extractor\\Service\\Extraction\\ExifToolMetadataExtraction->extractMetaData()
#4 /var/www/mysite/public/typo3/sysext/core/Classes/Resource/Index/Indexer.php(164): TYPO3\\CMS\\Core\\Resource\\Service\\ExtractorService->extractMetaData()
#5 /var/www/mysite/public/typo3/sysext/core/Classes/Resource/Index/Indexer.php(142): TYPO3\\CMS\\Core\\Resource\\Index\\Indexer->extractMetaData()
...

sypets commented 7 months ago

Follow-up: I investigated some more.

Apparently, in the affected file, the CreateDate looks like this:

$ exiftool <filename> | grep Create

output:

Create Date                     : 7/18/01, 12:51 PM

Date and time are separated by a comma.

In the extension, ExifToolService::extractMetadataFromLocalFile calls exiftool with -j, which produces JSON output, and that output contains an array for this CreateDate:

$ /usr/bin/exiftool -j <filename>

output: (json)

...
"CreateDate": ["7/18/01","12:51 PM"],
...

I confirmed by debugging that the metadata built in extractMetadataFromLocalFile contains several arrays (for "CreateDate", "For") and otherwise contains strings, ints and floats.
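
The mixed types can be seen without the original file by decoding a fragment shaped like the output quoted above. A minimal sketch: the SourceFile value and the ImageWidth entry are made-up placeholders, and only the CreateDate entry comes from this issue.

```php
<?php
// exiftool -j prints a JSON array with one object per file.
// Only the CreateDate entry is taken from the issue; the rest is illustrative.
$json = '[{"SourceFile": "example.png", "ImageWidth": 800, "CreateDate": ["7/18/01", "12:51 PM"]}]';

$metadata = json_decode($json, true)[0];

$types = [];
foreach ($metadata as $key => $entry) {
    $types[$key] = gettype($entry);
    echo $key, ': ', $types[$key], PHP_EOL;
}
// SourceFile: string
// ImageWidth: integer
// CreateDate: array
```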

So we should either expect various data types (not just strings) or make sure the values are always strings.
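
One possible shape for the second option is to flatten list values right after decoding. This is a sketch under assumptions, not code from the extension: flattenMetadataValues is a hypothetical helper, and joining with a space is a choice made for this example. Tags that are legitimately lists (keywords, for instance) would be flattened too, so a real fix might restrict this to date fields.

```php
<?php
/**
 * Hypothetical helper: turns array values into one string and leaves
 * strings, ints and floats untouched.
 */
function flattenMetadataValues(array $metadata, string $glue = ' '): array
{
    foreach ($metadata as $key => $value) {
        if (is_array($value)) {
            // Drop nested arrays/objects so implode() only sees scalars.
            $metadata[$key] = implode($glue, array_filter($value, 'is_scalar'));
        }
    }
    return $metadata;
}

$flat = flattenMetadataValues([
    'CreateDate' => ['7/18/01', '12:51 PM'],
    'ImageWidth' => 800,
]);

var_dump($flat['CreateDate']); // string(16) "7/18/01 12:51 PM"
```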

xperseguers commented 4 months ago

Could you test this patch and see if that helps?

diff --git a/Classes/Utility/DateTime.php b/Classes/Utility/DateTime.php
index dd92997..4b27297 100644
--- a/Classes/Utility/DateTime.php
+++ b/Classes/Utility/DateTime.php
@@ -25,12 +25,15 @@ class DateTime
     /**
      * Converts a date/time into its Unix timestamp.
      *
-     * @param string|null $str
+     * @param string|array|null $str
      * @return int|null
      */
-    public static function timestamp(?string $str = null): ?int
+    public static function timestamp($str = null): ?int
     {
-        if ($str === null) {
+        if (is_array($str)) {
+            $str = implode(' ', $str);
+        }
+        if (!is_string($str)) {
             return null;
         }
         if (preg_match('/^\d{4}:\d{2}:\d{2} \d{2}:\d{2}:\d{2}$/', $str)) {

sypets commented 4 months ago

I don't have the file anymore. But looking at the code, it should fix it.
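
The guard added by the patch can be checked in isolation, without the original file. A minimal sketch that assumes only the lines visible in the diff: coerceDateInput is a hypothetical name, and the date parsing that follows in the real method is left out, so this does not show whether "7/18/01 12:51 PM" is then parsed successfully.

```php
<?php
// Hypothetical extraction of the guard from the patched timestamp().
function coerceDateInput($str): ?string
{
    if (is_array($str)) {
        $str = implode(' ', $str);
    }
    if (!is_string($str)) {
        return null;
    }
    return $str;
}

$fromArray  = coerceDateInput(['7/18/01', '12:51 PM']); // "7/18/01 12:51 PM"
$fromString = coerceDateInput('2024:02:15 15:51:18');   // unchanged
$fromInt    = coerceDateInput(42);                      // null
$fromNull   = coerceDateInput(null);                    // null

var_dump($fromArray, $fromString, $fromInt, $fromNull);
```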

Also, newer versions of tools should probably not produce this kind of output.

For example, I tested with GIMP and this was written:

Time Created                    : 15:51:18-15:51
Date Created                    : 2024:02:15
Date/Time Created               : 2024:02:15 15:51:18-15:51

When I try to write a date containing a comma, my current exiftool does not allow it:

$ exiftool "-CreateDate=7/18/01, 12:51 PM" fasseing_CreateDateWithComma.png
Warning: Invalid date/time (use YYYY:mm:dd HH:MM:SS[.ss][+/-HH:MM|Z]) in ExifIFD:CreateDate (PrintConvInv)
Nothing to do.

As mentioned before, some of the files which were extracted were pretty old.