xperseguers / t3ext-extractor

TYPO3 Extension extractor
https://extensions.typo3.org/extension/extractor
GNU General Public License v2.0

Metadata db UPDATE might fail if title too long #74

Closed (sypets closed this 4 months ago)

sypets commented 7 months ago

The database field sys_file_metadata.title is of type tinytext, which holds at most 255 bytes (not characters).

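| Field | Type | Null | Key | Default | Extra |
|-------|----------|------|-----|---------|-------|
| title | tinytext | YES | | NULL | |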

Ideally, the extension should check whether the title fits before updating.
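To illustrate the bytes-versus-characters distinction, here is a minimal PHP sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is available. The sample string is only an excerpt chosen for illustration.

```php
<?php
// Byte length vs. character length of a UTF-8 string.
// Each "ü" is one character but occupies two bytes in UTF-8.
$title = 'Bürgerinnen und Bürger';

$chars = mb_strlen($title, 'UTF-8'); // number of characters
$bytes = strlen($title);             // number of bytes

echo "$chars characters, $bytes bytes\n";
```

A title of fewer than 255 characters can therefore still exceed the 255-byte limit of a tinytext column once it contains umlauts or typographic quotes.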

Log message

Thu, 07 Dec 2023 13:00:30 +0100 [ERROR] request="840a32b4ab1ff" component="TYPO3.CMS.Scheduler.Task.AbstractTask": 
A Task Exception was captured.- 
DriverException: An exception occurred while executing 
'UPDATE `sys_file_metadata` SET `pid` = ?, `tstamp` = ?, `crdate` = ?, `cruser_id` = ?, `sys_language_uid` = ?, `l10n_parent` = ?, `l10n_diffsource` = ?, `t3ver_oid` = ?, `t3ver_wsid` = ?, `t3ver_state` = ?, `t3ver_stage` = ?, `t3ver_count` = ?, `t3ver_tstamp` = ?, `t3ver_move_id` = ?, `t3_origuid` = ?, `file` = ?, `title` = ?, `width` = ?, `height` = ?, `description` = ?, `alternative` = ?, `categories` = ?, `visible` = ?, `status` = ?, `keywords` = ?, `caption` = ?, `creator_tool` = ?, `download_name` = ?, `creator` = ?, `publisher` = ?, `source` = ?, `location_country` = ?, `location_region` = ?, `location_city` = ?, `latitude` = ?, `longitude` = ?, `ranking` = ?, `content_creation_date` = ?, `content_modification_date` = ?, `note` = ?, `unit` = ?, `duration` = ?, `color_space` = ?, `pages` = ?, `language` = ?, `fe_groups` = ?, `copyright` = ?, `l10n_state` = ?, `camera_make` = ?, `camera_model` = ?, `camera_lens` = ?, `shutter_speed` = ?, `focal_length` = ?, `exposure_bias` = ?, `white_balance_mode` = ?, `iso_speed` = ?, `aperture` = ?, `flash` = ?, `altitude` = ? WHERE `uid` = ?' with params [0, 1701950430, 1450281649, 1080, 0, 0, "", 0, 0, 0, 0, 0, 0, 0, 0, 128900, "Die Gesundheit der B\u00fcrgerinnen und B\u00fcrger wird nach dem Programm der Weltgesundheitsorganisation (World Health Organization-WHO) \u201eGesundheit f\u00fcr alle bis zum Jahre 2000\" aus dem Jahre 1977 als ein Zustand des v\u00f6lligen k\u00f6rperlichen, geistigen und sozialen ", 0, 0, null, null, 0, 1, "", "", "", "Microsoft Word 9.0", "", "", "", "Universit\u00e4t Oldenburg", "", "", "", "0.00000000000000", "0.00000000000000", 0, 1077624120, 1077624120, "", "", 0, "", 0, "", "", null, null, "", "", "", "", 0, "", "", 0, 0, 0, 0, 168671]:

Data too long for column 'title' at row 1, in file /var/www/www.uni-oldenburg.de/releases/156/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/AbstractMySQLDriver.php:128 

Stack trace:

#0 /var/www/wmysite/releases/156/vendor/doctrine/dbal/lib/Doctrine/DBAL/Driver/Mysqli/MysqliStatement.php(164): mysqli_stmt->execute()
#1 /var/www/wmysite/releases/156/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(1527): Doctrine\\DBAL\\Driver\\Mysqli\\MysqliStatement->execute()
#2 /var/www/wmysite/releases/156/vendor/doctrine/dbal/lib/Doctrine/DBAL/Connection.php(894): Doctrine\\DBAL\\Connection->executeStatement()
#3 /var/www/wmysite/releases/156/htdocs/typo3/sysext/core/Classes/Database/Connection.php(310): Doctrine\\DBAL\\Connection->update()
#4 /var/www/wmysite/releases/156/htdocs/typo3/sysext/core/Classes/Resource/Index/MetaDataRepository.php(194): TYPO3\\CMS\\Core\\Database\\Connection->update()
#5 /var/www/wmysite/releases/156/htdocs/typo3/sysext/core/Classes/Resource/MetaDataAspect.php(194): TYPO3\\CMS\\Core\\Resource\\Index\\MetaDataRepository->update()
#6 /var/www/wmysite/releases/156/htdocs/typo3/sysext/core/Classes/Resource/Index/Indexer.php(166): TYPO3\\CMS\\Core\\Resource\\MetaDataAspect->save()
#7 /var/www/wmysite/releases/156/htdocs/typo3/sysext/core/Classes/Resource/Index/Indexer.php(142): TYPO3\\CMS\\Core\\Resource\\Index\\Indexer->extractMetaData()
#8 /var/www/wmysite/releases/156/htdocs/typo3/sysext/scheduler/Classes/Task/FileStorageExtractionTask.php(60): TYPO3\\CMS\\Core\\Resource\\Index\\Indexer->runMetaDataExtraction()
#9 /var/www/wmysite/releases/156/htdocs/typo3/sysext/scheduler/Classes/Scheduler.php(192): TYPO3\\CMS\\Scheduler\\Task\\FileStorageExtractionTask->execute()
#10 /var/www/wmysite/releases/156/htdocs/typo3/sysext/scheduler/Classes/Controller/SchedulerModuleController.php(795): TYPO3\\CMS\\Scheduler\\Scheduler->executeTask()
#11 /var/www/wmysite/releases/156/htdocs/typo3/sysext/scheduler/Classes/Controller/SchedulerModuleController.php(282): TYPO3\\CMS\\Scheduler\\Controller\\SchedulerModuleController->executeTasks()
#12 /var/www/wmysite/releases/156/htdocs/typo3/sysext/scheduler/Classes/Controller/SchedulerModuleController.php(175): TYPO3\\CMS\\Scheduler\\Controller\\SchedulerModuleController->getModuleContent()
#13 /var/www/wmysite/releases/156/htdocs/typo3/sysext/backend/Classes/Http/RouteDispatcher.php(91): TYPO3\\CMS\\Scheduler\\Controller\\SchedulerModuleController->mainAction()

version

xperseguers commented 7 months ago

Indeed, but with a title like

"Die Gesundheit der B\u00fcrgerinnen und B\u00fcrger wird nach dem Programm der Weltgesundheitsorganisation (World Health Organization-WHO) \u201eGesundheit f\u00fcr alle bis zum Jahre 2000\" aus dem Jahre 1977 als ein Zustand des v\u00f6lligen k\u00f6rperlichen, geistigen und sozialen"

I have the feeling your editors have a strange notion of what a title is and what should rather go into the description :-)

artus70 commented 7 months ago

> I have the feeling your editors have a strange notion of what a title is and what should rather go into the description :-)

True! 😄

sypets commented 7 months ago

Yes, true, this really should not happen.

The problem is that the metadata extraction task now aborts because of a very few (probably old) files. The only solution I see is to sift through the logs manually and handle the files individually, e.g. by changing the title (this can be done, for example, with exiftool).

I understand that it probably should not be the responsibility of this extension to make sure the metadata fits in the DB. However, TYPO3 does not make sure of it either, so this leads to an aborted scheduler task.

sypets commented 7 months ago

Closing. Feel free to reopen if this should be handled.

xperseguers commented 7 months ago

It’s OK to keep it open, as this is a possible problem and ideally it should get solved. But just hardcoding a substr($title, 0, 255) or the like is not really good because, even if questionable, we cannot rule out that someone enlarges this field with a custom extension. So the perfect solution would be to check the actual size of each field.
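As a rough sketch of what a per-field check could look like: the helper names `truncateToBytes()` and `fitRecordToLimits()` and the `$limits` map below are hypothetical and not part of EXT:extractor or the TYPO3 Core. In a real setup the limits would have to be derived from the actual database schema rather than written by hand. The sketch uses `mb_strcut()` instead of `substr()`, because `substr()` cuts at a byte offset and may split a multibyte UTF-8 character in half.

```php
<?php
/**
 * Hypothetical helper: cut a UTF-8 string so that it occupies
 * at most $maxBytes bytes, without splitting a multibyte character.
 */
function truncateToBytes(string $value, int $maxBytes): string
{
    if (strlen($value) <= $maxBytes) {
        return $value; // already fits, nothing to do
    }
    // mb_strcut() works on byte offsets but respects character boundaries.
    return mb_strcut($value, 0, $maxBytes, 'UTF-8');
}

/**
 * Hypothetical helper: apply per-field byte limits to a record.
 * Fields without an entry in $limits are left untouched.
 */
function fitRecordToLimits(array $record, array $limits): array
{
    foreach ($limits as $field => $maxBytes) {
        if (isset($record[$field]) && is_string($record[$field])) {
            $record[$field] = truncateToBytes($record[$field], $maxBytes);
        }
    }
    return $record;
}

// Assumed limit: tinytext holds at most 255 bytes.
$limits = ['title' => 255];
$record = [
    'title'   => str_repeat('ö', 300), // 300 characters, 600 bytes
    'creator' => 'Universität Oldenburg',
];
$record = fitRecordToLimits($record, $limits);
echo strlen($record['title']), " bytes\n";
```

Because the limits are passed in rather than hardcoded, a site that enlarges the column would only need to supply a different map.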

xperseguers commented 4 months ago

Rereading this, I have the feeling that TYPO3 Core is (a bit) at fault here. Of course, this is not expected to happen when you edit the metadata with the Backend edit form. But if an extension (like EXT:extractor) provides metadata to the Core through the official API, it shouldn't be the extension's responsibility to understand the underlying DB schema and "cut" a value to fit it. Logically, it is the responsibility of the Core (TYPO3\CMS\Core\Resource\Index\MetaDataRepository) to know its own schema and cut content accordingly, or at least to handle the "data too long" warning (logically, this shortens the value automatically and isn't an "exception" per se).
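The "at least handle it" option can be sketched generically. This is not how the TYPO3 Core or the scheduler task works. The function `processAll()` and the `$saveMetadata` callback are hypothetical stand-ins that only illustrate catching a per-file failure so that one oversized value does not abort the whole run.

```php
<?php
/**
 * Hypothetical sketch: run $saveMetadata for every item and collect
 * failures instead of letting the first exception abort the loop.
 *
 * @return array<int|string, string> error messages keyed by item key
 */
function processAll(iterable $items, callable $saveMetadata): array
{
    $failed = [];
    foreach ($items as $key => $item) {
        try {
            $saveMetadata($item);
        } catch (\Throwable $e) {
            // Remember the problem and continue with the next item.
            $failed[$key] = $e->getMessage();
        }
    }
    return $failed;
}

// Stand-in for a database update that rejects oversized titles.
$save = function (array $metadata): void {
    if (strlen($metadata['title']) > 255) {
        throw new \RuntimeException("Data too long for column 'title'");
    }
};

$files = [
    'a.doc' => ['title' => 'Short title'],
    'b.doc' => ['title' => str_repeat('x', 300)],
    'c.doc' => ['title' => 'Another short title'],
];
$failed = processAll($files, $save);
print_r($failed);
```

With this pattern the remaining files are still processed, and the collected messages can be logged for manual follow-up.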

Do you agree?

sypets commented 4 months ago

What you wrote makes sense.

It would be nice if this did not result in an exception - but I understand that this is something which should rather be handled in TYPO3 where the DB schema is known.

From my end, all files which caused problems have been fixed manually and everything was extracted. If there is a rare case of a file causing problems in the future, I think we can live with that.

Thanks for your great work!