xperseguers / t3ext-extractor

TYPO3 Extension extractor
https://extensions.typo3.org/extension/extractor
GNU General Public License v2.0

Remove control characters #44

Open seirerman opened 2 years ago

seirerman commented 2 years ago

I have some files whose metadata contains control characters (e.g. BEL, VT, NAK, DC3); see the attachment: 70-2021-290-5-23-12-2021.pdf. For a list of control characters, see https://en.wikipedia.org/wiki/ASCII#Control_code_chart.

The extractor extension doesn't remove these control characters when a new file is imported into TYPO3. As a result, Solr gets stuck while indexing the affected files, which blocks all following files in the queue from being indexed.
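
To illustrate the kind of sanitizing I have in mind, here is a minimal sketch in Python. It is not the extension's code (the extension is written in PHP), and the function name `strip_control_characters` is hypothetical. It assumes that tab, line feed and carriage return should be kept, and that all other C0 control characters plus DEL should be dropped before the value is stored.

```python
import re

# C0 control characters except tab (0x09), line feed (0x0A) and
# carriage return (0x0D), plus DEL (0x7F). Keeping those three
# whitespace characters is an assumption, not a requirement.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_characters(value: str) -> str:
    """Return value with control characters removed (hypothetical helper)."""
    return _CONTROL_CHARS.sub("", value)


if __name__ == "__main__":
    # BEL (0x07), VT (0x0B), NAK (0x15) and DC3 (0x13) embedded in a title.
    dirty = "Title\x07 with\x0b control\x15 chars\x13"
    print(repr(strip_control_characters(dirty)))
```

Applying something like this to every extracted metadata string before it is written to the file's metadata record would presumably keep such values out of the index queue.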