Improved extraction of PDF subset info for PDF/UA, PDF/VT, and PDF/X.
NOTE: we no longer append PDF/A information, e.g. 'version="A-1b"'
to the 'dc:format'. Users must now get that information from the
'pdfa:PDFVersion' key or from 'pdfaid:conformance'
and 'pdfaid:part' (TIKA-3844).
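Callers who relied on a label such as "A-1b" in 'dc:format' can rebuild it from the two pdfaid keys. The following is a minimal sketch that assumes the metadata has already been copied into a plain `Map<String, String>`. The `PdfaLabel` class and the lowercase conformance letter (modeled on the old 'version="A-1b"' value) are illustrative assumptions, not part of Tika's API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of Tika: rebuilds a label such as "A-1b"
// from the pdfaid:part and pdfaid:conformance metadata values.
final class PdfaLabel {

    private PdfaLabel() {
    }

    // Assumption: metadata has been copied into a plain key/value map.
    static Optional<String> fromMetadata(Map<String, String> metadata) {
        String part = metadata.get("pdfaid:part");
        if (part == null || part.isBlank()) {
            return Optional.empty(); // no PDF/A part declared
        }
        // The conformance level may be absent; then only the part number is used.
        String conformance = metadata.get("pdfaid:conformance");
        String level = conformance == null ? "" : conformance.trim().toLowerCase(Locale.ROOT);
        return Optional.of("A-" + part.trim() + level);
    }

    public static void main(String[] args) {
        Map<String, String> md = Map.of("pdfaid:part", "1", "pdfaid:conformance", "B");
        System.out.println(fromMetadata(md).orElse("not PDF/A")); // prints A-1b
    }
}
```

With a real Tika `Metadata` object, the same two keys would be read in place of the map lookups.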
Avoid infinite loop in bookmark extraction from PDFs (TIKA-3832).
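The note does not say how TIKA-3832 was fixed. As a generic illustration only, assuming the loop came from bookmark entries that link back to an entry already visited, a traversal can track visited nodes by identity and stop at the first repeat. `BookmarkWalker` and its `Node` type are hypothetical stand-ins, not PDFBox or Tika classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not Tika or PDFBox code: a depth-first walk over
// bookmark nodes that skips any node it has already visited.
final class BookmarkWalker {

    // Hypothetical bookmark node; a malformed file could make a child
    // link back to an ancestor, forming a cycle.
    static final class Node {
        final String title;
        final List<Node> children = new ArrayList<>();

        Node(String title) {
            this.title = title;
        }
    }

    static List<String> titles(Node root) {
        List<String> out = new ArrayList<>();
        if (root == null) {
            return out;
        }
        // Identity-based set: two distinct nodes with equal titles are both kept.
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (!seen.add(node)) {
                continue; // already visited: skip instead of looping forever
            }
            out.add(node.title);
            // Push children in reverse so they are visited in document order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                Node child = node.children.get(i);
                if (child != null) {
                    stack.push(child);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Node a = new Node("A");
        Node b = new Node("B");
        a.children.add(b);
        b.children.add(a); // cycle: B points back to A
        System.out.println(titles(a)); // prints [A, B]
    }
}
```

An explicit stack is used instead of recursion so that very deep outlines cannot overflow the call stack.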
Upgraded to slf4j 2.0.1 (TIKA-3842).
Added upsert option for the OpenSearch emitter (TIKA-3855).
Extract PDF signature information at the document level
into the metadata (TIKA-3852).
Enable configuration of digests via AutoDetectParserConfig (TIKA-3853).
Use commons-io byte array streams via PJ Fanning (TIKA-3843).
Upgrade to PDFBox 2.0.27 (TIKA-3866).
Upgrade to JempBox 1.8.17 (TIKA-3856).
Add extraction of ODF version from ODF files (TIKA-3840).
tika-parser-html-commons (BoilerPipeHandler) is no longer a
dependency of tika-parser-html-module. tika-app and tika-server-standard
have added a dependency on tika-parser-html-commons. However,
users who are managing custom dependencies and who want the BoilerPipeHandler
will now have to include the tika-parser-html-commons dependency
(TIKA-1484).
Add unrar as an optional parser (TIKA-3800).
Refactor FuzzingCLI to use PipesParser (TIKA-3799).
ServiceLoader's loadServiceProviders() now guarantees
unique classes (TIKA-3797).
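As a rough illustration of what "unique classes" means here, the sketch below keeps only the first provider instance of each concrete class, in discovery order. It is not Tika's implementation; `UniqueProviders.uniqueByClass` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Tika's ServiceLoader code.
final class UniqueProviders {

    private UniqueProviders() {
    }

    // Keeps the first instance of each concrete class; LinkedHashMap
    // preserves the order in which providers were discovered.
    static <T> List<T> uniqueByClass(Iterable<? extends T> providers) {
        Map<Class<?>, T> byClass = new LinkedHashMap<>();
        for (T provider : providers) {
            byClass.putIfAbsent(provider.getClass(), provider);
        }
        return new ArrayList<>(byClass.values());
    }

    public static void main(String[] args) {
        List<Object> found = List.of("a", 1, "b", 2L);
        System.out.println(uniqueByClass(found)); // prints [a, 1, 2]
    }
}
```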
Fix bug that prevented setting of includeHeadersAndFooters
for xls, xlsx, doc and docx via tika-config (TIKA-3796).
Fix bug that prevented specification of rendered image type
via http header in the PDFParser (TIKA-3794).
Fix bug causing some Exif dates to be decoded incorrectly in
timezones other than UTC (TIKA-3815).
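Exif `DateTime`-style values such as "2022:09:30 14:05:00" carry no timezone, so one way to avoid shifting them is to parse them as a zone-free `LocalDateTime` instead of going through the JVM's default timezone. This is a sketch under that assumption. The note does not say how TIKA-3815 was actually fixed, and `ExifDates` is a hypothetical class.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.TimeZone;

// Hypothetical helper, not Tika code.
final class ExifDates {

    private ExifDates() {
    }

    // Exif timestamps use colons in the date part and have no zone offset.
    private static final DateTimeFormatter EXIF =
            DateTimeFormatter.ofPattern("uuuu:MM:dd HH:mm:ss");

    // Parses into a LocalDateTime, so the JVM default timezone is never applied.
    static LocalDateTime parse(String exifValue) {
        return LocalDateTime.parse(exifValue.trim(), EXIF);
    }

    public static void main(String[] args) {
        // Deliberately use a non-UTC default; the result is unaffected.
        TimeZone.setDefault(TimeZone.getTimeZone("America/New_York"));
        System.out.println(parse("2022:09:30 14:05:00")); // prints 2022-09-30T14:05
    }
}
```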
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
Bumps tika-core from 1.24.1 to 2.5.0.
Changelog
Sourced from tika-core's changelog.
... (truncated)
Commits
- 1f4169b [maven-release-plugin] prepare release 2.5.0-rc1
- 15aa09f Fix rat-check problems
- 5f43255 prep for 2.5.0 rc1
- cf5896c Merge pull request #722 from apache/dependabot/maven/com.google.protobuf-prot...
- 1ddc907 Merge pull request #721 from apache/dependabot/maven/aws.version-1.12.313
- c6e7c8e Merge pull request #720 from apache/dependabot/maven/test.containers.version-...
- d3279e9 Bump protobuf-java from 3.21.6 to 3.21.7
- 488c885 Bump aws.version from 1.12.312 to 1.12.313
- cfa3acf Bump test.containers.version from 1.17.3 to 1.17.4
- 87730cb TIKA-3864 -- disable unit tests because Apache's Hudson doesn't like utf-8 in...