mediacloud / metadata-lib

How Media Cloud approaches extracting metadata from online news stories
Apache License 2.0
12 stars 5 forks

Update trafilatura requirement from ==1.4.* to >=1.4,<1.7 #51

Closed by dependabot[bot] 1 year ago

dependabot[bot] commented 1 year ago

Updates the requirements on trafilatura to permit the latest version.
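The effect of widening the pin can be illustrated with a small sketch. It is a simplified check that handles only plain dotted release numbers such as `1.6.0`, not the full PEP 440 grammar, and the helper names `parse_release`, `allowed_by_old_pin`, and `allowed_by_new_range` are hypothetical, not part of pip, Dependabot, or trafilatura.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '1.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def allowed_by_old_pin(version: str) -> bool:
    """Approximate the old requirement ==1.4.* (any release in the 1.4 series)."""
    return parse_release(version)[:2] == (1, 4)


def allowed_by_new_range(version: str) -> bool:
    """Approximate the new requirement >=1.4,<1.7."""
    return (1, 4) <= parse_release(version) < (1, 7)


if __name__ == "__main__":
    for candidate in ("1.3.0", "1.4.1", "1.5.0", "1.6.0", "1.7.0"):
        print(candidate, allowed_by_old_pin(candidate), allowed_by_new_range(candidate))
```

Under these assumptions, 1.4.1 satisfies both requirements, 1.5.0 and 1.6.0 satisfy only the new range, and 1.3.0 and 1.7.0 satisfy neither. For real dependency resolution, a full PEP 440 implementation is the safer choice.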

Release notes

Sourced from trafilatura's releases.

trafilatura-1.6.0

Extraction:

Command-line interface:

  • more efficient sitemap processing (#326)
  • more efficient downloads (#338)
  • fix for single URL processing (#324) and URL blacklisting (#339)

Navigation

  • additional safety check on domain similarity for feeds and sitemaps
  • new function is_live test() using HTTP HEAD request (#327)
  • code parts supported by new courlan version
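The liveness check via an HTTP HEAD request mentioned in the Navigation notes can be sketched with the standard library alone. This is an illustration of the general technique, not trafilatura's implementation: the function name `head_is_live`, the timeout default, and the choice to treat any status below 400 as live are all assumptions made for this example.

```python
import http.client
import urllib.request


def head_is_live(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to `url` gets a non-error response.

    Hypothetical helper: a HEAD request asks only for the status line and
    headers, so the response body is never downloaded.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts;
        # ValueError covers malformed URLs.
        return False
```

A call such as `head_is_live("https://example.org")` returns a boolean without fetching the page body, which is why a HEAD request is cheaper than a full GET when only availability matters. Some servers handle HEAD poorly, so a negative result is a hint and not proof that the page is gone.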

Maintenance

  • allow urllib3 version 2.0+
  • minor code simplification and fixes

Full Changelog: https://github.com/adbar/trafilatura/compare/v1.5.0...v1.6.0

Changelog

Sourced from trafilatura's changelog.

1.5.0

Extraction:

Navigation:

  • transfer URL management to courlan.UrlStore (#232, #312)
  • fixes for spider module

Maintenance:

  • simplify code and extend tests
  • underlying packages htmldate and courlan, update setup and docs

1.4.1

Extraction:

  • XML output improvements with @knit-bee (#273, #274)
  • extraction bugs fixed (#263, #266), more robust HTML doctype parsing
  • adjust thresholds for link density in paragraphs

... (truncated)

You can trigger a rebase of this PR by commenting `@dependabot rebase`.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Note: Automatic rebases have been disabled on this pull request as it has been open for over 30 days.