mediacloud / metadata-lib

How Media Cloud approaches extracting metadata from online news stories
Apache License 2.0

Update trafilatura requirement from <1.7,>=1.4 to >=1.4,<1.9 #84

Closed: dependabot[bot] closed this 8 months ago

dependabot[bot] commented 8 months ago

Updates the requirements on trafilatura to permit the latest version.
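To illustrate what the range change in the title means in practice, here is a minimal sketch of how a comma-separated requirement such as `>=1.4,<1.9` can be evaluated. It is not how pip or Dependabot implement this: the helper names (`parse_version`, `pad`, `satisfies`) are hypothetical, and only plain dotted numeric versions are handled, whereas real tools follow the full PEP 440 rules.

```python
# Simplified sketch: evaluates a comma-separated version range against
# plain dotted numeric versions. Pre-releases, wildcards and "~=" are
# deliberately not supported here.
import operator

OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    "!=": operator.ne, ">": operator.gt, "<": operator.lt,
}


def parse_version(text):
    """Turn '1.8.0' into (1, 8, 0)."""
    return tuple(int(part) for part in text.strip().split("."))


def pad(version, length):
    """Right-pad with zeros so that (1, 7) compares equal to (1, 7, 0)."""
    return version + (0,) * (length - len(version))


def satisfies(version, spec):
    """Return True if version meets every clause in spec."""
    candidate = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", "!=", ">", "<"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                width = max(len(candidate), len(bound))
                if not OPS[symbol](pad(candidate, width), pad(bound, width)):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


OLD_SPEC = "<1.7,>=1.4"   # requirement before this PR
NEW_SPEC = ">=1.4,<1.9"   # requirement proposed by this PR

for release in ("1.6.4", "1.7.0", "1.8.0", "1.9.0"):
    print(release, satisfies(release, OLD_SPEC), satisfies(release, NEW_SPEC))
```

Under these assumptions, 1.7.0 and 1.8.0 are rejected by the old range and accepted by the new one, while 1.9.0 stays excluded by both.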

Release notes

Sourced from trafilatura's releases.

trafilatura-1.8.0

Extraction:

  • Better precision by @felipehertzer (#509, #520)
  • Code formatting in TXT/Markdown output added (#498)
  • Improved CSV output (#496)
  • LXML: compile XPath expressions (#504)
  • Overall speedup about +5%

Downloads and Navigation:

  • More robust scans with is_live_page() (#501)
  • Better sitemap start and safeguards (#503, #506)
  • Fix for headers in response object (#513)

Maintenance:

  • License changed to Apache 2.0
  • Response class: convenience functions added (#497)
  • lxml.html.Cleaner removed (#491)
  • CLI fixes: parallel cores and processing (#524)

Changelog

Sourced from trafilatura's changelog.

1.7.0

Extraction:

  • improved html2txt() function

Downloads:

  • add advanced fetch_response() function → pending deprecation for fetch_url(decode=False)

Maintenance:

1.6.4

Maintenance:

  • MacOS: fix setup, update htmldate and add tests (#460)
  • drop invalid XML element attributes with @vbarbaresi in #462
  • remove cyclic imports (#458)

Navigation:

  • introduce MAX_REDIRECTS config setting and fix urllib3 redirect handling by @vbarbaresi in #461
  • improve feed detection (#457)

Documentation:

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

dependabot[bot] commented 8 months ago

Looks like trafilatura is no longer updatable, so this is no longer needed.