hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.7k stars 518 forks source link

v15 prerelease: support networking breaking-change migration. #1022

Closed jayaddison closed 2 months ago

jayaddison commented 6 months ago

I think it's time to start preparing for a v15 release based on the development branch (#746).

So far there has been one release candidate, 15.0.0-rc1.

The most significant change in that release is that retrieval of HTML from the web is no longer handled within this library by using requests; instead HTML is provided directly by the caller using the scrape_html method (scrape_me is removed).

The scrape_html method exists within the mainline/v14 codebase already -- but we haven't yet encouraged migration to it (for example, by raising DeprecationWarnings using scrape_me).

Although the original suggestion was to remove networking code entirely, I'm currently inclined to allow callers to install requests as an optional dependency using pip install recipe-scrapers[online], as currently prepared in #1021. That should ease migration for folks who are fine with continuing to use requests implicitly by default, while allowing for other more advanced use cases and alternative clients.

It'd also be good to provide documentation and examples to show other ways of retrieving and passing HTML data for scraping. I'd suggest (and will begin) doing this in preparation for a 15.0.0-rc2 release.

jayaddison commented 5 months ago
  • [x] Add deprecation warnings and/or other migration encouragements to the legacy scrape_me method.

See #1079.

  • [x] Figure out a plan given that the signature of scrape_html could change (must it? is there a better way?)

Currently the behaviour changes are that:

A more cautious, although slower, migration approach could be to retain the scrape_me function in the v15 public API, but to do two things:

  1. Firstly, require using an added online parameter for the call to work.
  2. Secondly, also emit a deprecation warning.

That would allow us to undo the change to add an online argument to scrape_html; it didn't seem great to provide it there purely for migration purposes.

Note: the online argument in scrape_me would also require the [online] extra (e.g. requests dependency) to be installed, similar to the way it is currently for scrape_html.

  • [x] Add some examples of advanced / alternate HTTP client usage to our documentation.

For this, I'd suggest we could re-use or adapt some of the documentation examples from #1021.

jayaddison commented 5 months ago
  • [x] Figure out a plan given that the signature of scrape_html could change (must it? is there a better way?)

Currently the behaviour changes are that:

  • **options keyword arguments are no-longer supported.

  • An online option is added to provide a migration path.

A more cautious, although slower, migration approach could be to retain the scrape_me function in the v15 public API, but to do two things:

  1. Firstly, require using an added online parameter for the call to work.

  2. Secondly, also emit a deprecation warning.

That would allow us to undo the change to add an online argument to scrape_html; it didn't seem great to provide it there purely for migration purposes.

Note: the online argument in scrape_me would also require the [online] extra (e.g. requests dependency) to be installed, similar to the way it is currently for scrape_html.

Although I'd welcome more feedback, I think that removing the scrape_me function is OK. Some API breakage is acceptable with a major version bump, and we do want to encourage people to retrieve the HTML themselves.

Providing the online flag in scrape_html is already something of a concession for the sake of backwards compatibility, so I've added a deprecation warning when it is used, and have updated the relevant code example in the v15 migration guide to recommend using requests as an HTTP client to retrieve the HTML instead.

jayaddison commented 5 months ago
  • [x] Add some examples of advanced / alternate HTTP client usage to our documentation.

For this, I'd suggest we could re-use or adapt some of the documentation examples from #1021.

Instead I've opted for a fairly minimal approach to handle this, adding a couple of references to the existing HTML-retrieve example in the readme (#1085).

jayaddison commented 2 months ago
  • [x] Add deprecation warnings and/or other migration encouragements to the legacy scrape_me method.

I think that this task should be considered incomplete; the V15 branch removes the scrape_me interface entirely -- so we should add a V14 deprecation warning about that.