Closed jayaddison closed 2 months ago
- [x] Add deprecation warnings and/or other migration encouragements to the legacy
scrape_me
method.
See #1079.
- [x] Figure out a plan given that the signature of
scrape_html
could change (must it? is there a better way?)
Currently the behaviour changes are that:
**options
keyword arguments are no-longer supported.online
option is added to provide a migration path.A more cautious, although slower, migration approach could be to retain the scrape_me
function in the v15
public API, but to do two things:
online
parameter for the call to work.That would allow us to undo the change to add an online
argument to scrape_html
; it didn't seem great to provide it there purely for migration purposes.
Note: the online
argument in scrape_me
would also require the [online]
extra (e.g. requests
dependency) to be installed, similar to the way it is currently for scrape_html
.
- [x] Add some examples of advanced / alternate HTTP client usage to our documentation.
For this, I'd suggest we could re-use or adapt some of the documentation examples from #1021.
- [x] Figure out a plan given that the signature of
scrape_html
could change (must it? is there a better way?)Currently the behaviour changes are that:
**options
keyword arguments are no-longer supported.An
online
option is added to provide a migration path.A more cautious, although slower, migration approach could be to retain the
scrape_me
function in thev15
public API, but to do two things:
Firstly, require using an added
online
parameter for the call to work.Secondly, also emit a deprecation warning.
That would allow us to undo the change to add an
online
argument toscrape_html
; it didn't seem great to provide it there purely for migration purposes.Note: the
online
argument inscrape_me
would also require the[online]
extra (e.g.requests
dependency) to be installed, similar to the way it is currently forscrape_html
.
Although I'd welcome more feedback, I think that removing the scrape_me
function is OK. Some API breakage is acceptable with a major version bump, and we do want to encourage people to retrieve the HTML themselves.
Providing the online
flag in scrape_html
is already something of a concession for the sake of backwards compatibility, so I've added a deprecation warning when it is used, and have updated the relevant code example in the v15
migration guide to recommend using requests
as an HTTP client to retrieve the HTML instead.
- [x] Add some examples of advanced / alternate HTTP client usage to our documentation.
For this, I'd suggest we could re-use or adapt some of the documentation examples from #1021.
Instead I've opted for a fairly minimal approach to handle this, adding a couple of references to the existing HTML-retrieve example in the readme (#1085).
- [x] Add deprecation warnings and/or other migration encouragements to the legacy
scrape_me
method.
I think that this task should be considered incomplete; the V15 branch removes the scrape_me
interface entirely -- so we should add a V14 deprecation warning about that.
I think it's time to start preparing for a v15 release based on the development branch (#746).
So far there has been one release candidate,
15.0.0-rc1
.The most significant change in that release is that retrieval of HTML from the web is no longer handled within this library by using
requests
; instead HTML is provided directly by the caller using thescrape_html
method (scrape_me
is removed).The
scrape_html
method exists within the mainline/v14
codebase already -- but we haven't yet encouraged migration to it (for example, by raisingDeprecationWarning
s usingscrape_me
).Although the original suggestion was to remove networking code entirely, I'm currently inclined to allow callers to install
requests
as an optional dependency usingpip install recipe-scrapers[online]
, as currently prepared in #1021. That should ease migration for folks who are fine with continuing to userequests
implicitly by default, while allowing for other more advanced use cases and alternative clients.It'd also be good to provide documentation and examples to show other ways of retrieving and passing HTML data for scraping. I'd suggest (and will begin) doing this in preparation for a
15.0.0-rc2
release.scrape_me
method.scrape_html
could change (must it? is there a better way?)