scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
52.31k stars 10.46k forks source link

Implementation for "abstract" methods causes static type checkers to fail #5836

Open ldorigo opened 1 year ago

ldorigo commented 1 year ago

Description

Hello, the current implementation of "abstract methods" (unimplemented methods on base classes), such as scrapy.http.Response.css(), causes static type checkers such as pylance to fail quite hard - they (correctly) interpret the method as never returning (it only raises an exception), meaning that any further code in the calling module is marked as unreachable.

The problem is described in detail here: https://github.com/microsoft/pylance-release/issues/149 . The long-term solution will be to use proper type annotation/type stubs (for which there are issues and PRs in the works), but for now it would be enough to mark abstract methods as @abstractmethod so that static checkers know that the method is not meant to be called directly (rather it should be implemented in a subclass).

Steps to Reproduce

Install pylance (built-in to vscode) or mypy. Write any code that calls Response.css() (or other similar abstract methods).

Expected behavior: ideally a typed return value, but at least not breaking the entire typechecker.

Actual behavior: see above

Reproduces how often: 100%

Versions

a::b
Scrapy 2.7.1 started (bot: lalibrairie_scraper)
Versions: lxml 4.9.2.0, libxml2 2.9.14, cssselect 1.2.0, parsel 1.7.0, w3lib 2.1.1, Twisted 22.10.0, Python 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.4, Platform Linux-6.1.14-1-lts-x86_64-with-glibc2.37
Scrapy       : 2.7.1
lxml         : 4.9.2.0
libxml2      : 2.9.14
cssselect    : 1.2.0
parsel       : 1.7.0
w3lib        : 2.1.1
Twisted      : 22.10.0
Python       : 3.10.9 (main, Dec 19 2022, 17:35:49) [GCC 12.2.0]
pyOpenSSL    : 22.1.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 38.0.4
Platform     : Linux-6.1.14-1-lts-x86_64-with-glibc2.37    
wRAR commented 1 year ago

Response is not an ABC though, it's used directly. Response.css() is also not an abstract method, implementing it is not required for subclasses. It also doesn't raise NotImplementedError, it raises "Response content isn't text". It makes sense to me that code calling css() on something that is not necessarily a TextResponse is treated as possibly raising an exception (I haven't checked what do static checkers actually report).

ldorigo commented 1 year ago

Hmm, so by changing the type annotation of calling code from foo: Response to foo: TextResponse, the typechecker errors go away. In theory annotating it as Response shouldn't mark the rest of the code as unreachable though, since it is possible that a subclass of response actually returns something rather than raising an exception... Not sure if it's possible to do something about it, I'll make a small issue on pylance to ask the devs there if it's expected

Gallaecio commented 1 year ago

If you are calling foo.css somewhere where you expect a TextResponse, you could use assert isinstance(foo, TextResponse) before your foo.css calls to silence typing errors. At least that usually works with mypy.

wRAR commented 9 months ago

@sa2415 what do you want to do for this issue?