microsoft / pylance-release

Documentation and issues for Pylance
Creative Commons Attribution 4.0 International

Bundled type hints for lxml are inconsistent with source, lxml-stubs #5827

Open ferdnyc opened 5 months ago

ferdnyc commented 5 months ago

Environment data

Code Snippet

Borrowing a chunk of Microsoft's own sample XML file...

from lxml import etree

s = """
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications 
      with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, 
      an evil sorceress, and her own childhood to become queen 
      of the world.</description>
   </book>
</catalog>
"""

t = etree.fromstring(s)

Repro Steps

  1. Enter the preceding code
  2. Pylance will flag the etree.fromstring(s) call

Expected behavior

The call is accepted without complaint: parser is optional, so etree.fromstring(s) is valid.

Actual behavior

Pylance's complaint is: Argument missing for parameter "parser"

Logs

I don't get the impression there's anything actually useful here, but...

2024-04-29 21:30:36.622 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:36.622 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:36.622 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:37.384 [debug] Extension ms-python.vscode-pylance accessed onDidEnvironmentVariablesChange with args: undefined
2024-04-29 21:30:37.387 [debug] Extension ms-python.vscode-pylance accessed onDidChangeActiveEnvironmentPath with args: undefined
2024-04-29 21:30:40.575 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:40.576 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:40.939 [info] Starting Pylance language server.
2024-04-29 21:30:40.944 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:40.947 [debug] Terminal shell path '/usr/bin/zsh' identified as shell 'zsh'
2024-04-29 21:30:40.947 [debug] Shell identified as zsh 
2024-04-29 21:30:40.948 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:40.973 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:40.976 [debug] Found cached env for /tmp/venv/bin/python
2024-04-29 21:30:41.076 [debug] Extension ms-python.vscode-pylance accessed getActiveEnvironmentPath with args: undefined
2024-04-29 21:30:41.076 [debug] Extension ms-python.vscode-pylance accessed resolveEnvironment with args: {"id":"/tmp/venv/bin/python","path":"/tmp/venv/bin/python"}
2024-04-29 21:30:41.078 [debug] Extension ms-python.vscode-pylance accessed getEnvironmentVariables with args: undefined
2024-04-29 21:30:41.079 [debug] Extension ms-python.vscode-pylance accessed getActiveEnvironmentPath with args: undefined
2024-04-29 21:30:41.079 [debug] Extension ms-python.vscode-pylance accessed resolveEnvironment with args: {"id":"/tmp/venv/bin/python","path":"/tmp/venv/bin/python"}

Additional information

The issue appears to be that the type hints bundled with Pylance — in the extension's folder, dist/bundled/native-stubs/lxml/etree.pyi — are simply incorrect. Many of the hints are at odds with the function descriptions found right beneath them in the functions' docstrings. For fromstring, for example, the file contains:

def fromstring(text, parser) -> typing.Any:
    'fromstring(text, parser=None, base_url=None)\n\n    Parses an XML document or fragment from a string.  Returns the\n    root node (or the result returned by a parser target).\n\n    To override the default parser with a different parser you can pass it to\n    the ``parser`` keyword argument.\n\n    The ``base_url`` keyword argument allows to set the original base URL of\n    the document to support relative Paths when looking up external entities\n    (DTD, XInclude, ...).\n    '
    ...

It's missing the base_url parameter entirely, and it doesn't give the parser parameter a default value, so Pylance treats parser as required. That is what triggers the spurious error. fromstring is far from the only mistyped function in the file.
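
To make the mismatch concrete, here is a rough sketch that compares the parameters a function declares with the ones listed on the first line of its docstring. The fromstring below is only a stand-in copied from the bundled stub quoted above (docstring shortened), and required_params / documented_params are hypothetical helper names, not part of Pylance or lxml.

```python
import ast
import inspect


# Stand-in copied from the bundled stub quoted above (docstring shortened).
def fromstring(text, parser):
    "fromstring(text, parser=None, base_url=None)\n\n    Parses an XML document or fragment from a string."
    ...


def required_params(func):
    """Names of named parameters that the signature declares without a default."""
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return [
        p.name
        for p in inspect.signature(func).parameters.values()
        if p.default is inspect.Parameter.empty and p.kind not in skip
    ]


def documented_params(func):
    """Parse the first docstring line as a signature.

    Returns (all parameter names, names that have no default).
    Only handles plain positional-or-keyword parameters.
    """
    first_line = func.__doc__.splitlines()[0].strip()
    # Turn "name(args)" into a function definition that ast can parse.
    node = ast.parse(f"def {first_line}: pass").body[0]
    names = [a.arg for a in node.args.args]
    n_required = len(names) - len(node.args.defaults)
    return names, names[:n_required]


declared_required = required_params(fromstring)
doc_names, doc_required = documented_params(fromstring)

print("stub requires:     ", declared_required)
print("docstring lists:   ", doc_names)
print("docstring requires:", doc_required)
```

With the stub as quoted, the declared signature requires both text and parser, while the docstring only requires text and additionally lists base_url. A check along these lines could presumably be run over the rest of the file to find the other mistyped functions.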

I don't know if it's because lxml.etree is implemented in Cython, but I do know that the fromstring function signature hasn't changed in 17 years.

Installing the external lxml stubs with python3 -m pip install lxml-stubs makes the error go away, because in those stubs the fromstring function is typed like so:

@overload
def fromstring(
    text: _AnyStr,
    parser: None = ...,
    *,
    base_url: _AnyStr = ...,
) -> _Element: ...
@overload
def fromstring(
    text: _AnyStr,
    parser: _AnyParser = ...,
    *,
    base_url: _AnyStr = ...,
) -> Union[_Element, Any]: ...

ferdnyc commented 5 months ago

(Actually etree doesn't like having <?xml version="1.0"?> at the start of the string, so I took that out. This way, it parses without issue.)
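
One plausible explanation, assuming the declaration was pasted on the line after the opening triple quote as in the snippet above: an XML declaration is only allowed at the very start of the document, and the newline after """ pushes it to line 2. The sketch below uses the standard library's xml.etree.ElementTree as a stand-in for lxml.etree, so the exact error text will differ from lxml's. The shortened catalog string is made up for illustration.

```python
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree

# The newline right after the opening quotes puts the declaration on line 2.
with_leading_newline = """
<?xml version="1.0"?>
<catalog><book id="bk101"/></catalog>
"""

try:
    ET.fromstring(with_leading_newline)
    leading_newline_ok = True
except ET.ParseError as exc:
    leading_newline_ok = False
    print("rejected:", exc)

# Stripping the leading whitespace moves the declaration back to the start.
root = ET.fromstring(with_leading_newline.lstrip())
print(root.tag)
```

If that is the cause, keeping the declaration and calling .lstrip() on the string (or starting the literal with """<?xml ...) should work as well as deleting it.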

debonte commented 5 months ago

Seems like we should eliminate our lxml stubs and point people to https://github.com/lxml/lxml-stubs, or to https://github.com/abelcheung/types-lxml, which may actually be preferred.

BradLucky commented 5 months ago

I just ran into this issue as well. Thanks to OP for such a great write-up of the problem and rundown of what's going on.