python / typeshed

Collection of library stubs for Python, with static types
Other
4.37k stars 1.75k forks source link

Should stubs contain docstrings? #4881

Open gvanrossum opened 3 years ago

gvanrossum commented 3 years ago

You might be interested in this thread: https://mail.python.org/archives/list/python-ideas@python.org/thread/QKJWPZ6XULFWNRFZYNTXXDC26UVQPFNG/#QKJWPZ6XULFWNRFZYNTXXDC26UVQPFNG

hauntsaninja commented 2 years ago

From @Avasam

As a discussion point about docstrings: They're not currently included in typeshed, and I believe that makes sense: IDEs (like the Python Language Server on VSCode), will pickup the docstrings from the source code anyway. Adding them to typeshed would only make files bigger and harder to maintain (ie: having to ensure the stubs docstrings stay in sync with source docstrings).

There are however, two exceptions:

  1. C extension code
  2. Dynamically generated code

OpenCV and pywin32 (third party libraries) being two great examples of it. They are, however, both massive stubs, and adding dosctrings for everything doesn't seem viable (too much of a chore, and makes the files even bigger than they already are.

As a middle ground, docstrings could automatically be added upon upload for modules/classes/methods/functions/etc. that don't have a direct pure python source to reference.

Maybe that's something the Python Language Server could pull dynamically and that's not something for typeshed to care about. Idk about PyCharm.

hauntsaninja commented 2 years ago

From @jakebailey

IDEs (like the Python Language Server on VSCode), will pickup the docstrings from the source code anyway

I'm not working on Pylance/pyright anymore, but I wouldn't use this as a reason to omit docstrings from stubs; the reason that IDEs have to do things like executing code is because end users (rightfully!) want docstrings in their editors.

For the longest time, pyright and Pylance didn't have docstrings for anything in typeshed at all, but users of course in masse requested them, so we invented ways to map stubs back to the underlying python (covering most of the standard library), and I added something that could provide them for the compiled libraries at runtime by running the interpreter (which I believe is now used for any and all things in the standard lib, as it's more accurate than the mapping). Doing that was a lot easier than to get typeshed / the PEPs to change, especially in the timeframe the users expected the feature.

So, it would be sort of a shame to take that as precedent for why stubs shouldn't have docstrings (in and of itself); the only reason IDEs have to run the code is because stubs don't contain docs. There are of course other rationales both ways, but this one in particular is circular.

I now work on TypeScript, which does put docs in its "stubs" (declarations), both for what is like the "standard library" and untyped npm packages in DefinitelyTyped, and it's worked so far quite well (people do send PRs to fix things). But, the languages are different; the buy-in to types is much higher comparatively (which increases the likelihood someone can contribute a fix to DT), and the "standard library" doesn't change all that much (though Node's types are on DT, docs and all, and change frequently).

srittau commented 2 years ago

I would personally like to have docstrings in stubs. This could not only help IDEs, but also human readers. I'm quite often reading the doctrings in Typescript index.d.ts files (Javascript's .pyi equivalent). That said, I can imagine this to be a maintenance nightmare for typeshed.

hauntsaninja commented 2 years ago

Yeah, the "maintenance nightmare" aspect of this is very real to me.

I don't see us doing this unless we have something that can basically completely automatically keep the docstrings in stubs up to date (which to be clear is not infeasible). Even then I'm somewhat worried about e.g. showing docstrings that mismatch the user's package version (although we've made some recent tooling improvements in regards to managing versions). We'd probably also want to do default values, at least for simple literals.

I will say as a user I'm happy with how pylance does docstrings / how responsive the hover is. So while Jake's point about the circularity of decision making is fair, I'm definitely less inclined to take on a big maintenance burden without the tooling to ensure that things stay accurate. Instead of "no docs -> some docs", it feels like we're looking at "some docs -> some docs, best case slightly faster worst case less accurate or version mismatched". Also it's not just about the burden on typeshed maintainers — I've made peace with my laziness — it's also about the burden on maintainers that ship PEP 561 stubs like numpy.

hmc-cs-mdrissi commented 2 years ago

I tend to keep default values in stubs I write, but have never considered docstrings mostly for size/complexity to maintain. Part of it is I also often copy paste method signature from library documentation which has defaults and then fill in type annotations. A smaller thing is if documentation was included I think reading stub definitions themselves would become harder.

I do mostly see defaults of just float/str/None though. Complex type default (say some private callable function) I haven't encountered much and don't think would even work in a stub?

edit: Maybe a simpler phrasing is manual maintenance of defaults feels feasible to me, while docstrings feels like it requires strong tooling to keep in sync.

srittau commented 2 years ago

Defaults would be another welcome addition and indeed quite feasible, I think.

jakebailey commented 2 years ago

So while Jake's point about the circularity of decision making is fair, I'm definitely less inclined to take on a big maintenance burden without the tooling to ensure that things stay accurate.

Totally understandable; I just wanted to point out the "we have to do Y because we don't have X; we don't do X because we have Y" loop 😅

One obvious benefit of the status quo is that the docs that show up are for the Python version that the user is currently using; that's a hard thing to replicate. In TS, the Node types are versioned and people do keep them up to date, but there's a load of infrastructure on the DT side to make that work plus a pretty large group that works on them and automates getting the docs there.

I will also say that Python and TS/JS have an interesting difference; in Python, docs are a part of the core language and exist at runtime, with things like help() to get them. There's no such thing in JS; JSDoc is source only, so they have to come along with the d.ts files. Makes it a little less than an apples-to-apples (though it's still useful to compare IMO).

Avasam commented 2 years ago

I'm on the exact same opinion as everything hauntsaninja said here.

As a python dev / consumer of Typeshed, I'd love to have docstrings where I previously didn't (because Pylance / Python Language Server only gives them for pure python source).

From the maintenance PoV, it doesn't seem like a great idea to have it in the typeshed source code. There's also the concern of having docstrings in typeshed out of date with the version of a library that's actually installed.

docs are a part of the core language and exist at runtime, with things like help() to get them.

Is the main reason I alluded to the idea of relegating this to the stubs uploader (and only when the stubbed element is not present in a python source file to reference), or even to the IDEs to pull that information. (once again I haven't touched PyCharm in years so idk how it is there). Either way stubs would not live in typeshed's source, other than in exceptional cases. At least for 3rd party stubs.

For instance, here's an example of a bad-case scenario with around 2k definitions: https://github.com/Avasam/typeshed/blob/opencv2/stubs/opencv-python/cv2/cv2.pyi (originally: https://github.com/microsoft/python-type-stubs/blob/main/cv2/__init__.pyi)

gvanrossum commented 2 years ago

Unless we have tooling that automatically updates stub docstrings when the docstring changes in the code covered by the stub, I am against this, because of the maintenance burden.

Even if we had such tooling, I actually like the terseness of the stubs -- many modules have very extensive docstrings that would completely overwhelm the current contents of the stubs, and make them less usable (to me).

gramster commented 2 years ago

Unless we have tooling that automatically updates stub docstrings when the docstring changes in the code covered by the stub, I am against this, because of the maintenance burden.

I wrote such a tool; we use it for injecting some docstrings into the pandas stubs we bundle with pylance. At this point it does not update docstrings, it just uses introspection on the module to get the docstrings and then inserts them, but it could be extended. It's the successor to https://github.com/gramster/stubsplit which was our earlier approach.

hmc-cs-mdrissi commented 2 years ago

Using that tool could we have stub packages produced by stub uploader have docstrings added instead of having docstrings in typeshed source? If we can avoid having them in typeshed source that should lower maintenance involved (still non zero to change stub uploader) while keeping IDE benefit.

olejorgenb commented 6 months ago

@gramster It's this project: https://github.com/gramster/docs2stubs I assume?

KotlinIsland commented 5 months ago

Hi 👋 (reference to non-colaborators like @DetachHead not being allowed to comment here)