Closed thomasjm closed 3 years ago
Thanks for opening this issue!
Note: I've chosen to "resolve" the issue by explicitly declaring the MarkupKind to be "plaintext". Since Python docstring convention varies widely across projects, I've decided that attempting to convert all docstrings into Markdown would be futile. See the relevant part of the LSP spec here: https://microsoft.github.io/language-server-protocol/specification#markupContent
My decision regarding the task's futility comes from some light prototyping on my end and my reading of this issue (along with several others related to markdown-conversion complexity): https://github.com/microsoft/python-language-server/issues/613
Since Python docstrings are generally plaintext, I think this is a reasonable solution for now. If you come up with any ideas that can be implemented simply, safely, and without any bloated 3rd party dependencies, I'd definitely consider them!
After doing some manual experimentation with my language client, it looks like I can easily do the following: return a markdown result, rendering the function signature wrapped in a python language fence and the docstring itself wrapped in plain text.
I'll spend a bit of time making sure we stay compatible with language clients that may not prefer markdown.
I think this is the simplest compromise while still benefiting somewhat from markdown syntax highlighting capabilities
I did implement the above, but my implementation didn't provide much eye candy, but it did introduce some additional complexity to the codebase. Given Python's lack of standard docstrings / everything being an object and all, the implementation mentioned here https://github.com/pappasam/jedi-language-server/issues/22#issuecomment-629852656 and already released is probably the least error-prone / most correct we can achieve. Happy to review PR's, and please open another issue if you can think of a good path forward involving markdown.
Hi @pappasam , thanks for jumping on this so quickly. It's definitely a tricky one due to good RST to Markdown converters being hard to find.
A few thoughts:
numpy
and pandas
all seem to use it from what I've seen.jedi-language-server
to accept a parameter which is a path to a pandoc
binary, which can be used to shell out to do the conversion. It's a large and also GPL-ed dependency so I don't think you could bundle it directly, but at least users would be able to use it if they already have it.Here's a commit showing how I modified it to use pandoc
when available; seems to work fine:
https://github.com/thomasjm/jedi-language-server/commit/b8f760912f4958e2641a036f8f879ddb4970a343
@thomasjm thanks for the example! I'll meditate more on this. In general, to me, bad syntax highlighting is worse than none at all. I had previously disabled all syntax highlighting for hover windows in my client, but this issue helped me figure out how to special case just Python to plaintext (so thank you!).
That said, worst case, I'll make things configurable for you with an initializationOption so both of us will be happy. Best case, the pandoc solution or something like it will work!
Sounds good!
There's a little more progress on my branch which you can see here https://github.com/pappasam/jedi-language-server/compare/master...thomasjm:master
@thomasjm update: version 0.15.0 supports markdown by default if your client supports it, but is configurable to use plaintext. Basically brings back the pre-0.14 behavior that you preferred. Should be a decent band-aid until we figure out the right markdown transformation solution.
Thanks! FYI I've been playing with Pandoc a bit more and found a few issues involving inconsistencies with how Pandoc handles the RST in different packages. I'm trying to tweak it to behave well in all cases. Interesting things to compare include
import numpy
df = pandas.DataFrame()
and seeing how things like RST doctest blocks render, as well as more complex literal blocks.
Yes, I've noticed the same thing. Since it's quite challenging to identify whether a docstring is formatted in rst, markdown, or "other", and pandoc does not appear to have a "auto-language-detection" feature, using pandoc would require lots of special casing on a library-by-library basis. I think that would introduce lots of churn to this project. Whichever solution we end up on, it shouldn't involve knowing which 3rd party library we're providing a Hover response for.
This leaves the "direct translation of the Microsoft function" as our most viable candidate, right? IMHO, our goal here is to create one function that takes any string and converts aspects within it that are identified as RST that really muck up markdown formatting and returns a cleaned string ready for markdown rendering.
This function can live in its own module with tons of helper functions, but as long as the final interface is Callable[[str], str]
, I should be able to easily swap it into the language server.
This library could be promising: https://github.com/dadadel/pyment
This leaves the "direct translation of the Microsoft function" as our most viable candidate, right?
I don't know, I have a feeling this will never be 100% correct--a casual scan of the issues on the MS language server showed difficulties parsing more complex RST constructs. It doesn't make any effort to auto-detect the docstring's actual format, it just tries to parse something RST-like. I kind of think of it as a buggy/incomplete version of what Pandoc does, albeit maybe better tuned for docstring issues. And potentially a maintenance nightmare to keep tuning.
(I remain confused about the discrepancies between mainstream libraries. Even between Numpy and Pandas, when Pandas says that its own format is "based on the Numpy Docstring Standard"...)
On the other hand, https://github.com/dadadel/pyment looks really good! Especially the part where it says it auto-detects different styles.
@thomasjm I spent a good chunk of my day yesterday trying different tools / tricks to resolve this issue and have decided this is too challenging to solve at present.
Things I've tried:
Based on my efforts, I think the diversity of docstring in Python projects is too great to solve this in general. Going forward, I'll spend my energy resolving other issues, but I'm open to reviewing any PR's that you think might generally solve the problem!
@pappasam got it -- I was thinking of taking a stab at pyment
, can you tell me what you tried? Thanks!
For pyment, used this class: https://github.com/dadadel/pyment/blob/master/pyment/docstring.py#L1163
You'll need to pass the lines of code, using get_line_code
method on the name returned from jedi_script.help
, along with anything else you'd like to pass to the constructor. For example, you can experiment with the different output_style
options, passing docs_raw
(or not), etc. The new docstring can be generated / retrieved with the get_raw_docs
method.
Note: none of the above got me close to solving this issue in a general sense, so valid solutions (if they exist, which is dubious IMHO) probably will require a different approach.
Ok, I tried pyment and pypandoc too. They are not up to task. I hand-crafted https://github.com/krassowski/docstring-to-markdown for now. It's does the job just about right for >90% of pandas and numpy documentation entries as retrieved by jedi
. It is conservative in the way that if if cannot confirm that it is in reStructuredText that it knows it raises an error. I plan to add Google docstrings handling too. Just wanted to keep you in the loop. I originally written it in typescript but reverted to Python later today.
@krassowski amazing, I can't wait to play around with this!
Please do let me know if you encounter any issues. Here is a work-in-progress for some documentation formatting (it will be rewritten to use the signatures list rather than signatures string which is troublesome, yet I initially went for it as it is being cached by jedi for numpy, pandas and friends).
Some Python docstrings use
reStructuredText
, one notable example being pandas. Here's an example from that link:This language server feeds these docstrings back directly to the client without any transformation into Markdown as expected by the LSP spec, so they don't always render well when you do things like hovers. For example, the code block sections beginning with
>>>
don't look good.The Microsoft language server has special parsing code in it to handle this; see https://github.com/microsoft/python-language-server/blob/master/src/LanguageServer/Impl/Documentation/DocstringConverter.cs.
Note: I copied this with from a very similar issue I filed on https://github.com/palantir/python-language-server/issues/760