openlawlibrary / pygls

A pythonic generic language server
https://pygls.readthedocs.io/en/latest/
Apache License 2.0
581 stars 103 forks source link

pygls only understands the file URI scheme. #209

Closed pyscripter closed 2 weeks ago

pyscripter commented 3 years ago

As per title. More specifically it does not handle properly the untitled scheme used by VS-Code. The URI for unsaved buffers in VS-Code is "untitled:buffername". But this is not handled properly by pygls.

See the debug output below for an example:

DEBUG:pygls.protocol:Received b'Content-Length: 151\r\nContent-Type: application/json; charset=utf-8\r\n\r\n{"jsonrpc": "2.0", "method": "textDocument/didOpen", "params": {"textDocument":{"uri":"untitled:module1","languageId":"python","version":0,"text":""}}}
DEBUG:pygls.protocol:Received b'Content-Length: 160\r\nContent-Type: application/json; charset=utf-8\r\n\r\n{"jsonrpc": "2.0", "id": 6, "method": "textDocument/definition", "params": {"textDocument":{"uri":"untitled:module1"},"position":{"line":15,"character":6}}}
DEBUG:pygls.protocol:Request message received.
INFO:pygls.protocol:Sending data: {"jsonrpc": "2.0", "id": 6, "result": [{"uri": "file:///C:/Delphi/Progs32/Python/Pyscripter/module1", "range": {"start": {"line": 11, "character": 4}, "end": {"line": 11, "character": 8}}}]}

As you can see the URI was changed by pygls from "untitled:module1" to "file:///C:/Delphi/Progs32/Python/Pyscripter/module1" The scheme was changed to file and current directory was prepended to the buffer name.

Finally other URI schemes like ssh or for example file://192.168.10.20/f$/MyDir/SubDir/text.doc would face similar problems.

tombh commented 1 year ago

Sorry for the late reply! Over a year 😬

So if this is still an issue, I assume the source of it is in: https://github.com/openlawlibrary/pygls/blob/master/pygls/uris.py

Particularly this method:

def to_fs_path(uri):
    """Returns the filesystem path of the given URI.

    Will handle UNC paths and normalize windows drive letters to lower-case.
    Also uses the platform specific path separator. Will *not* validate the
    path for invalid characters and semantics.
    Will *not* look at the scheme of this URI.
    """
    try:
        # scheme://netloc/path;parameters?query#fragment
        scheme, netloc, path, _params, _query, _fragment = urlparse(uri)

        if netloc and path and scheme == 'file':
            # unc path: file://shares/c$/far/boo
            value = f'//{netloc}{path}'

        elif RE_DRIVE_LETTER_PATH.match(path):
            # windows drive letter: file:///C:/far/boo
            value = path[1].lower() + path[2:]

        else:
            # Other path
            value = path

        if IS_WIN:
            value = value.replace('/', '\\')

        return value
    except TypeError:
        return None

I've written a failing test here, does that encapsulate the issue do you think?

pyscripter commented 1 year ago

Thanks for getting back to me. Yes the test looks good. Unknown schemes should be left untouched.

tombh commented 1 year ago

Ok, I'll look into fixing this soon then.

pyscripter commented 1 year ago

Thank you. Please also reopen my other issue #211.

tombh commented 1 year ago

Reopened it.

pyscripter commented 5 months ago

https://github.com/openlawlibrary/pygls/commit/19033c94beb8ca37511d771cac6c0d064205dce0 never made it to the main branch, and after more than a year there is no progress in resolving this.

I guess all it would take is:

def to_fs_path(uri):
    """Returns the filesystem path of the given URI.

    Will handle UNC paths and normalize windows drive letters to lower-case.
    Also uses the platform specific path separator. Will *not* validate the
    path for invalid characters and semantics.
    Will *not* look at the scheme of this URI.
    """
    try:
        # scheme://netloc/path;parameters?query#fragment
        scheme, netloc, path, _params, _query, _fragment = urlparse(uri)

        if scheme != "file":
           return(uri)

      ...
alcarney commented 5 months ago

after more than a year there is no progress in resolving this.

Apologies! No idea how I missed this issue! :facepalm:

I don't think to_fs_path is the whole story... while it does modify the untitled URI it doesn't change it as dramatically as the logs show in your original post.

>>> from pygls.uris import to_fs_path
>>> to_fs_path('untitled:module1')
'module1'

I guess all it would take is:

If anything I'd argue we should have to_fs_path return None for URIs other than file. Since it only makes sense to call to_fs_path for file URIs this function should clearly indicate to the calling code that it should not be treating the URI as a regular file.

Are you able to share the code corresponding to textDocument/definition implementation from your original post? I am unfortunately, unable to reproduce this issue using our example goto.py server

image

pyscripter commented 5 months ago

Thanks for getting back to me on this.

You are right this is not quite as simple. The issue comes from the jedi-language-server, when it returns a location inside a file (e.g. go to definition).

If you are familiar with jedi, it defines a module using:

def script(project: Optional[Project], document: Document) -> Script:
    """Simplifies getting jedi Script."""
    return Script(code=document.source, path=document.path, project=project)

The document path is converted by jedi to Path(document.path).absolute()

As you can see the document path is used and then in

def lsp_location(name: Name) -> Optional[Location]:
    """Get LSP location from Jedi definition."""
    module_path = name.module_path
    if module_path is None:
        return None

    lsp = lsp_range(name)
    if lsp is None: 
        return None

    return Location(uri=module_path.as_uri(), range=lsp)

and the as_uri() pathlib method is used giving rise to "file:///C:/Delphi/Progs32/Python/Pyscripter/module1" in the original example.

So it is mostly an issue with jedi and the jedi-language-server. I agree on pygls side to_fs_path should probably return None for unknown schemes, but I guess this might break some existing code.

You may close the issue or else reconsider how to_fs_path handles unknown schemes.