microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License
1.29k stars 290 forks source link

Broken diagnostics after editing / reordering cells in polyglot notebooks #7308

Closed SGP57 closed 3 years ago

SGP57 commented 3 years ago

Environment data

Expected behaviour

Correct analysis!

Actual behaviour

Many false positives.

I am seeing many error reports, mostly this one but others too. It's reporting this error in comments at the start of the file as well.

Python language is selected.

I have for a new file, selected python and pasted in my 39 lines of program and errors are not reported. I put an test-error into the new file and that was reported successfully, and removed when I removed my test-error .

So, it's not the actual syntax etc. I'm relatively new at this. What assumptions are you making that I wouldn't have predicted? I guess there's all sorts of things about my setup that I could discover and add but this seems a bit fundamental so I'd appreciate suggestions / guidance.

python --version Python 3.8.10

lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.2 LTS Release: 20.04 Codename: focal

Logs

Python Language Server Log

XXX 'Please paste the output from your clipboard'

Code Snippet / Additional information

XXX
erictraut commented 3 years ago

We can't diagnose a problem without seeing your code. Please include a minimal, self-contained code sample that exhibits the problem.

One common cause of the "statements must be separated by newlines" is the use of Python 2 code. Pylance supports only Python 3.x. If you are using Python 2 code, you should choose a different language server like Jedi.

SGP57 commented 3 years ago

Re python version: it's 3.8.10

Re giving sample code that exhibits the problem: that was the whole point of the fault report. The code in the original file did show the false errors but copying and pasting the same code into a new file/tab in vscode did not show the errors. Consequently, showing the false errors was unlikely to happen in code that I submitted. The python program did run successfully when pylance was reporting all the errors.

Within my original file, I have now deleted everything (Ctrl+A, Ctrl+X) and then reinserted (Ctrl+V) and the errors have gone. I'm guessing therefore your diagnosis will be 'not a fault'. There clearly was an issue here - I'm guessing that pylance got a bit confused and thought that you may be interested in that.

I do see that a screenshot would probably have been useful.

The 'Statements must be separated by newlines or semicolons' messages gave locations: sometimes several locations (word boundaries) on the same line in # comment lines. Also, sometimes the location was a few columns beyond the end of the line and there wasn't any white space present there, especially in import statements. There was also a message about an invalid character of (I think it was) '\u21' but nothing unexpected was at that location.

bschnurr commented 3 years ago

Just to re-iterate,

We will need a screenshot or code sample to try to reproduce what you're seeing.

SGP57 commented 3 years ago

Apologies. Not possible.

As I fixed the problem by deleting and pasting the same code, it probably wasn't just that code. I've tried <Ctrl+Z> many times to undo to see if I could recreate but the errors are not showing up, so unfortunately a screenshot isn't possible either now.

I get that it was going to be hard to reproduce. I had thought it might be useful for you to get this report in case you (plural) had seen or do see anything similar.

I understand you will probably close this.

SGP57 commented 3 years ago

Maybe the real point was that the errors that I saw didn't make sense. Several "must be separated" errors about the same comment line and errors that gave locations that didn't exist - just a bit beyond the end of the line.

erictraut commented 3 years ago

I can think of two possible causes:

  1. VS Code opened the file with the wrong encoding (e.g. UTF16 vs UTF8)
  2. There were bad, unprintable characters within the file — perhaps it was corrupted on disk

In either case, I don't think this is a bug in pylance.

SGP57 commented 3 years ago

I see your thinking. I'd agree with your conclusion.

This was a new "program". I had written some python but had also copied and pasted from web examples and python doc and edited. I'd suggest that the file on disk was ok because the python program ran successfully but the pasting from the web might open up possibilities about character encodings. An interesting restriction within vscode if that is the case! However, the cut and paste presumably sorted that out.

Thanks.

geophpherie commented 3 years ago

Not sure that it's too helpful, but sorta related along the lines of "errors that don't make sense". I do at times run into phantom lints, or where it appears pylance thinks text exists in a cell that doesn't. The only way to clear it out is to paste the code into a new cell. It's hard to consistently reproduce, it just sometimes gets into this weird state and i end up reloading the notebook to clear it.

Here are a view images. In this case it is linting comments.

image

image

Here it's linting blank space (px is imported and the cell runs fine).

image

Here it's alerting to something not related to the line of code.

image

jakebailey commented 3 years ago

Have you been able to consistently reproduce it, or is it seemingly random? If you can reproduce it, I'd be pretty in the LSP trace ("python.trace.server": "verbose") to see what the edits are.

geophpherie commented 3 years ago

Once it gets in that state the cells remain that way and as i edit them different phantom lints appear, but i haven't yet been able to figure out what gets it there.

jakebailey commented 3 years ago

Yeah, it's unfortunately the trace up to that point that's the most interesting. I'd think this would be in the vein of a bug in the Jupyter extension, except that the original issue was not in a notebook.

geophpherie commented 3 years ago

Gotcha. Can you clarify this?

LSP trace ("python.trace.server": "verbose")

Is that not something to put in settings.json? I'm happy to pay more attention to it if it happens again.

jakebailey commented 3 years ago

Yes, it would be; it's undocumented so VS Code will squiggle it, but it's working. You'll see a load more output in the Python Language Server output panel.

This may significantly slow down your editor, as it logs big JSON objects for every single thing between the editor and Pylance, but it's worth a try if you can get it to the bad state.

geophpherie commented 3 years ago

ah okay gotcha. Yeah was confused since it said it was invalid. Okay I'll keep in on there and see if I'm able to catch anything! Thanks!

geophpherie commented 3 years ago

Hmm, in theory I've triggered it again.

Is this helpful at all? It seems as though it's "lost" a line in a cell. Hovering over the first line import n1805 triggers the range saying line 0 and thinking it's on line 1 where import numpy as np is?

image

geophpherie commented 3 years ago

Able to see the same error that OP mentioned too.

image

jakebailey commented 3 years ago

It's the full log I'm interested in, as it would contain every file change sent from the client to us so we can see if it's consistent or if our incremental text handling is somehow off (in Pylance or in Jupyter).

Sti2nd commented 3 years ago

Here are more examples of Pylance linting wrong. This is in a .ipynb file in VS Code.

The code: image The lint errors: image image

geophpherie commented 3 years ago

@jakebailey The attached log is from startup to when i noticed an issue present. It's pretty contrived, but it seems to be along the lines of what i see at times in my standard workflow.

My first thought in regards to the conditions needed to generate it is maybe something related to the plotly figures containing so much data that the language server falls behind? Not sure if others that are experiencing it use plotly, but it seems somewhat reproducible for me to have a bunch of large plotly figures showing inline then start editing code to have an error, fix that error, but see the pylance lint as if the error is still there.

Let me know if there's more I can do!

image image

log2.txt

jakebailey commented 3 years ago

In your case, do these messages go away eventually? I recall that plotly is one of those libraries with an incredible number of modules with lots of complicated code, which makes it hard to analyze, so using plotly may be the problem itself and things lag behind as we try to keep up.

We don't know anything about the figures themselves, only the document that the Jupyter extension produces that looks like the notebook concatenated into a single file.

geophpherie commented 3 years ago

They don't seem to go away, but they do seem to migrate. I can delete a cell that has them but the phantom error messages will just move on to the cell that bumps up from the deletion.

I don't know too much about plotly internals, but i do know their figures are essentially JSON objects. So i was thinking that if the representation of the notebook that is sent to the language server contains cell output, it'd have a massive additional JSON object to look through. If you're not looking at cell output though i guess that's irrelevant.

But, for example, if i save an ipynb file and look at it in a text editor, all those data points are in there as a JSON object.

jakebailey commented 3 years ago

We don't work with the cell outputs or the ipynb; we get simulated Python files that contain only Python code. The figures themselves are a red herring, and this would have to be some sort of desync we need to look at the log to try and figure out.

erictraut commented 3 years ago

Thanks for the trace logs. Those were very helpful. I spent some time walking through each message to reconstruct the document one edit at a time.

The "textDocument/publishDiagnostics" input messages look correct to me. They're in the right order, and when I apply the deltas, I reconstruct the text that you show in your screen shot above.

The "textDocument/publishDiagnostics" output messages also look correct to me. They include the diagnostics I would expect to see while you're coding — various syntax errors due to incomplete code. The text ranges for these diagnostics also correspond to the line and column numbers that I would expect from the reconstructed contents.

Subsequent "textDocument/publishDiagnostics" output messages should replace the previously-reported list of diagnostics within the document. That doesn't seem to be happening. For example, the "textDocument/publishDiagnostics" message sent at timestamp 2:52:12 contains the message Expected ":". The subsequent "textDocument/publishDiagnostics" message at timestamp 2:52:13 no longer contains a diagnostic with this message because the intervening edits have eliminated that syntax error condition. The client is supposed to replace older diagnostic lists with an updated list when it receives a newer "textDocument/publishDiagnostics" message.

Based on the evidence, it looks like the problem is in the code that provides support for jupyter notebooks — the code that combines the cells in the notebook and writes the combined Python code to a file for pylance to analyze. I'm guessing that this code is not properly handling the "textDocument/publishDiagnostics" messages that it receives. It should process them in the order they're received, and newer diagnostic lists should replace older lists.

@jakebaily, one thing that I noticed is that pylance is not including a version in the "textDocument/publishDiagnostics" message. The LSP protocol indicates that such a version is not required, but I wonder if the absence of this version is triggering a bug in the jupyter notebook client code.

jakebailey commented 3 years ago

It's very possible that the version would help; it's been something we've wanted to plumb correctly as an improvement.

Given the way that these work, it's very possible that it could be as simple as there being an async step somewhere between the translation of the faked documents back to the cells, such that every now and then, they get scheduled out of order. Versioning would fix that, so long as the translation makes use of them properly (and that, I don't know).

imRajAryan09 commented 3 years ago

Pylance is checking and false flagging syntax errors for the text written in markdown and comments. I am using VS Code for writing a Jupyter Notebook

jakebailey commented 3 years ago

@jbeyer16 Does your notebook happen to have non-Python cells in it? E.g. markdown cells?

geophpherie commented 3 years ago

@jakebailey Yep it did!

jakebailey commented 3 years ago

Aha, then I think that what's happening to everyone here is that when a non-Python cell is edited, the code that handles converting the Python parts of the notebook get confused. It could be that it's sending an invalid change, or is expecting a version number to increase (even though from our POV there were no edits), or is sending the wrong edits due to cell confusion, etc. Good to have a lead.

geophpherie commented 3 years ago

oh that's an interesting insight! I think most of my notebooks typically have Markdown, which explains why i see it a good bit. Glad you've got a place to start now!

roboserg commented 3 years ago

Started a new, empty notebook. Didnt use any markdown cells or jupyter magic commands, still Pylance is confused: Code_EGSGR7YJkG

jakebailey commented 3 years ago

@roboserg Your issue appears unrelated; the diagnostics are not misplaced or incorrect. This typically means that the kernel you've selected for your notebook is not the same as the interpreter you've selected at the bottom left for the Python extension.

roboserg commented 3 years ago

@roboserg Your issue appears unrelated; the diagnostics are not misplaced or incorrect. This typically means that the kernel you've selected for your notebook is not the same as the interpreter you've selected at the bottom left for the Python extension.

Yes, those were my thoughts. I triple checked to have the correct conda env (I even manually re-selected it). Nothing helped till I restarted VS Code. I'd assume re-selecting the proper python conda env would load it, but it did'nt.

jakebailey commented 3 years ago

Ah, well, if you had just installed fastapi, then that is likely microsoft/pylance-release#923, as we don't detect package installations at the moment and require a restart (or some other config change to tell us to reload things).

jakebailey commented 3 years ago

The code for the concat doc appears to live here: https://github.com/microsoft/vscode/blob/main/src/vs/workbench/api/common/extHostNotebookConcatDocument.ts

I have a change that will add the version numbers to the diagnostics (we should do it regardless of the notebook problem), but I have yet to try and reproduce the bug.

roboserg commented 3 years ago

Ah, well, if you had just installed fastapi, then that is likely microsoft/pylance-release#923, as we don't detect package installations at the moment and require a restart (or some other config change to tell us to reload things).

The fastai package was already installed in the conda env (weeks ago). I was able to reproduce the error just now -> start new empty ipynb notebook, select desired conda env, import the package -> PyLance error.

In the video below, the correct conda env was already selected for me (fastchan), but for the purposes of the video I select it again so that you know I didnt forget to select the correct one.

Image - https://i.imgur.com/bkBvczd.png

https://user-images.githubusercontent.com/4758917/130984938-266fcfbc-71d6-4cc7-bb61-a19ceb796ab3.mp4

jakebailey commented 3 years ago

That's the kernel selector provided by the Jupyter extension, but the interpreter that Pylance uses is the one set by the Python extension. I think at the moment this menu gets (confusingly) hidden when you're in a notebook. Try opening up or creating a new Python file, then check the interpreter selected at the bottom left.

jakebailey commented 3 years ago

I was able to reproduce the bug making a notebook with markdown and Python together, then swapping the cells around.

image

Adding versions to the diagnostics unfortunately doesn't fix the problem. I can see in the LSP trace that each swap sends an empty change event that bumps the version without actually changing any of the code, even though it should send a change that reorders things.

At this point, I'm pretty confident that there's nothing we can do on our end to fix this, and this is a bug in either VS Code's concat document or the code that ties it to the language server.

jakebailey commented 3 years ago

The inclusion of non-python code appears to be critical; have all python cells, and the text changes come in as expected when swapping.

[Trace - 11:54:02 AM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///e%3A/work/demo-apps/ds-1/_NotebookConcat_b0021a48b0e74c0cb97107a8c1ddadde.py",
        "version": 2
    },
    "contentChanges": [
        {
            "range": {
                "start": {
                    "line": 0,
                    "character": 0
                },
                "end": {
                    "line": 5,
                    "character": 0
                }
            },
            "rangeLength": 27,
            "text": "foobar\nimport xyz\r\ndef\r\n\r\n"
        }
    ]
}

[Trace - 11:54:58 AM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///e%3A/work/demo-apps/ds-1/_NotebookConcat_b0021a48b0e74c0cb97107a8c1ddadde.py",
        "version": 3
    },
    "contentChanges": [
        {
            "range": {
                "start": {
                    "line": 0,
                    "character": 0
                },
                "end": {
                    "line": 5,
                    "character": 0
                }
            },
            "rangeLength": 27,
            "text": "import xyz\r\ndef\r\n\r\n\nfoobar"
        }
    ]
}
jakebailey commented 3 years ago

When the non-python cell is present, then we get that single didChange with no contents on the swap:

[Trace - 11:58:18 AM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///e%3A/work/demo-apps/ds-1/_NotebookConcat_b0021a48b0e74c0cb97107a8c1ddadde.py",
        "version": 16
    },
    "contentChanges": []
}

Then, any changes after that come in duplicated, where the first is empty (should not be sent), and the second one looks right (but won't work because we didn't get the change for the swap):

[Trace - 11:58:44 AM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///e%3A/work/demo-apps/ds-1/_NotebookConcat_b0021a48b0e74c0cb97107a8c1ddadde.py",
        "version": 18
    },
    "contentChanges": []
}

[Trace - 11:58:44 AM] Sending notification 'textDocument/didChange'.
Params: {
    "textDocument": {
        "uri": "file:///e%3A/work/demo-apps/ds-1/_NotebookConcat_b0021a48b0e74c0cb97107a8c1ddadde.py",
        "version": 18
    },
    "contentChanges": [
        {
            "range": {
                "start": {
                    "line": 4,
                    "character": 6
                },
                "end": {
                    "line": 4,
                    "character": 6
                }
            },
            "rangeLength": 0,
            "text": "a"
        }
    ]
}
jakebailey commented 3 years ago

Is this fixed or was it closed mistakenly? I haven't seen a code change that would have fixed this in the concat document or elsewhere...

rchiodo commented 3 years ago

Hopefully this is a dupe of https://github.com/microsoft/vscode-jupyter/issues/6608. We closed awaiting verification.