microsoft / vscode

Visual Studio Code
https://code.visualstudio.com
MIT License

Jupyter Notebook metadata out of sync as of 1.93 #232528

Open xl0 opened 1 month ago

xl0 commented 1 month ago

Does this issue occur when all extensions are disabled?: Yes/No

I'm working on a tool that lets you interact with LLMs from Jupyter notebooks and lets the LLM create and run cells in your notebook. To give the LLM context, I need, among other things, to find the cell that is currently being executed in the notebook: https://github.com/ConGustoAI/friendlly

Before 1.93.1, the execution_count in the saved .ipynb was updated as soon as the file was saved. So I could enable autosave with a small (200ms) delay and find the cell whose execution count matches the current get_ipython().execution_count.
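To make the matching step concrete, here is a minimal sketch of the lookup, assuming the notebook has already been loaded from disk as a dict in the standard .ipynb JSON layout. The function name find_cell_by_execution_count and the sample notebook are hypothetical and only for illustration; in the real tool, current_count would come from get_ipython().execution_count.

```python
from typing import Optional


def find_cell_by_execution_count(notebook: dict, current_count: int) -> Optional[int]:
    """Return the index of the code cell whose saved execution_count
    equals current_count, or None if no cell matches.

    Hypothetical helper: it only inspects the parsed .ipynb JSON.
    """
    for idx, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells carry no execution_count, so skip them explicitly.
        if cell.get("cell_type") != "code":
            continue
        if cell.get("execution_count") == current_count:
            return idx
    return None


# Hypothetical notebook contents, shaped like a saved .ipynb file.
sample_notebook = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 1, "source": ["x = 1"]},
        {"cell_type": "code", "execution_count": 3, "source": ["x + 1"]},
        {"cell_type": "code", "execution_count": None, "source": ["never run"]},
    ]
}

running_idx = find_cell_by_execution_count(sample_notebook, 3)
```

If the saved file lags behind the kernel, as described for 1.93.1, this lookup returns None for the running cell, which is how the mismatch shows up in practice.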

In my code, I trigger a save by emitting an empty JS output, display(Javascript("")), and the file is saved while the cell is executing.

As of 1.93.1, the execution count (and likely other data/metadata) no longer seems to be updated in the saved file while the cell is executing.

Simplified code to find the exec_count of the cells in the file:

import os
import time
import json
from urllib.parse import urlparse
from IPython import get_ipython
from IPython.display import display, Javascript

def vscode_extract_path():
    """
    Extracts the filename from the parent_header of the current notebook.
    """
    cellid = get_ipython().parent_header.get("metadata", {}).get("cellId", '')
    url = urlparse(cellid)
    return url.path

def vscode_execution_count():
    path = vscode_extract_path()

    print("Current exec count: ", get_ipython().execution_count)
    timestamp = time.time()

    display(Javascript("")) # Empty js to kick off autosave.

    # Wait for the file to be saved, up to ~5 seconds.
    for i in range(50):
        last_modified = os.path.getmtime(path)
        if last_modified > timestamp:
            break
        time.sleep(0.1)
    else:
        print("Make sure autosave is set to afterDelay in vscode settings, and the delay is less than a second!")

    with open(path) as f:
        data = json.load(f)
        cells = data.get("cells", [])
        for idx, cell in enumerate(cells):
            print(f"Cell {idx} exec count:", cell.get("execution_count"))

vscode_execution_count()
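For reference, the path extraction can be checked on its own, outside a running kernel. This is a small sketch that assumes the cellId is a URI whose path component is the notebook file and whose fragment identifies the cell. The example URI is made up, and the unquote step is an addition not present in the code above; it is included on the assumption that paths containing spaces arrive percent-encoded.

```python
from urllib.parse import unquote, urlparse


def notebook_path_from_cell_id(cell_id: str) -> str:
    """Extract the notebook path from a cellId URI.

    Hypothetical helper: takes the URI's path component and decodes
    percent-escapes such as %20. An empty cellId yields an empty string.
    """
    return unquote(urlparse(cell_id).path)


# Made-up cellId for illustration; a real one comes from the kernel's parent_header.
example_cell_id = "vscode-notebook-cell:/home/user/my%20notebook.ipynb#W0sZmlsZQ%3D%3D"
example_path = notebook_path_from_cell_id(example_cell_id)
```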

The code:

In 1.92.2, the count in the file is consistent with the exec count from get_ipython().

In 1.93.1, the count in the file is inconsistent.

Note: I'm using the same versions of the Python and Jupyter extensions, which are compatible with both versions of VS Code (Python: v2024.8.1, Jupyter: v2024.6.0), so it's likely not an issue with the Jupyter extension.

vs-code-engineering[bot] commented 1 month ago

Thanks for creating this issue! It looks like you may be using an old version of VS Code, the latest stable release is 1.95.0. Please try upgrading to the latest version and checking whether this issue remains.

Happy Coding!

xl0 commented 1 month ago

To clarify, the behavior is the same in the latest version of vscode/extensions. I have highlighted the first release that has the issue.