microsoft / vscode-jupyter

VS Code Jupyter extension
https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter
MIT License

Crashes with Period column or index #14048

Open davidgilbertson opened 1 year ago

davidgilbertson commented 1 year ago

Environment data

I'm using Pandas 2.0.3. This happens in a WSL setup as well as just running in Windows.

I didn't explicitly install Jupyter or IPython; I just let VS Code do whatever it had to do (it installed ipykernel to run the interactive window).

Expected behaviour

I should be able to view a DataFrame with the Period type in it (column or index).

Actual behaviour

The data viewer crashes with `Failed to fetch variable info from the Jupyter server` (see the logs below).

Steps to reproduce:

  1. Create a new project, create a venv environment, install pandas.
  2. Create a file `test.py` with the contents:

     ```python
     import pandas as pd

     df = pd.DataFrame(
         dict(
             A=[1, 2, 3],
             Date=pd.period_range("2000-01-01", periods=3),
         )
     )
     ```

  3. In VS Code, do `Run Current File in Interactive Window`. Confirm the DataFrame shows just fine in the interactive window:
     ![image](https://github.com/microsoft/vscode-jupyter/assets/4443482/424032f2-f5f9-462b-bbc7-913e717a5343)
  4. Click `Variables` in the interactive window, and try to view the `df` variable in the data viewer. Then it crashes.

I found this issue https://github.com/microsoft/vscode-jupyter/issues/10446 but it was closed as a Pandas issue.

## Logs

These logs are from testing with WSL; they're much the same on plain Windows.

<details>

<summary>Output for <code>Jupyter</code> in the <code>Output</code> panel (<code>View</code>→<code>Output</code>, change the drop-down the upper-right of the <code>Output</code> panel to <code>Jupyter</code>)
</summary>

<p>

```
Visual Studio Code (1.80.2, wsl, desktop)
Jupyter Extension Version: 2023.6.1101941928.
Python Extension Version: 2023.12.0.
Platform: linux (x64).
Workspace folder /mnt/c/Users/david/py/test3, Home = /home/davidg
15:19:20.274 [info] User belongs to experiment group 'FastKernelPicker'
15:19:20.274 [info] User belongs to experiment group 'NewRemoteUriStorage'
15:19:20.274 [info] User belongs to experiment group 'PasswordManager'
15:19:20.391 [info] Start refreshing Kernel Picker (1691039960391)
15:19:20.398 [info] Using Pylance
15:19:20.809 [info] Start refreshing Interpreter Kernel Picker
15:19:21.042 [info] Loading webview. View is notset
15:19:21.048 [info] Loading web view...
15:19:21.051 [info] Webview panel created.
15:19:21.553 [info] Web view react rendered
15:19:21.973 [info] Process Execution: /mnt/c/Users/david/py/test3/test3/bin/python -m pip list
15:19:23.191 [info] End refreshing Kernel Picker (1691039960391)
15:19:55.951 [info] Starting Kernel startUsingPythonInterpreter, .jvsc74a57bd02566ab5dbfd06e0d22b37f5de0f2fde6a8aff4465388f8f45199a2a85efd0a80./mnt/c/Users/david/py/test3/test3/python./mnt/c/Users/david/py/test3/test3/python.-m#ipykernel_launcher (Python Path: /mnt/c/Users/david/py/test3/test3/bin/python, Venv, test3, 3.11.2) for '/Interactive-2.interactive' (disableUI=false)
15:19:55.972 [info] Process Execution: /mnt/c/Users/david/py/test3/test3/bin/python -c "import ipykernel; print(ipykernel.__version__); print("5dc3a68c-e34e-4080-9c3e-2a532b2ccb4d"); print(ipykernel.__file__)"
15:19:56.048 [info] Process Execution: /mnt/c/Users/david/py/test3/test3/bin/python -m ipykernel_launcher --ip=127.0.0.1 --stdin=9003 --control=9001 --hb=9000 --Session.signature_scheme="hmac-sha256" --Session.key=b"49a4eeb6-753b-4dc0-a177-ab4fb519c177" --shell=9002 --transport="tcp" --iopub=9004 --f=~/.local/share/jupyter/runtime/kernel-v2-52220A4kl18ZNWbp.json
cwd: /mnt/c/Users/david/py/test3
15:19:56.989 [info] ipykernel version & path 6.25.0, /mnt/c/Users/david/py/test3/test3/lib/python3.11/site-packages/ipykernel/__init__.py for /mnt/c/Users/david/py/test3/test3/bin/python
15:20:02.687 [warn] StdErr from Kernel Process 0.03s - Debugger warning: It seems that frozen modules are being used, which may
0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
0.00s - to python to disable frozen modules.
0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
15:20:04.541 [warn] StdErr from Kernel Process /mnt/c/Users/david/py/test3/test3/lib/python3.11/site-packages/traitlets/traitlets.py:2548: FutureWarning: Supporting extra quotes around strings is deprecated in traitlets 5.0. You can use 'hmac-sha256' instead of '"hmac-sha256"' if you require traitlets >=5.
  warn(
15:20:04.542 [warn] StdErr from Kernel Process /mnt/c/Users/david/py/test3/test3/lib/python3.11/site-packages/traitlets/traitlets.py:2499: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use '49a4eeb6-753b-4dc0-a177-ab4fb519c177' instead of 'b"49a4eeb6-753b-4dc0-a177-ab4fb519c177"'.
  warn(
15:20:04.694 [info] Started Kernel test3 (Python 3.11.2) (pid: 5377)
15:20:04.694 [info] Started new session 693ccb94-5139-4cfc-bccd-64dcdc7a610a
15:20:04.728 [info] Process Execution: /mnt/c/Users/david/py/test3/test3/bin/python ~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/pythonFiles/printJupyterDataDir.py
15:20:04.731 [info] Generated code for 1 = with 8 lines
15:20:04.747 [info] Kernel acknowledged execution of cell 4 @ 1691040004743
15:20:08.826 [info] End cell 4 execution @ 1691040008824, started @ 1691040004743, elapsed time = 4.081s
15:20:12.265 [info] Loading webview. View is notset
15:20:12.266 [info] Loading web view...
15:20:12.266 [info] Webview panel created.
15:20:12.275 [info] Process Execution: /mnt/c/Users/david/py/test3/test3/bin/python -c "import pandas;print(pandas.__version__)"
15:20:12.422 [info] Web view react rendered
15:20:16.149 [error] [Error: Failed to fetch variable info from the Jupyter server.
    at Qv.extractJupyterResultText (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:141926)
    at Qv.deserializeJupyterResult (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:142011)
    at Qv.getDataFrameRows (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:140124)
    at async bE.getRows (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:585900)
    at async ~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:574006
    at async iE.wrapRequest (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:574283)]
15:20:16.150 [warn] DataScience Error [Error: Failed to fetch variable info from the Jupyter server.
    at Qv.extractJupyterResultText (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:141926)
    at Qv.deserializeJupyterResult (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:142011)
    at Qv.getDataFrameRows (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:140124)
    at async bE.getRows (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:585900)
    at async ~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:574006
    at async iE.wrapRequest (~/.vscode-server/extensions/ms-toolsai.jupyter-2023.6.1101941928-linux-x64/out/extension.node.js:24:574283)]
```

</p>
</details>

DonJayamanne commented 1 year ago

Thanks, I can replicate this issue. Basically it comes down to `df.to_json()` failing in Python. Note (for internal use): Data Wrangler works here.
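To check whether a given pandas install is affected without going through the data viewer, a small script can call `to_json()` directly on the frame from the reproduction steps. This is a sketch under stated assumptions: `try_to_json` is a hypothetical helper, and it assumes that `DataFrame.to_json()` with default arguments is a fair stand-in for whatever call the extension actually makes. It also tries the same frame after converting the Period column with `.dt.to_timestamp()`.

```python
import pandas as pd

# The frame from the reproduction steps: an int column and a daily Period column.
df = pd.DataFrame(
    dict(
        A=[1, 2, 3],
        Date=pd.period_range("2000-01-01", periods=3),
    )
)


def try_to_json(frame):
    """Hypothetical helper: call frame.to_json() and return (succeeded, text)."""
    try:
        return True, frame.to_json()
    except Exception as exc:  # the exact exception type may differ between pandas versions
        return False, f"{type(exc).__name__}: {exc}"


ok, result = try_to_json(df)
print("Period column:", "OK" if ok else result)

# Converting the Period column to Timestamps first avoids serializing Period objects.
converted = df.assign(Date=df["Date"].dt.to_timestamp())
ok_converted, _ = try_to_json(converted)
print("Timestamp column:", "OK" if ok_converted else "failed")
```

If the first line reports an error while the second reports `OK`, that install shows the serialization problem described above.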

davidgilbertson commented 1 year ago

Cool, thanks for that. FYI, I've added a comment in the Pandas repo; not sure if it's on their radar.

One fix might be to convert all periods to datetimes, but you'd need to handle the case of a MultiIndex where only one of the levels is a Period (my use case, and a common-ish format in hierarchical time series analysis).

With very little testing, this seems to work...

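```python
import pandas as pd

df = pd.DataFrame(
    dict(
        A=[1, 2, 12],
        B=[1, 2, 12],
        C=[1, 2, 12],
        # "Y" is the year-end alias; "A" is deprecated in newer pandas
        Date1=pd.period_range("2000-01-01", periods=3, freq="Y"),
        Date2=pd.period_range("2000-01-01", periods=3),
        Date3=pd.date_range("2000-01-01", periods=3),
    )
).set_index(["A", "Date1"])


def periods_to_timestamps(df):
    """Return a copy of df with Period columns and index levels converted to Timestamps."""
    new_df = df.copy()

    # Convert Period-typed columns (isinstance avoids the deprecated is_period_dtype).
    for col_name, col in df.items():
        if isinstance(col.dtype, pd.PeriodDtype):
            new_df[col_name] = col.dt.to_timestamp()

    if isinstance(new_df.index, pd.MultiIndex):
        for level, values in enumerate(new_df.index.levels):
            if isinstance(values.dtype, pd.PeriodDtype):
                # set_levels expects the unique level values, not the per-row
                # values from get_level_values(), or labels can be remapped wrongly.
                new_df.index = new_df.index.set_levels(values.to_timestamp(), level=level)
    elif isinstance(new_df.index.dtype, pd.PeriodDtype):
        # A plain PeriodIndex has no set_levels(); convert it directly.
        new_df.index = new_df.index.to_timestamp()

    return new_df


df2 = periods_to_timestamps(df)
```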
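If timestamps are not wanted (an annual period such as `2000` becomes `2000-01-01 00:00:00`, which hides the frequency), a variant could cast Period values to strings instead. This is a swapped-in alternative to `periods_to_timestamps`, sketched under assumptions: `periods_to_strings` is a hypothetical name, and serializing via `reset_index().to_json(orient="records")` is just one reasonable way to check the result, not the call the data viewer makes.

```python
import json

import pandas as pd


def periods_to_strings(df):
    """Hypothetical variant: cast Period columns and index levels to str."""
    new_df = df.copy()

    # Period columns become plain string labels such as "2000-01-01".
    for col_name, col in df.items():
        if isinstance(col.dtype, pd.PeriodDtype):
            new_df[col_name] = col.astype(str)

    if isinstance(new_df.index, pd.MultiIndex):
        for level, values in enumerate(new_df.index.levels):
            if isinstance(values.dtype, pd.PeriodDtype):
                # astype(str) keeps the level order, so the existing codes still line up.
                new_df.index = new_df.index.set_levels(values.astype(str), level=level)
    elif isinstance(new_df.index.dtype, pd.PeriodDtype):
        new_df.index = new_df.index.astype(str)

    return new_df


df = pd.DataFrame(
    dict(
        A=[1, 2, 12],
        Date1=pd.period_range("2000-01-01", periods=3, freq="Y"),
        Date2=pd.period_range("2000-01-01", periods=3),
    )
).set_index(["A", "Date1"])

df_str = periods_to_strings(df)
# reset_index() turns the index levels into ordinary columns before serializing.
records = json.loads(df_str.reset_index().to_json(orient="records"))
print(records[0])
```

The trade-off is that string labels are no longer sortable or filterable as dates, whereas timestamps keep date semantics but drop the period frequency.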