Could not retrieve variable df1 from the Jupyter extension. Please file an issue on the Data Wrangler GitHub repository.

IwonaZwierzynska commented 3 months ago

Environment data

VS Code version: 1.92.2
Data Wrangler Extension version (available under the Extensions sidebar): v1.7.2 (pre-release)
Jupyter Extension version (available under the Extensions sidebar): v2024.8.2024080201 (pre-release)
Python Extension version (available under the Extensions sidebar): v2024.12.3
OS (Windows | Mac | Linux distro) and version: Windows 10 Enterprise
Pandas version: -
Python and/or Anaconda version: Python 3.11 (64 bits)
Type of virtual environment used (N/A | venv | virtualenv | conda | ...): venv

Expected behaviour

The PySpark DataFrame is displayed as a table.

Actual behaviour

While debugging tests, the following error appears: Could not retrieve variable df1 from the Jupyter extension. Please file an issue on the Data Wrangler GitHub repository.

The same error occurs when I try to open the DataFrame from a Jupyter Notebook (from the variables section).

Steps to reproduce:

Select "View Value in Data Viewer" from the context menu on df1 (a PySpark DataFrame) while debugging tests with pytest.

Logs

Output for Jupyter in the Output panel (View→Output, change the drop-down in the upper-right of the Output panel to Jupyter)

``` XXX ```

Details

• How frequently is the issue happening for you? Is it all the time or only in some scenarios?

Always

• The type of data you are launching (e.g. large Pandas DataFrame with strings and numbers, etc.)

PySpark DataFrame with strings and numbers

• Are you running into the same issue even if it is something simple like the following?

Only PySpark DataFrames fail to open; lists, for example, work correctly.

• Any errors or abnormal messages in the developer console logs that seem related?

• Do you have multiple Python environments? If so, could you please check if the issue is occurring in other environments as well? You can change it by clicking the environment selector in the bottom right.

No.

• Lastly, are you using the interactive window debugger or the default debugpy? ie. what button or command do you press to start debugging?

It does not matter: whether I use the Interactive Window, a Jupyter Notebook, or debug directly in VS Code, I am not able to display DataFrames.

Thank you very much for your help in advance.

Best regards, Iwona

pwang347 commented 3 months ago

Hi @IwonaZwierzynska, thank you for providing the extra details here! Seems like this is actually a duplicate of https://github.com/microsoft/vscode-data-wrangler/issues/255.

For some more context, we don't currently support loading PySpark variables, but the Jupyter launch button shows it as something that can be launched because the type name also happens to be "DataFrame". We plan both to make the error message clearer and to investigate the feasibility of PySpark support here.
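To illustrate the name collision described above: a check on the bare class name cannot tell a pandas DataFrame from a PySpark one, while the defining module can. This is only a sketch. `describe_dataframe_kind` and `make_stand_in` are hypothetical helpers (not part of Data Wrangler or the Jupyter extension), and the stand-in classes merely imitate the module paths that pandas and PySpark classes typically report, so the snippet runs without either library installed.

```python
def describe_dataframe_kind(obj):
    """Return 'pandas', 'pyspark', or 'other' based on the defining module."""
    cls = type(obj)
    if cls.__name__ != "DataFrame":
        return "other"
    # The class name alone is ambiguous, so look at the top-level package.
    top_level_package = cls.__module__.split(".")[0]
    if top_level_package in ("pandas", "pyspark"):
        return top_level_package
    return "other"


def make_stand_in(module_name):
    """Build a stand-in class named DataFrame that claims to live in module_name."""
    return type("DataFrame", (), {"__module__": module_name})


# Stand-ins for illustration only; real objects would come from pandas / PySpark.
fake_pandas_df = make_stand_in("pandas.core.frame")()
fake_spark_df = make_stand_in("pyspark.sql.dataframe")()

# Both report the same class name, which is why a name-only check is ambiguous.
print(type(fake_pandas_df).__name__ == type(fake_spark_df).__name__)
print(describe_dataframe_kind(fake_pandas_df))
print(describe_dataframe_kind(fake_spark_df))
```

With real objects, the same helper could be called on `df1` before launching the viewer to see which library it comes from.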

For now, my recommendation is to convert the Spark DataFrame to Pandas using pdf1 = df1.toPandas() and view it that way. Hope this helps!
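A slightly more defensive version of that workaround, as a sketch: `toPandas()` collects every row to the driver, so for a large Spark DataFrame it may be safer to cap the row count with `limit()` first. `to_pandas_preview` and its `max_rows` default are hypothetical names chosen for illustration, not part of PySpark or Data Wrangler; the helper only assumes the object has `limit()` and `toPandas()` methods, as PySpark DataFrames do.

```python
def to_pandas_preview(spark_df, max_rows=1000):
    """Convert at most max_rows rows of a Spark DataFrame to pandas.

    limit() is applied before toPandas() so that only a bounded number of
    rows is collected. max_rows=1000 is an arbitrary illustrative default.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return spark_df.limit(max_rows).toPandas()


# Usage, assuming df1 is the PySpark DataFrame from the report:
#   pdf1 = to_pandas_preview(df1, max_rows=500)
# pdf1 is then a pandas DataFrame that can be opened in the viewer.
```

Note that the preview shows only the first `max_rows` rows, so it is not a substitute for viewing the full dataset.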

NiKoenig commented 3 months ago

Hi @pwang347,

I get the exact same error message as @IwonaZwierzynska. In my case, it does not seem to be caused by PySpark not being supported, since my dataframe already is a pandas dataframe. I get the error message in the Interactive Mode.

pwang347 commented 2 months ago

Hi @NiKoenig, thank you for letting me know.

Could you please try to reproduce the issue again with the developer console open and check to see if there are any related error messages?

You can open the developer console as follows:

Thanks!

NiKoenig commented 2 months ago

Hi @pwang347

I get 4 error messages in the developer console (the first two are very long, sorry): [screenshots not preserved]

Thank you!

IwonaZwierzynska commented 2 months ago

Thank you very much for your response :-)!

pwang347 commented 2 months ago

@NiKoenig seems like you are running into the same issue here: https://github.com/microsoft/vscode-data-wrangler/issues/270 (also see https://github.com/microsoft/vscode-jupyter/issues/15969)

It seems like the Jupyter kernel API is somehow not allowing us to access the kernel. Do you recall accepting/rejecting a popup window asking if Data Wrangler should be allowed access to the kernel?

NiKoenig commented 2 months ago

Hi @pwang347 this is exactly the problem I was having, thank you for pointing out this issue to me! :) I didn't get any popup window asking if Data Wrangler should be allowed access to the kernel. Is there a way to grant access now, e.g., in the settings? If not, I will just follow the discussion on the Jupyter side and hope that they find a solution there.

jjbochard commented 2 months ago

I just did this and it worked for me: https://github.com/microsoft/vscode-data-wrangler/issues/270#issuecomment-2324498045

theice123 commented 3 weeks ago

I have the same problem; however, none of the fixes I found worked. My setup is similar to the one described here, and enabling the API as suggested in #270 did not work.

here is the error log:

pwang347 commented 3 weeks ago

Hi @theice123, does this issue reproduce on a new Python file like the following?

li = [1,2,3]
print(li) # <- breakpoint here and launch `li` from debugger

If the above does not work, could you also check the following:

Does using a different Python env (if you have one) resolve the issue?
Does the same code in a Jupyter notebook (IPYNB) file work for you?

Thanks!
