You can use the following code from a notebook to install the packages on a cluster:
import subprocess
def run_cmd(args_list):
    """Run a system command and return (stdout, stderr); raise on a nonzero exit code."""
    print('Running system command: {0}'.format(" ".join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    (output, errors) = proc.communicate()
    if proc.returncode:
        raise RuntimeError('Error running command: %s. Return code: %d, Error: %s' % (
            ' '.join(args_list), proc.returncode, errors))
    return (output, errors)

output, errors = run_cmd(['pip3', 'install', 'plotly'])
print("install plotly python module")
print(output)
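If you want to confirm the package landed, you can reuse the same helper - a minimal sketch, assuming pip3 is on the PATH where the cell runs:

# Hypothetical follow-up: print the installed plotly package details
output, errors = run_cmd(['pip3', 'show', 'plotly'])
print(output)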
So this is the same issue as with matplotlib: these libraries only work on local data frames. The IPython display hooks in the local Python kernel don't fire when code is routed through the Livy job scheduler, which can only return specific kinds of data.
To work around this, you can copy the Spark data frame to a local data frame (in this case, 'df'):
%%spark -o df
df = # copy data frame here
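For instance, the export cell might look like this - a sketch only; the table name and query are hypothetical:

%%spark -o df
# Runs on the cluster; sparkmagic's -o flag copies the resulting Spark
# DataFrame 'df' back to the local kernel as a pandas DataFrame
df = spark.sql("SELECT total_bill, time, smoker FROM tips")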
Then run local graphing:
%%local
import plotly
import plotly.express as px
tips = px.data.tips()
fig = px.strip(tips, x = "total_bill", y = "time", orientation="h", color = "smoker")
fig.show()
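Note that the cell above plots plotly's bundled tips sample data. To plot the data you copied from the cluster instead, point plotly at df - a sketch, assuming df has the same columns:

%%local
import plotly.express as px
# df is the local pandas copy produced by the %%spark -o df cell above
fig = px.strip(df, x="total_bill", y="time", orientation="h", color="smoker")
fig.show()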
I'm going to close this as an upstream issue - there's little to nothing we can do from the code side to make this work, unfortunately. It's something we should consider documenting, though - @ronychatterjee and @yualan, could you look into this?
Steps to Reproduce:
import plotly
import plotly.express as px

tips = px.data.tips()
fig = px.strip(tips, x = "total_bill", y = "time", orientation="h", color = "smoker")
fig.show()