jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

issues with %%spark magic command when connected to an mpack (custom python env) #514

Open cg1008syf opened 5 years ago

cg1008syf commented 5 years ago

Hi,

I have an interesting scenario…

When not connected to an mpack, I am able to transfer a Spark DataFrame from Hadoop to the AE project namespace using the %%local command without any issues…
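For reference, the working pattern is roughly the following (a sketch; `my_table` is a hypothetical table name, not from my actual setup):

```
%%spark -o sdf
# runs on the remote cluster via Livy; -o pulls "sdf" back
# into the local notebook namespace as a pandas DataFrame
sdf = spark.sql("SELECT * FROM my_table LIMIT 100")
```

```
%%local
# sdf is now a local pandas DataFrame in the AE project namespace
sdf.head()
```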

However, when connected to an mpack, I am getting the below error…

Your help on troubleshooting this issue is appreciated.

Thank you,

cg1008syf commented 5 years ago

(screenshot: internal_error_mpack)

apetresc commented 5 years ago

Sorry, what's an mpack?

cg1008syf commented 5 years ago

hi,

they're just custom Python environments on the Hadoop cluster...

for more information --> https://www.anaconda.com/self-service-open-data-science-custom-anaconda-management-packs-hortonworks-hdp/

we usually connect to it by changing the Python env parameter when providing the configuration for the Spark session...
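In sparkmagic that configuration change is typically done with the `%%configure` magic before the session starts. A sketch of what we set (the env path below is hypothetical, not our actual mpack path):

```
%%configure -f
{
  "conf": {
    "spark.yarn.appMasterEnv.PYSPARK_PYTHON": "/opt/anaconda/envs/my_mpack_env/bin/python"
  }
}
```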

(screenshot: configs when not connected to an mpack)

cg1008syf commented 5 years ago

(screenshot: configs when connected to an mpack)

ericdill commented 5 years ago

This seems like the same issue as https://github.com/jupyter-incubator/sparkmagic/issues/471. It's possibly a pandas issue as per this issue https://github.com/jupyter-incubator/sparkmagic/issues/458

cg1008syf commented 5 years ago

Hi Eric,

We tested this and found that code running in a Python 2 env with a PySpark session works fine, but the same code fails under Python 3.

Are the magic commands (%%local, %%sql, …) specific to Python 2?

Thank you, Anand


ericdill commented 5 years ago

The magics are not inherently specific to py2 or py3. I suspect that what you're encountering is an encoding/decoding issue somewhere in the sparkmagic <-> Livy <-> Spark communication pathway. It looks like one side (Spark or Livy, maybe?) is encoding into byte strings (that's the b' at the start of the string in your original error report) and the other end (sparkmagic, probably) is expecting plain strings already.
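A minimal illustration of that kind of mismatch (this is not sparkmagic's actual code path, just the py2/py3 difference in miniature): one side ships the repr of a byte string, and on Python 3 a str-expecting consumer ends up with a literal `b'...'` wrapper unless it decodes explicitly.

```python
import ast

# What the local side might receive as text: the repr of a byte string.
# Under Python 2, str and bytes were the same type, so there was no b'' prefix
# and naive handling worked; under Python 3 the prefix must be dealt with.
payload = "b'col_a,col_b\\n1,2'"

value = ast.literal_eval(payload)  # parses the bytes literal -> bytes object
text = value.decode("utf-8")       # explicit decode restores a str

print(text)  # prints: col_a,col_b <newline> 1,2
```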

cg1008syf commented 5 years ago

Hi Eric,

Thank you for the clarification. There was a workaround suggested…

(inline email image of the suggested workaround; not rendered)

Your thoughts on this…?

Thank you