jupyter-incubator / sparkmagic

Jupyter magics and kernels for working with remote Spark clusters

Using %%spark magic inside functions and loops #425

Open edoardovivo opened 6 years ago

edoardovivo commented 6 years ago

Hello,

Sorry for the silly question, but I can't find a way to use the %%spark magic inside functions or loops. My use case is this: during my analysis I have computed a series of Spark dataframes, and I want to export all of them to the %%local environment with as few lines of code as possible, ideally in a single cell. I have a list with all the dataframe names, so I would like to loop over those names and export each dataframe.

I understand I would have to import the appropriate packages, but I honestly have no clue how to proceed. Could somebody help me out?

Thanks a lot

aggFTW commented 6 years ago

😞 This is not supported at the moment. %%spark and %%sql are cell magics, which means the entire cell must be dedicated to Spark code, and transferring dataframes to the local context has to go through one of those two magics. I understand how cumbersome this can be...

If you wanted to, you could implement a line magic for spark and sql and contribute it back, though 😄 How does that sound?

edoardovivo commented 6 years ago

Well, it would be an honor for me to contribute to such a cool project! I'll take a look at it during these holidays ;) Cheers!

edoardovivo commented 6 years ago

So, I created a "%collect" line magic that seems to do the job, and that we could possibly extend to address #418. How should I proceed? Could you give me permission to push a branch? I'm also going to need some help writing the tests. Cheers
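To illustrate why a line magic solves the looping problem where a cell magic cannot, here is a minimal sketch of what a `%collect`-style line magic could look like, built on IPython's `Magics` class API. This is not the code from the PR or from sparkmagic: `CollectMagics` and `fetch_remote_dataframe` are hypothetical names, and the actual transfer from the remote Spark session is stubbed out, since that part would have to go through sparkmagic's own session handling.

```python
from IPython.core.interactiveshell import InteractiveShell
from IPython.core.magic import Magics, line_magic, magics_class


def fetch_remote_dataframe(name):
    """Hypothetical stand-in for pulling a dataframe out of the remote
    Spark session. A real version would go through sparkmagic's session
    machinery; here it just returns a placeholder record."""
    return {"name": name, "rows": []}


@magics_class
class CollectMagics(Magics):
    """Sketch of a hypothetical %collect line magic."""

    @line_magic
    def collect(self, line):
        # A line magic only receives the text after "%collect", so it can
        # be invoked once per dataframe name instead of owning a whole cell.
        name = line.strip()
        if not name:
            raise ValueError("usage: %collect <dataframe_name>")
        # Bind the fetched object to the same name in the local namespace.
        self.shell.user_ns[name] = fetch_remote_dataframe(name)


# Outside a notebook we create a shell explicitly; inside one you would
# call get_ipython().register_magics(CollectMagics) instead.
shell = InteractiveShell.instance()
shell.register_magics(CollectMagics)

# Unlike %%spark, a line magic can be driven from an ordinary loop.
for df_name in ["df_sales", "df_users"]:
    shell.run_line_magic("collect", df_name)
```

In a notebook, the loop could equally be written as one `%collect <name>` line per dataframe, or with `get_ipython().run_line_magic("collect", name)` inside a function, which is exactly what the cell-magic form rules out.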

aggFTW commented 6 years ago

You can fork the project, create a branch there, and submit a pull request. I can point you to our test files from there.

Nice to see! 😄

edoardovivo commented 6 years ago

Got it, thanks, I will as soon as I can!