mcburton closed this issue 8 years ago.
So I would like to see more integration with pySpark and Jupyter Notebooks. Basically, I'd love to see a workflow that follows this tutorial on getting pySpark running with Jupyter, but with warcbase integrated into it.
Stay tuned! :)
On Fri, Nov 13, 2015 at 11:53 AM, mcburton notifications@github.com wrote:
So I would like to see more integration with pySpark and Jupyter Notebooks. Basically, I'd love to see a workflow that mirrors this tutorial on getting pySpark running with Jupyter https://www.dataquest.io/blog/installing-pyspark/, but also have the warcbase functions available too.
https://github.com/lintool/warcbase/issues/161#issuecomment-156486069
Just to give a sense of what we're doing, I created this wiki page for our workshop. Really appreciate the suggestions! // @lintool
We had some issues in the notebook with loadWarc vs. loadArc. We should write some sample scripts with the former.
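One way to avoid calling the wrong loader on a file is to check which format it actually is before picking loadWarc or loadArc. Below is a minimal stdlib-only sketch of that check, assuming the standard format headers: a WARC file starts with a version line such as `WARC/1.0`, and an ARC v1 file starts with a `filedesc://` block. The helper `sniff_archive_format` and the `LOADER_FOR_FORMAT` mapping are hypothetical; they only report which loader name from this thread applies and do not call warcbase.

```python
import gzip

# Leading bytes of the first record in each format.
WARC_MAGIC = b"WARC/"        # e.g. "WARC/1.0"
ARC_MAGIC = b"filedesc://"   # ARC v1 version block
GZIP_MAGIC = b"\x1f\x8b"     # gzip member header


def sniff_archive_format(path):
    """Return 'warc', 'arc', or None by inspecting the file's first bytes.

    Works for both plain and gzip-compressed archives.
    """
    with open(path, "rb") as raw:
        head = raw.read(2)
    # Decompress transparently if the file is gzipped (.warc.gz / .arc.gz).
    opener = gzip.open if head == GZIP_MAGIC else open
    with opener(path, "rb") as f:
        start = f.read(len(ARC_MAGIC))
    if start.startswith(WARC_MAGIC):
        return "warc"
    if start.startswith(ARC_MAGIC):
        return "arc"
    return None


# Hypothetical mapping from detected format to the loader names discussed here.
LOADER_FOR_FORMAT = {"warc": "loadWarc", "arc": "loadArc"}
```

Sniffing the content rather than trusting the file extension also catches misnamed files, which is a common source of this kind of confusion.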
Link extraction should be up and running in the notebook too.
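For a rough idea of what link extraction does on each archived page, here is a stdlib-only Python sketch. It is not warcbase's own link extractor; it assumes each page is available as a `(url, html)` pair and returns `(source, target)` pairs with relative links resolved against the page URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute href targets from <a> tags in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(src_url, html):
    """Return (source, target) pairs for every <a href> found in `html`."""
    parser = LinkExtractor(src_url)
    parser.feed(html)
    parser.close()
    return [(src_url, target) for target in parser.links]
```

In a pySpark notebook, assuming a hypothetical RDD `pages` of `(url, html)` pairs, this could be applied with `pages.flatMap(lambda r: extract_links(r[0], r[1]))` to get a link graph.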
Also discovered that you can share notebooks on GitHub if they're saved as IPython notebooks, e.g. here. The graphics weren't saved, though.
warcbase as a Jupyter kernel (in Python and/or Scala): http://jupyter.readthedocs.org/en/latest/subprojects.html#kernels
Re: the loadWarc vs. loadArc issues in the notebook: if it's a bug, please open an issue.
As requested by Ian, I'm opening an issue to discuss ideas from our ad hoc "spark warcshop" 😉