merely-useful / py-rse

Research Software Engineering with Python course material
http://third-bit.com/py-rse/

What tools to use for practical exercises in the novice course? #9

Closed gvwilson closed 5 years ago

gvwilson commented 5 years ago

The advanced course should use authentic tools, which I believe means RStudio for R and Jupyter Notebooks for Python. I think the choice of tools for novices is less clear:

We should agree on tooling before work starts on lessons.

lwjohnst86 commented 5 years ago

My opinion (for the R side of things) is to teach RStudio to novices, because:

ChristinaLK commented 5 years ago

What are some of the alternatives to Jupyter?

joelostblom commented 5 years ago

I have mostly (only?) had positive experiences teaching data science related topics to novices using Jupyter Notebooks. To be fair, I have not tried teaching Python via other environments, but I have some experience teaching R via RStudio and my qualitative assessment comparing the two is that learners have not been struggling with either programming environment.

I think the addition of the Object Inspector in JupyterLab and the ability to have it open on the side and automatically update with function docstrings as you type is a big win! I use this myself and I always recommend it to people new to JupyterLab and coding in general. There are also documentation links in the help menu, a file browser, a command palette, the ability to view output side by side, and consoles/terminals that can be laid out to replicate an IDE layout if this is desired. Personally, I think the simplicity of the default layout without the many panels of an IDE is a strength when teaching, since it makes the interface less complicated and distracting. When I teach or do work using an IDE, I tend to maximize the active panel most of the time and only show other panels if they are specifically needed, which is similar to how I teach and work in JupyterLab.

Alternatives to Jupyter Notebooks in my mind are IDEs such as PyCharm and Spyder (Spyder has limited support for running notebooks, PyCharm has support in the commercial version only, and Atom could also be an alternative with Hydrogen and nteract). I learned Python myself using Spyder and it facilitated my transition from Matlab (6 months experience so not a complete novice at that point), since it has a similar IDE layout. When switching to Jupyter Notebooks, I remember that I initially missed the workspace/variable explorer, but that might just have been because I was used to having it in Matlab and Spyder, and I quickly transitioned to checking variable values programmatically. Nowadays, there is an actively developed variable explorer JupyterLab extension. An interactive debugger is also missing, but in my opinion this is more important for computer science than it is for exploratory data analysis (and again, there are extensions for this, though only for the old notebook so far).

Lastly, I want to say that I think there is a lot of value in staying in the same developer environment and avoiding the confusion and relearning that come with switching to a new one, since this can be discouraging. So unless there are critical specific concerns that we think will notably limit the novice learning experience, I am in favor of teaching Python in JupyterLab from the get-go. Having said that, I am interested to learn from the vast teaching experience in this group, so @gvwilson are there specific issues you have been running into when teaching with Jupyter Notebooks that would help me better understand its limitations? Similarly, @lwjohnst86, in addition to your points above (all of which I agree with), are there any specific concerns you have experienced that are not related to RStudio's tighter integration with R and instead related to teaching in the Jupyter Notebook format vs the RStudio IDE format (where the R markdown notebook is in one of many panels)? Do you also fullscreen the notebook panel when you teach and/or work in RStudio?

lwjohnst86 commented 5 years ago

@joelostblom I personally have never taught in Jupyter but I've been in several workshops where it was taught, so my experience with the Jupyter vs RStudio "more effective teaching tool" question is limited to that perspective. Also, most of my experience comes from teaching learners from a mainly biomedical background. From that perspective, there are several "computer sciency" things that Jupyter does that make sense from a "traditional programming" standpoint but don't make sense from a purely data analysis (mostly statistical) standpoint. It's been a while so maybe this has changed, but an example would be that plots don't immediately show in a Jupyter Notebook unless using commands like %matplotlib inline.

As for the "default layout" of just the notebook, I always show either just the RStudio notebook panel or the script panel plus the console panel. One minor thing I'm not super enthusiastic about when using notebooks for teaching is that the screen is only so big: with the output shown below the code, earlier code scrolls out of view, and sometimes a few learners are still typing the previous code when I start writing the next. But very minor.

Other than that, I think they are both great tools for teaching purposes.

joelostblom commented 5 years ago

Thanks @lwjohnst86! Sounds like we're using it similarly. I also share your annoyance with not being able to keep the last code cell and the complete output in view at a zoom level that makes it visible for everyone. (%matplotlib inline is no longer needed, although I tend to need to execute the first plotting cell twice before plots show up fine.)

mbonsma commented 5 years ago

I've had success teaching Python with Jupyter, but I believe that Jupyter is much harder to use if you're installing and trying it yourself, alone, without instructors and helpers in the room. If novices are working through the course on their own, Jupyter is probably too tricky to use for exercises. Is the target audience mostly learners in a class setting, or people learning on their own?

Could we do a more hub-like thing? I haven't personally used Binder, but would it be possible to do something like what Ines Montani has done with her spaCy course (pointed out to me by @joelostblom and mentioned in #3) where small code snippets can be run without installing something? Or maybe even simpler, just providing a Jupyter instance that learners don't have to install or run locally. Again, I haven't used JupyterHub myself.

Another option along the same lines is Codepen.

gvwilson commented 5 years ago

Closed in #7.

joelostblom commented 5 years ago

TL;DR I think it is important to teach a development environment to novices. We don't need to get hung up on making the perfect decision, but instead focus on providing an optimally helpful experience for new learners, which to me includes recommendations on which tools to use.

@gvwilson My understanding from our video meeting was that we would keep the Python material tool neutral for the time being and decide on a suitable developer environment later. Did I misunderstand this?

For novices, I am strongly in favor of teaching the specifics of a particular development environment. I don't think we need to choose the perfect tool that everyone agrees is the best for everything, but choosing one that is great and likely to be used by students in the future will be much more helpful than not recommending one at all. Especially for novices, I think it is essential for them to become comfortable in the specific development environment they are using and not just with the programming language itself.

Many conveniences are introduced by the graphical development environments and these would be lost if we focus on teaching tool-neutral Python. These conveniences are especially important for novices (e.g. viewing the function help automatically, navigating a file tree, or displaying plots in a panel/notebook). I also don't think there is any harm in pointing to a specific development environment, as long as the choice is a well recognized and useful tool that learners are likely to encounter later in their career (which is the case for both PyCharm and JupyterLab).

Regarding the specific points brought up at the meeting about continuity (which I understood as teaching from the interpreter to the notebook in the same developer environment), JupyterLab actually does meet this criterion. From the launcher page, one can choose to open a console, text file, or notebook. The scripting support is not as elaborate as in IDEs such as PyCharm (e.g. you need to write %run script_name.py instead of clicking a button), but for teaching the idea of using scripts for automation and the basic syntax, I think it is sufficient (e.g. it includes syntax highlighting, automatic indentation, and tab completion).
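For anyone who wants to show what running a script from an interactive session amounts to without relying on IPython, here is a minimal sketch using the standard library's runpy module. It is only an approximation of %run, not how IPython implements it; the script contents and the run_script helper are made up for illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical script contents, written to a temporary file for this demo.
SCRIPT_SOURCE = "values = [1, 2, 3]\ntotal = sum(values)\n"


def run_script(path):
    """Execute a script file and return the names it defined."""
    namespace = runpy.run_path(str(path), run_name="__main__")
    # Drop dunder entries such as __name__ and __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "script_name.py"
    script.write_text(SCRIPT_SOURCE)
    result = run_script(script)

# The script's variables are now available for inspection.
print(result["total"])
```

The point for learners is the same as with %run: the script runs top to bottom, and the variables it creates can be inspected afterwards.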

Some of the points above might be easier to assess as the material is being developed and we understand how we want the chapters to evolve and which personas to consider, but overall I think it is worthwhile reopening this issue or creating a new one for Python only.

gvwilson commented 5 years ago

Happy to see this one re-opened - do you have a proposal for a specific tool (or tools) to use for teaching Python?

joelostblom commented 5 years ago

Thanks Greg, I think JupyterLab is a great tool for teaching Python to novices, especially if we focus on literate programming and exploratory data analysis. If there is a significant amount of scripting material planned, then PyCharm could be more suitable.

I am more in favor of JupyterLab because my experience is that students learn well in this environment and it is what I see used the most for EDA (again, I have not taught with PyCharm, but I have used it myself and think the interface is more complex, which can be a downside for novices). Both are very popular and students will benefit from learning either, in my opinion. I tried to find some numbers to back up my popularity notion: this recent online poll supports it, but of course comes with all the caveats and biases of a voluntary online poll at a specific site.

ChristinaLK commented 5 years ago

I'm definitely environment agnostic, but I like @joelostblom's (implicit) point here that the IDE is one of the things we can/should teach and we should communicate to people (maybe all along, but especially towards the end of an intro course) that "this is what an IDE can provide for you on top of the language".

I also agree re: convenience. One way to get at this might be to decide which things we think are most helpful (for novices) and pick an IDE that has them. For example, I find that the RStudio feature where you can click on a dataframe in the environment pane and it opens up is SUPER useful, because just printing the table in a console output (the other option) is far less readable.

brandeism commented 5 years ago

I think we may need to consider who is our average "novice" to Python, e.g., which persona do/should we have in mind. Someone who has some coding/scripting experience in another language is a very different type of novice than someone who has not ever seen a line of code. We can also become clearer on what we want folks to learn. I agree with @joelostblom's literate programming sentiments. Let's focus on the data science and not make this too 'computer sciencey'.

Regarding Spyder vs PyCharm as options, I'm more familiar with Spyder, but I use it for dev tasks, not instruction. I could imagine, though, that for a non-STEM Python newcomer Spyder may be too overwhelming given its multiple panels. Has anyone taught/tried teaching using Spyder or PyCharm? Open to hearing the experience.

I like Jupyter Notebooks because it keeps the code in manageable nuggets (single panel) for learners, but debugging is simply messy as several have said. Assuming our initial content is focused on computational/algorithmic thinking, we may need to add special tips for debugging on Jupyter.

joelostblom commented 5 years ago

At our meeting this morning we decided to proceed with Spyder as the recommended IDE; meeting notes are in #53. I think Spyder's notion of code cells will facilitate a transition to notebooks at later stages. I also volunteer to passionately write the Jupyter Notebook appendix...

@elliewix @mbonsma @gvwilson In addition to the alternatives we discussed today, I just found out about the Jupyter support in VS Code, which sounds worthwhile to keep an eye on in the future. It is reminiscent of Atom's Hydrogen, but works directly with .ipynb files and could provide both a debugger and a seamless git workflow for notebooks (by converting in and out of .py automatically). Since VS Code ships with Anaconda, this could become an interesting default EDA IDE when it matures.

ChristinaLK commented 5 years ago

Thanks for the update @joelostblom!

joelostblom commented 5 years ago

Since I will use info from this discussion as basis for writing the supplementary notebook chapter, I will keep adding details here (at low frequency so there is no spamming of notifications). Maybe they will be useful for someone else's workflow as well!

Debugging

The IPython debugger can be used from within the Jupyter notebook with %debug. This works well, but with some annoyances: for example, the up arrow does not recall the last executed line, so you need to retype it.

The birdseye visual debugger is supposed to work with JupyterLab (general debugger discussion also in this link) and not just the classic notebook.

Profiling

Snakeviz works great for this as a cell magic (you might need to append -t to open it in a new tab, depending on the JupyterLab version).
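Outside the notebook, roughly the same information can be collected with the standard library's cProfile. The sketch below is not the Snakeviz magic itself; the slow_sum workload and the file name are hypothetical. It saves a stats file that could afterwards be opened from a terminal with the snakeviz command, assuming Snakeviz is installed.

```python
import cProfile
import io
import pstats
import tempfile
from pathlib import Path


def slow_sum(n):
    """Hypothetical workload to profile."""
    return sum(i * i for i in range(n))


# Collect profiling data around a single call.
profiler = cProfile.Profile()
profiler.enable()
answer = slow_sum(10_000)
profiler.disable()

# Save the stats to a file that a viewer such as Snakeviz can open.
prof_path = Path(tempfile.mkdtemp()) / "slow_sum.prof"
profiler.dump_stats(str(prof_path))

# Also produce a plain-text summary, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```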

Notebook version control

Put the following in the last cell of the notebook and diff on the resulting .py.

!jupyter-nbconvert notebook-name.ipynb --to python --PythonExporter.exclude_input_prompt=True

This workflow can be automated and extended by using the jupytext plugin, which allows opening text files as notebooks in JupyterLab and the classic notebook (by converting them to notebooks on the fly). This means that the .ipynb is no longer strictly needed to collaborate on a notebook; instead, this file can be treated as the output file and shared only when one needs to see outputs (similar to how .html is treated when working with .Rmd; of course, a .html could be shared instead with Python as well).

Notably, jupytext works with multiple text formats, including .Rmd and .md, so now you can write all your documents in markdown source format and still open them in your (or at least my =)) favourite editor! It also supports .py files with code cell markers (#%%) as the source. I have tried this out a bit; it works seamlessly and setup is straightforward.
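To make it concrete why a cell-marked .py file diffs better than the .ipynb, here is a standard-library-only sketch that flattens notebook JSON into a "# %%" script. It is an illustration only, not jupytext's or nbconvert's implementation: the function name and the example notebook are hypothetical, and the "# %% [markdown]" marker is my assumption about how the percent format labels markdown cells.

```python
import json


def notebook_to_percent_script(notebook_json):
    """Convert notebook JSON text into a '# %%'-delimited Python script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as one string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + source.rstrip("\n"))
        elif cell.get("cell_type") == "markdown":
            # Markdown text becomes comments so the file stays valid Python.
            commented = "\n".join(
                ("# " + line).rstrip() for line in source.splitlines()
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, for illustration only.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Title\n", "Some notes"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 3, "source": ["x = 1\n", "print(x)"]},
    ],
})

script = notebook_to_percent_script(example)
print(script)
```

Outputs, execution counts, and metadata are dropped, which is exactly the noise that makes raw .ipynb diffs hard to read.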

There is also the jupyterlab git extension and the jupyterlab nbdime extension (which might require a manual install). Together these enable easy viewing of diffs, staging, and committing from within the notebook.

How should we cover literate programming and publishing from Spyder? These concepts lend themselves better to the notebook. Publishing scholarly writing is still a bit lacking compared to using pandoc or rmarkdown; jupytext with rmarkdown and RStudio is a possible way forward here if we want to stay GUI driven.