paultopia / quantitative-methods-for-lawyers

Introduction to Quantitative and Computational Legal Reasoning

Programming Language and Platform #1

Open paultopia opened 6 years ago

paultopia commented 6 years ago

My current plan is to teach this course using Python, and to require students to install the Anaconda distribution (using the most recent version of Python 3 at the time), with the idea that most of the work will be done in Jupyter notebooks.

However, there are a few potential problems with this plan:

  1. The perennial problem with library glitches, version incompatibilities, etc. The wonderful Vicki Boykis just wrote a wonderful blog post about all this mess. This can be a particular nightmare with the data stack, and it could waste a ton of time on debugging and turn students off. One solution could be heavy use of virtual environments, but working with virtualenvs in Jupyter notebooks is a bit awkward.

  2. Windows. I don't use it, and I have no clue how to debug installations on it. I haven't touched a DOS command line for well over a decade. I suppose I could press a poor beleaguered university IT person into service, but that seems a little much.
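One lightweight guard against the version-incompatibility problem in item 1 is to have every student run the same check at the top of a notebook. A minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`). The function name `check_versions` and the pins in `REQUIRED` are hypothetical placeholders, not the course's actual requirements.

```python
from importlib import metadata

# Hypothetical pins for illustration only; a real course would list its own.
REQUIRED = {
    "numpy": "1.26.4",
    "pandas": "2.2.2",
}


def check_versions(required):
    """Return {package: (wanted, installed)} for every package that is
    missing (installed is None) or installed at a different version."""
    problems = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = (wanted, installed)
    return problems


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(REQUIRED).items():
        print(f"{name}: wanted {wanted}, found {installed}")
```

This does not fix a broken install, but it turns "my code doesn't work" into a concrete list of mismatched packages that is much easier to debug remotely.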

Here are some alternatives:

Google Colaboratory

Pros: no setup, installation, or anything nasty like that.

Cons: Google. Who even knows if this will exist in a year, given how often they kill projects. Essentially no tutorial material or UI affordances, kind of ugly, have to figure out how to get code and data in and out. Requires constant internet access.

PythonAnywhere

Pros: established company with a good reputation, not going anywhere, beginner-friendly.

Cons: have to pay for anything that supports Jupyter notebooks.

Roll a class server

Pros: complete control, no installation issues whatsoever, like PythonAnywhere but without charging the students.

Cons: way too much work.

The Harvard CS50 approach: create a class virtual machine

Pros: complete control over installation, can just distribute consistent version of every library up front.

Cons: a lot of work, students still might have installation problems with whatever virtualization option is chosen (Vagrant? Docker?), some students might not have machines with enough horsepower for virtual machines on top of everything else.

Just give up and use R

Pros: Better control over package version stuff, generally fewer glitches I think. Generally easier ecosystem for the kind of basic stats stuff students will be learning. RStudio is probably nicer than Jupyter Notebook on the whole.

Cons: R. I want students to see a general-purpose, mainstream programming language and get some idea of the things they can do with code as well as the stuff they can do with data. R is not even remotely suitable for general-purpose programming; as a language, it has way too many warts. (1-based indexing? The evil that is stringsAsFactors? Ugh.) Also, the library ecosystem issue.

What are your thoughts, world?

warrenagin commented 6 years ago

I had so many problems doing Anaconda installs on my computer (Windows) that I can't imagine working through it in a class environment. Most of my class was on Apple.

I'd go talk to your school's engineering or comp sci department to see how they deal with the issue of setting up environments. Maybe the B-school too, since they have to work with non-technical students. They might be of some help or have a stock solution you can use.

I used Excel and online tools (Wolfram, online calculators), in part because I was too new to Python and R to be comfortable teaching them, but also because I was concerned about the learning curve for students who have never coded before. I hear Smartsheets is a useful product too. You can pick up the skills you would need in about one or two hours: Sort & Filter, Pivot Table, Solver, and the Data Analysis plug-in, plus knowing that you can drag cell entries across ranges.

paultopia commented 6 years ago

Thanks! I'm really reluctant to focus on non-code-driven tools, because part of the purpose is to give students a sense of what it feels like to program and to see whether that kind of approach to practice is attractive to them. But the Anaconda-on-Windows thing, yeah, that's painful.

Maybe this is more reason to go with Colaboratory or something like that. We also have a nice informatics cluster here at UI... hmm... definitely should talk to them.

Here's another mad idea: JavaScript??? It wouldn't actually be all that difficult to set up a basic environment allowing students to just execute code in-browser. And that would eliminate almost all installation woes, at the price of dealing with the warts of JavaScript and browser APIs.

warrenagin commented 6 years ago

I wouldn't know. Never learned JavaScript.


paultopia commented 6 years ago

Just noting here, for further discussion, an interesting suggestion over email by Warren to teach Tableau, which apparently has R and Python integrations.

Another thought I had for the datavis side is that plotly is particularly user-friendly, and has Python and R bindings.

TrevorAustin commented 6 years ago

You should definitely run this in real Jupyter notebooks - it would be a disservice to your students to expose them to R ;)

I researched this while we were working on adding on-demand Jupyter notebooks at Civis. There are a bunch of paid versions of that service now, some of which may be free for educational uses. Colaboratory actually seems OK - you get real .ipynb files out, and it looks like you can even integrate it with git.

You may want JupyterHub though. It was made for just this purpose: hosting notebooks for users off a shared image so they don't go through dependency heck. See also https://developer.rackspace.com/blog/how-did-we-serve-more-than-20000-ipython-notebooks-for-nature/.

paultopia commented 6 years ago

Ooh! JupyterHub looks great! I think the current leading candidates are that and Colaboratory.

warrenagin commented 6 years ago

Just to mention that Tableau's R and Python integrations provide limited code functionality - also, they work off the locally installed R and Python environments, so you don't get around the local install issue.

When I tried to access Colaboratory, Google put me on a wait list for access - and my firm is a G Suite customer.

paultopia commented 6 years ago

Logging another suggestion---on Twitter someone suggested Microsoft notebooks https://notebooks.azure.com/faq --- it looks like they're free, and at least in my short-term case, the University of Iowa is a Microsoft shop, so that might mean the IT people here will be willing to support it...

warrenagin commented 6 years ago

Notebooks.azure.com is so easy even I could get it going in a few minutes. You can include them inside a GitHub-style library, and import back and forth. Some serious limits on data sizes, it seems - 100 MB for uploads, but that should be enough for teaching purposes.

Looks perfect to me.
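If the upload cap mentioned above holds, it may be worth screening course datasets before class. A rough sketch using only the standard library. The helper name `files_over_limit` is hypothetical, and whether the service counts 100 MB as 100 * 1024 * 1024 bytes or 100 * 10**6 bytes is an assumption to verify.

```python
from pathlib import Path

# Assumed interpretation of the "100 MB" upload cap; adjust if the
# service defines the limit differently.
DEFAULT_LIMIT_BYTES = 100 * 1024 * 1024


def files_over_limit(folder, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return a sorted list of (file name, size in bytes) for regular files
    directly inside `folder` that are larger than `limit_bytes`."""
    too_big = []
    for path in Path(folder).iterdir():
        if path.is_file():
            size = path.stat().st_size
            if size > limit_bytes:
                too_big.append((path.name, size))
    return sorted(too_big)
```

Calling `files_over_limit("data")` on a course data folder (the folder name here is just an example) returns the files that would need to be trimmed or sampled before upload.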

paultopia commented 5 years ago

This deck makes an interesting argument about notebooks being extra hard for beginners. Maybe worth introducing with a REPL and text editor first? Though then the installation/dependency hell problems come back... https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/mobilepresent?slide=id.g362da58057_0_1

warrenagin commented 5 years ago

It's a cute slide deck, I'll give you that. I love that you can build a tutorial within the code to guide beginners through it. The MOOCs I learned on used that technique, and Colarusso's machine learning notebooks are terrific.
