hershaw / data-science-101

Do some data analysis and build a predictive model
MIT License

Adding support for IRkernel #3

augustoamerico opened this issue 7 years ago (status: open)

augustoamerico commented 7 years ago

Hi,

I'm trying to add a provisioner to the Vagrantfile that adds IRkernel support to Jupyter.

In the provisioner I've added the line `echo "install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'), repos='http://cran.us.r-project.org', lib='/home/vagrant/R/lib')" | R --save`, and according to that command's output, everything is installed correctly.

The next line in the same `config.vm.provision` is `echo "devtools::install_github('IRkernel/IRkernel')" | R --save`, which uses the devtools package installed by the previous line, but I get an error saying that devtools is not installed. To debug this, I ran `vagrant ssh`, started an R session and, to my surprise, devtools was installed and `devtools::install_github('IRkernel/IRkernel')` went smoothly. The same line in the provisioner, however, always fails.

I feel like I'm missing something, but I can't grasp what.

Vagrantfile: https://github.com/augustoamerico/data-science-101/blob/master/Vagrantfile
Vagrant up log: https://github.com/augustoamerico/data-science-101/blob/master/vagrant_up_output.log
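One way to make the two R steps above independent of whether `~/.bashrc` gets sourced is sketched below. It assumes the same library directory as the original commands (`/home/vagrant/R/lib`), exports `R_LIBS_USER` inside the script itself, and prepends the directory with `.libPaths()` so the second R process can find devtools. The script name and function names are hypothetical, and `Rscript -e` stands in for `echo ... | R --save` (which also saves a workspace file). Keep in mind that Vagrant shell provisioners run as root unless `privileged: false` is set, so the R session there does not see the vagrant user's defaults.

```shell
#!/usr/bin/env bash
# Sketch only: install IRkernel's dependencies and IRkernel into one
# explicit library directory, without relying on ~/.bashrc.
set -euo pipefail

# Assumption: same target directory as in the original provisioner line.
R_LIB_DIR="${R_LIB_DIR:-/home/vagrant/R/lib}"

# Make every R process started by this script search R_LIB_DIR.
export R_LIBS_USER="$R_LIB_DIR"

# Build the R expression for the second step. Prepending R_LIB_DIR to
# .libPaths() lets R find devtools there and, by default, install
# IRkernel into the same directory.
irkernel_expr() {
  local lib_dir="$1"
  printf ".libPaths(c('%s', .libPaths())); devtools::install_github('IRkernel/IRkernel')" "$lib_dir"
}

install_r_deps() {
  # Non-interactive install.packages() will not create a missing lib= directory.
  mkdir -p "$R_LIB_DIR"
  Rscript -e "install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'), repos='http://cran.us.r-project.org', lib='$R_LIB_DIR')"
}

install_irkernel() {
  Rscript -e "$(irkernel_expr "$R_LIB_DIR")"
}

# Run both steps only when called as: bash install-irkernel.sh run
if [[ "${1:-}" == "run" ]]; then
  install_r_deps
  install_irkernel
fi
```

IRkernel's own instructions also include a final `IRkernel::installspec()` call to register the kernel with Jupyter; it is left out here.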

hershaw commented 7 years ago

It's possible that the place you're installing the packages to isn't being found in the installation-r-irkernel provisioner.

Try putting a call to `env` here to see whether the `R_LIBS_USER` and `R_LIBS_SITE` variables are actually set. Maybe something is going wrong with the `source /home/vagrant/.bashrc` command.
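A small diagnostic along the lines suggested above, assuming it is pasted into the `installation-r-irkernel` provisioner just before the R calls. It prints the shell's flags (no `i` means non-interactive), the exported `R_LIBS*` variables, and, if R is installed, the library paths R itself will search. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: show which R library settings a (non-interactive)
# provisioning shell actually has.

show_r_lib_env() {
  echo "shell: $0, flags: $-"
  # env lists only exported variables; grep exits 1 when nothing matches.
  env | grep -E '^R_LIBS(_USER|_SITE)?=' || echo "no R_LIBS* variables exported"
  # .libPaths() is what R will search when loading devtools.
  if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'print(.libPaths())'
  fi
}

show_r_lib_env
```

If the variables show up in an interactive `vagrant ssh` session but not in this output, the provisioner is not getting them from `.bashrc`.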

augustoamerico commented 7 years ago

Sorry for the late response; my master's thesis has been needing a bit more attention.

So, as you said, it does have something to do with the environment. Awesome tip, @hershaw! Thanks! Running `env` before and after the `source` showed no difference between the two calls. After that, I asked the Big Oracle (aka Google) for help and learned that provisioners run in non-interactive mode, and the `.bashrc` file had a few lines handling that case: when the file is sourced in non-interactive mode, nothing in it gets sourced!
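For reference, Ubuntu's default `~/.bashrc`, for example, starts with a `case $- in` guard that returns immediately in a non-interactive shell, so anything exported after it never reaches a provisioning script. The sketch below reproduces that with throwaway files, assuming the box's `.bashrc` uses a similar guard; the file contents and the `R_LIBS_USER` value are illustrative. It also shows the usual workaround of exporting the variable before the guard.

```shell
#!/usr/bin/env bash
# Sketch: why sourcing ~/.bashrc from a non-interactive provisioner
# can silently do nothing, using throwaway rc files.
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "$tmp_dir"' EXIT

# Variant 1: the export comes after the interactive-only guard.
cat > "$tmp_dir/rc_guard_first" <<'EOF'
case $- in
    *i*) ;;
      *) return;;
esac
export R_LIBS_USER=/home/vagrant/R/lib
EOF

# Variant 2: the export is moved above the guard.
cat > "$tmp_dir/rc_export_first" <<'EOF'
export R_LIBS_USER=/home/vagrant/R/lib
case $- in
    *i*) ;;
      *) return;;
esac
EOF

# Source an rc file in a fresh non-interactive bash and print the
# variable it leaves behind (empty if the guard returned first).
lib_after_sourcing() {
  env -u R_LIBS_USER bash -c 'source "$1"; echo "${R_LIBS_USER:-}"' _ "$1"
}

echo "guard first:  '$(lib_after_sourcing "$tmp_dir/rc_guard_first")'"
echo "export first: '$(lib_after_sourcing "$tmp_dir/rc_export_first")'"
```

The first line prints an empty value and the second prints `/home/vagrant/R/lib`, because `bash -c` is non-interactive and `return` ends the sourced file early.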

But after changing this, it still didn't work.

After this, I started commenting out parts of the provisioning, and right now I'm stuck on something that makes no sense to me so far: if I comment out the content of the first provisioner (installation) except for the first 3 lines, the other 2 provisioners work just fine, and R is installed along with the devtools package.

What am I missing?

Edit: added info about what I have changed

augustoamerico commented 7 years ago

Well, after a good night's sleep, and having run out of ideas, I thought to myself: "I can't even have more ideas... damn... I need more RAM in my head..." That thought made me realise that the VM might be going through the same problem: no more RAM available.

Adding more RAM to the VM in the Vagrantfile made this work :)
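The fix here was giving the VM more memory in the Vagrantfile (with the VirtualBox provider, that is the `vb.memory` setting inside a `config.vm.provider "virtualbox"` block). To make a low-memory box fail loudly instead of surfacing later as a confusing error like the devtools one above, a provisioner could check the guest's memory first. A minimal sketch, assuming a Linux guest with `/proc/meminfo`; the function names and the 2048 MB threshold used below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refuse to start memory-hungry package builds on a small VM.
set -euo pipefail

# Total memory in MB, read from a meminfo-style file (Linux guests).
mem_total_mb() {
  local meminfo="${1:-/proc/meminfo}"
  # MemTotal is reported in kB, e.g. "MemTotal:  2048000 kB".
  awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo"
}

# Return 0 if total memory is at least $1 MB, otherwise warn and return 1.
require_mem_mb() {
  local min_mb="$1" meminfo="${2:-/proc/meminfo}" total
  total="$(mem_total_mb "$meminfo")"
  if (( total < min_mb )); then
    echo "Only ${total} MB of RAM; at least ${min_mb} MB assumed for building R packages." >&2
    return 1
  fi
}
```

In the provisioner it would be called as `require_mem_mb 2048 || exit 1` before the R package installs; the threshold should be tuned to what the builds actually need.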

hershaw commented 7 years ago

Nice work! I'll look at this a bit more tomorrow morning. Let's get this cleaned up (removing prints and whatnot) and then decide what to do with the repo.

Can you please put some instructions for running it in the README? I'd like to test it out. BTW, the R code is really clean stuff. The syntax is minimal but still readable.

Any suggestions about how we should use the code you wrote?

augustoamerico commented 7 years ago

Let me see if I can prettify the R code and the Vagrantfile tomorrow (I think I'll have time for that, but I'm not sure).

One major thing I still have to do is provide a Jupyter notebook file with the R code.

Regarding how to use the code, I was just thinking that some people are more comfortable with R than Python, so a set of examples in R seemed like a good idea. If you guys want to support R code in the following editions of the meetup, I would be more than glad to do so.

hershaw commented 7 years ago

okay, that might be cool. It would be a lot of work too, though, because you'd need to maintain the notebooks and READMEs. Maybe the directory structure could be:

course/
  R/
    class1/
    class2/
  python/
    class1/
    class2/

What do you think?

augustoamerico commented 7 years ago

Regarding the amount of work, I agree. And honestly I don't know if our "audience sample" would benefit from it. How can we test it?

That's a clean and tidy tree, yup, I like that :)

hershaw commented 7 years ago

Good point about the "audience sample". Also, the way the repo is right now is super straightforward and adding another language into the mix would compromise that a bit.

However, I'm seeing a lot of benefit to having the same thing implemented in both R and python. People that are interested in one have a decent chance of being interested in the other. For example, although we only use python at work, lots of our clients use only R so I've been curious about it. Also, the purpose of this course is not to be a python tutorial because as it says in the meetup description, the focus is on what we are doing much more than how we are doing it.

So there could be a few ways to go with this:

Support multiple languages in the same repo

pros

cons

Support multiple languages but in different repos

pros

cons

hmmm, I'm really not sure which direction is the right one to go with this. Let's see if we can get anyone else from the group to chime in.

augustoamerico commented 7 years ago

About the complexity of the Vagrantfile: since Vagrant conveniently supports separate provisioners, we could have something like 3 scripts to bring up the VM:

  1. a script that runs `vagrant up` with just the Python provisioner
  2. a script that runs `vagrant up` with just the R provisioner
  3. a script that runs `vagrant up` with all the provisioners

Although this adds complexity to the Vagrantfile, the VM won't end up with unnecessary installs.
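Vagrant can already run a subset of named provisioners with `vagrant up --provision-with NAME[,NAME...]`, so the three scripts could collapse into one small wrapper. A sketch, assuming each provisioner is given a name in the Vagrantfile (for example `config.vm.provision "installation-r-irkernel", type: "shell", ...`); `installation-r-irkernel` is the name mentioned earlier in this thread, while `common` and `python` are hypothetical placeholders for whatever the Vagrantfile actually uses.

```shell
#!/usr/bin/env bash
# Sketch of a single wrapper replacing the three proposed scripts:
#   bash up.sh python | r | all
set -euo pipefail

# Map a target to a comma-separated list of provisioner names.
# Only "installation-r-irkernel" comes from this thread; "common" and
# "python" are hypothetical placeholders.
provisioners_for() {
  case "$1" in
    python) echo "common,python" ;;
    r)      echo "common,installation-r-irkernel" ;;
    all)    echo "common,python,installation-r-irkernel" ;;
    *)      return 1 ;;
  esac
}

main() {
  local target="${1:-}" names
  if ! names="$(provisioners_for "$target")"; then
    echo "usage: $0 python|r|all" >&2
    return 2
  fi
  # --provision-with restricts provisioning to the listed provisioners.
  vagrant up --provision-with "$names"
}

# Only act when a target is given, so the file can also be sourced.
if (( $# > 0 )); then
  main "$@"
fi
```

For a VM that already exists, `vagrant provision --provision-with ...` reruns just the chosen provisioners without recreating the machine.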

Regarding the parity maintenance, I must agree with you.

I think that, for now, just to test whether R support would be useful, the "Support multiple languages in the same repo" approach would be a good trade-off.

Let's wait for some more input about this.