matthewfeickert / HEPML-env

A minimal Python3 environment for HEP machine learning with pipenv
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Make the installer script more awesome #17

Open dguest opened 6 years ago

dguest commented 6 years ago

I just tried this out, I have a few comments:

While I was running this I realized that it also answers one of my earlier questions: since this puts python3 in your path by default you can do all sorts of terrible and unsafe things with your global python3 environment from then on. Yay!

matthewfeickert commented 6 years ago
  • The installer script is awesome and has super helpful prompts. That said, it should probably also have have command line options because eventually there will be a reason to script it.
  • It goes without saying that it should then have a (minimal) -h option.

Agreed. So far what I have is just an installer that does install something. To make it a nice installer is a TODO that needs to get some attention/priority.

  • You put (very helpful!) code in people's .bashrc. I mean that's cool, but to me it feels a bit violating to do that without asking first and explaining what you're doing.

Good point. I can have that be a prompt that shows the user what is going to be added and then ask them if they are okay with that.

  • This is especially true because the executing pipenv --completion takes about half a second. Unfortunately in HEP workflows quitting the terminal and reconnecting is something you have to do quite often, so things that slow down the initial setup are annoying. Honestly I usually avoid putting python calls in my .bashrc because of this. Maybe add an option to disable completion?

Valid point, and I'll adjust this. I will take a moment to rant that people need to stop with this damn editing their code directly on the remote nonsense and learn that SFTP has existed since 2001 and any reasonable editor has a plugin for it.

-If you run again, does it paste it in again? Maybe put some safety there (i.e. make it quit or something). I would stop short of removing things from .bashrc in an automatic way, though, because it seems pretty dangerous.

No. The very first thing that append_to_bashrc does is check if it has already put things in .bashrc before. However, I should check against the string "# added by HEPML environment installer" instead of what I currently have.

  • If you recommend the work area, that should be the default path. I also think it might be nice to put it in a subdirectory, i.e. ~/work/hepml/

So while I originally agreed with this, it would mean that then people would have path/work/hepml/Python-3.6.6 and that seems a bit strange as the Python3 install isn't really by itself relevant to the directory name. path/work/hepml/ seems like that would be a directory that a user would make themselves to start a project, kinda like what is outlines in the README in PR #12. Thoughts?

dantrim commented 6 years ago
  1. I just attempted running this on a CentOS release 6.10 machine and I believe it failed. It seemed to fail pretty early on in the execution, but the subsequent steps in the installer didn't seem to be aware of that fact and continued on -- the script making it all the way to completion. I think each of the steps are modular enough to be able to check the return of them, or something like that.

  2. Looking here it looks like CentOs7 is a requirement, but this is not stated and the script does not fail otherwise (unless you are on lxplus). It is not clear that these should/need to be requirements? Why are they?

  3. I also notice that you only give the user the ability to answer [Y/n] and if you say n then you end up in an infinite loop. I think it would be nice to add a quit/exit option: [Y/n/q] or something so that the user can decide to exit gracefully. I would have liked to exit at one point, for example, when it asks if things look OK: if you say no, it reverts to the previous step it appears.

  4. On the Centos6 machine and on my personal Macbook the fs command is unknown. You attempt to list the quota here but I am not sure if this is necessary. Does the user or your script need to worry about the quota on their machine (i.e. is this script likely to push them over a limit?).

image

  1. The --install-dir option does not seem to be propagated. If I apply it and then list the defaults, I see that it is there:  image

however when I begin the execution with this options, e.g. --install-dir ${PWD} it gives:

image

Note that it is also unable to expand the ${PWD} and gives the ERROR.

  1. I get a complete failure of pip to install the packages, which looks to be a permissions issue: image

Note that it makes it to the end of the script.

  1. At the end of this, python3 is also not symlinked properly to the newly installed python3.6 (which is there): image

Following along with what @dguest has mentioned regarding the bashrc and writing to it (and checking if it actually needs to). Is adding this all to the your ~/.bashrc really necessary? This implies a lot about the user's intention, imo. What if they do not wish to have this version of python3 or whatever every time they login? TBH I would opt for just having a setup script that, all that it does, is setup the paths/etc as you have in the bashrc writing. This script can then be added to the PATH for future use (i.e. everytime someone is ready to setup their ML environment).

matthewfeickert commented 6 years ago

@dantrim thanks very much for taking the time to try this out and to give such detailed feedback. :+1: You've done a good job of finding some edge cases. :wink: I've responded to most of the things below, but further feedback from you is welcome.

  1. I just attempted running this on a CentOS release 6.10 machine and I believe it failed. It seemed to fail pretty early on in the execution, but the subsequent steps in the installer didn't seem to be aware of that fact and continued on -- the script making it all the way to completion. I think each of the steps are modular enough to be able to check the return of them, or something like that.

Yeah, I am checking for errors with set -eu but I haven't implemented expected return value checks. So that should get added in.

  1. Looking here it looks like CentOs7 is a requirement, but this is not stated and the script does not fail otherwise (unless you are on lxplus). It is not clear that these should/need to be requirements? Why are they?

CentOS 7 is not a requirement, unless you're on CERN's LXPLUS cluster. As @dguest, you, and I discussed this is to make life easy with regards to issues with OpenSSL, but if there is some strong motivation to have users be on the SL6 nodes of LXPLUS then this can get brought up in another Issues and addressed in a different PR.

  1. I also notice that you only give the user the ability to answer [Y/n] and if you say n then you end up in an infinite loop. I think it would be nice to add a quit/exit option: [Y/n/q] or something so that the user can decide to exit gracefully. I would have liked to exit at one point, for example, when it asks if things look OK: if you say no, it reverts to the previous step it appears.

This would be a simple feature to implement, so consider it added. :+1: You aren't actually in an infinite loop though: The installer assumes that you actually wanted to install something, but as soon as you give incorrect input (such as an empty string for a path) then it will warn of invalid input and quit.

  1. On the Centos6 machine and on my personal Macbook the fs command is unknown. You attempt to list the quota here but I am not sure if this is necessary. Does the user or your script need to worry about the quota on their machine (i.e. is this script likely to push them over a limit?).

image

fs lq is calling the AFS command fs_listquota. This isn't POSIX, and so should be guarded against. I'll have to check for something similar that I can make in POSIX for a future PR. I think it is useful to give a heads up as if you're on a login node with limited space this could be an issue as the binaries that are associated with some of the packages that pipenv is going to eventually install will take up over 1GB in the cache.

  1. The --install-dir option does not seem to be propagated. If I apply it and then list the defaults, I see that it is there: > image

however when I begin the execution with this options, e.g. --install-dir ${PWD} it gives:

image

Note that it is also unable to expand the ${PWD} and gives the ERROR.

This is a legit bug. Thanks for finding it and reporting!

  1. I get a complete failure of pip to install the packages, which looks to be a permissions issue: image

Note that it makes it to the end of the script.

This is related to the OpenSSL issue we discussed. This can get checked for properly earlier and then also implement expected return values for the functions that perform the install steps.

  1. At the end of this, python3 is also not symlinked properly to the newly installed python3.6 (which is there): image

It is properly symlinked. It just isn't known to $PATH as you haven't prepended the Python install directory yet (which the stuff added to ~/.bashrc takes care of).

Following along with what @dguest has mentioned regarding the bashrc and writing to it (and checking if it actually needs to). Is adding this all to the your ~/.bashrc really necessary? This implies a lot about the user's intention, imo. What if they do not wish to have this version of python3 or whatever every time they login? TBH I would opt for just having a setup script that, all that it does, is setup the paths/etc as you have in the bashrc writing. This script can then be added to the PATH for future use (i.e. everytime someone is ready to setup their ML environment).

Fair. I can add a user prompt step that asks the user if they want it appended to ~/.bashrc or written to a setup_HEPML-env.sh script in the install directory. I'll also have to flag this choice in the install script so upon successful completion it prompts the user to source the correct file to get up and running.

matthewfeickert commented 6 years ago

@dantrim To help understand your uses cases better can you tell me more information on the UCI cluster you're using? Maybe uname -a at least? I hope that Issue #15 takes care of most of it though.

dantrim commented 6 years ago

@matthewfeickert

$uname -a
Linux uclhc-1.ps.uci.edu 2.6.32-696.23.1.el6.x86_64 #1 SMP Tue Mar 13 22:44:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$cat /etc/centos-release
CentOS release 6.10 (Final)
$gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.4.7 20120313 (Red Hat 4.4.7-23) (GCC)
matthewfeickert commented 6 years ago

@dantrim CentOS's Dockerhub only has releases for CentOS 6 up through 6.9

# cat /etc/centos-release
CentOS release 6.9 (Final)

and those don't have /cvmfs/ on them and it isn't clear to me which Docker image that has cvmfs in the title is a good one. So it might be a bit hard to reproduce your environment for testing if I can't access the cluster. Have any thoughts?

dantrim commented 6 years ago

eyo @matthewfeickert

I haven't experience with these ones exactly, but there appears to be both docker and singularity images on cvmfs that look to be for centos6. as they are on cvmfs I assume that they are representative of that OS' setup on a typical CERN machine/cluster:

docker: /cvmfs/atlas.cern.ch/repo/images/docker

singularity: /cvmfs/atlas.cern.ch/repo/images/singularity

these may be useful, not sure though.

matthewfeickert commented 6 years ago

@dguest @dantrim @alexarmstrongvi I'm not going to get the Cent OS 6 issue solved anytime soon, but can I get some extra UX feedback from you by having you do the following?

  1. On a local machine do

    docker pull hepsw/cvmfs-base-cc7
    docker run --privileged --rm --name UX-testing -it hepsw/cvmfs-base-cc7:latest /bin/bash
  2. On a local machine in a different terminal window

    git clone git@github.com:matthewfeickert/HEPML-env.git && cd HEPML-env
    git checkout origin/feature/add-CLI-flags-to-installer -b UX-test
    docker cp scripts/install.sh UX-testing:/root/
  3. In the Docker container interact with and run the installer

    bash install.sh -h
    bash install.sh -d
    bash install.sh
  4. Install the HEPML environment in the Docker container

Now that git is installed (normally this would be a true prerequisite)

git clone https://github.com/matthewfeickert/HEPML-env.git && cd HEPML-env
pipenv install --dev && pipenv run pytest
  1. Enter the environment
pipenv shell

Enter the Python REPL, import some of your favorite ML tools and play around as wanted.

  1. Let me know your feedback on the UX
dguest commented 6 years ago

Hey, some notes about trying this:

matthewfeickert commented 6 years ago

@dguest Thanks for the feedback.

I had issues connecting to /cvmfs/ from my mac

Can you paste your output when the hepsw/cvmfs-base-cc7 starts up? That is, what is the printed to screen when the following is run

docker run --privileged --rm -it hepsw/cvmfs-base-cc7 /bin/bash

My assumption is that FUSE is failing to mount properly.