justinalsing / dlmmc

Dynamical linear modeling (DLM) regression code for analysis of atmospheric time-series data.
MIT License

OS versions: Conda and/or Pip #8

Closed (taqtiqa-mark closed 5 years ago)

taqtiqa-mark commented 5 years ago

Blocking openjournals/joss-reviews/issues/1157

I think @Chilipp's comment is reasonable:

As a user, I do not want to test whether it works with pip or not, I simply want to install it and it should work

There is an apparent inconsistency: the install instructions work with conda for some users (see #3) and only sometimes with pip for others (see #6). Also see here:

One point of contention though: I used pip to install dependencies (which works fine for me and other colleagues who use pystan extensively). However, @Chilipp did not get a working pystan set up when he did pip install, but he did with conda, so suggested that I change the install instructions to conda (which I duly did). But I don't use conda personally or have it set up

There are three operating systems that the application claims to work on:

  1. Windows (all versions?),
  2. MacOS (all versions?),
  3. Linux (all distributions and all versions?)
     a. Draft Ubuntu 16.04.6 instructions are in #6

Yet it is ambiguous which installation method should be used, conda or pip. Please clarify.

justinalsing commented 5 years ago

Since pip install pystan is not as stable as conda install pystan, as noted on the pystan readthedocs (and also pointed out by @Chilipp, as you note above), my install instructions currently explicitly advise using conda.

taqtiqa-mark commented 5 years ago

That is great, Conda is fine. I tried to follow your instructions in this comment but ran into issues. One command in the comment implies that Jupyter needs to be installed, yet your README.md instructions do not mention it. Apologies if I missed it, but I'm doing this in my spare time and the process so far has not been frictionless. Specifically:

jupyter-nbconvert --to notebook --execute --ExecutePreprocessor.timeout=100000 dlm_tutorial.ipynb >dlm_tutorial.log 2>&1

Can you provide conda instructions corresponding to the pip instructions I provided in issue #6, which you mention in your comment?

Please put the commands showing how to run the example notebook in the readme, and please link to a gist (or some such) that shows a successful run if the notebook does not test for correctness and print a message that all is correct.

This will show that your instructions work on one platform. It doesn't have to be Ubuntu or Linux; I'd be happy to see proper instructions for any one platform.

justinalsing commented 5 years ago

OK - I've tried to make the installation instructions in the README even more explicit (see updated README). If users use the Anaconda python distribution (as is now explicitly recommended in the install instructions), then most of the dependencies, including ipython and jupyter, come pre-baked by default. This is preferred both because of the number of dependencies that come automatically and, as noted, because conda plays better with pystan.

I've included a note indicating that installation using pip3 is possible but done at the user's own risk (some users might have reasons for not wanting to use Anaconda, but then they are on their own if they run into issues with pip and pystan, although I provide a link to the pystan readthedocs for advice if they need it).

I have also included instructions for executing the test suite (using jupyter-nbconvert etc) as recommended.

I'm copy-pasting the new install instructions from the updated README here for convenience:


Installation

Once you have downloaded the code from this repository you're ready to install dependencies and get set up.

The code is python3 and has the following dependencies: numpy, scipy, matplotlib, jupyter, ipython, netCDF4, pystan.
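A quick way to see which of the dependencies listed above are missing from the active environment is sketched below. This is not part of dlmmc; the helper name and the import names are assumptions (for example, the ipython package imports as IPython, and pystan is assumed to be the PyStan 2 import name).

```python
# Sketch: report which of the README's dependencies cannot be located.
# Not part of dlmmc; import names below are assumptions about how each
# listed package is imported (e.g. "ipython" imports as "IPython").
from importlib.util import find_spec

DLMMC_IMPORT_NAMES = [
    "numpy", "scipy", "matplotlib", "jupyter", "IPython", "netCDF4", "pystan",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(DLMMC_IMPORT_NAMES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

The check only locates the modules; it does not verify that pystan can actually compile a model, which is what compile_stan_models.py exercises.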

Installation with conda (recommended)

The most painless way to get set up is using the Anaconda python distribution (recommended), which comes with most of the dependencies as default. The remaining dependencies can then be installed using conda install and the DLM models compiled by running:

conda install pystan netCDF4
python3 compile_stan_models.py

This second line compiles all of the DLM models on your machine, saves them in models/, and then you're ready to start DLMing! Jump straight into the jupyter notebook tutorial dlm_tutorial.ipynb (see below), or if you prefer you can run a test suite to check that the install worked and all models run smoothly by executing (this will take some minutes to run through):

jupyter-nbconvert --to notebook --execute --ExecutePreprocessor.timeout=100000 dlm_validation_tests.ipynb

Finally, if you want to see what a successful installation looks like, see INSTALL.md.
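For anyone who wants to drive the test-suite command above from a script and keep a log, here is a minimal Python sketch. The function names and the log path are hypothetical; the command-line flags are exactly the ones quoted above.

```python
# Sketch: run the nbconvert command from the README and keep a log file.
# Function names and log_path are hypothetical, not part of dlmmc.
import subprocess


def nbconvert_command(notebook, timeout=100000):
    """Build the jupyter-nbconvert argument list quoted in the README."""
    return [
        "jupyter-nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook,
    ]


def run_notebook(notebook, log_path, timeout=100000):
    """Execute a notebook, send stdout and stderr to log_path, return the exit code."""
    with open(log_path, "w") as log:
        result = subprocess.run(
            nbconvert_command(notebook, timeout),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return result.returncode
```

Calling run_notebook("dlm_validation_tests.ipynb", "dlm_validation_tests.log") would then return the exit code of jupyter-nbconvert. As far as I know, nbconvert exits with a non-zero code when a cell raises (unless errors are explicitly allowed), so a zero return is a reasonable, though not complete, sign that the run went through.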

Installation with pip (at your own risk)

Anaconda is not a requirement for installing dlmmc, but is recommended because it works robustly with pystan. If you would rather use a different python distribution and pip3 for installing dependencies, you are welcome to (at your own risk); see the pystan readthedocs for advice on installing pystan using pip3 if you run into problems. Note that if you do not use Anaconda you will also have to install the other dependencies listed above, i.e.:

pip3 install numpy scipy ipython[all] jupyter matplotlib netCDF4 pystan
python3 compile_stan_models.py

Platforms

dlmmc has been successfully installed on Mac, Linux and Windows. Note that there are some limitations to the functionality of pystan on Windows, but these do not restrict the use of the dlmmc package for Windows users.


justinalsing commented 5 years ago

I've also gone through the steps of creating a virtual environment with conda and doing a clean install (on my Mac) following your issue #6 and included this in an additional file - INSTALL.md.

Here's a summary (see also INSTALL.md and my response to #6):

Create and activate virtual environment:

justinalsing$ conda create -n dlmmc python=3.7 anaconda
justinalsing$ conda activate dlmmc

Install dependencies following install instructions in the README:

justinalsing$ conda install netCDF4 pystan

Compile the DLM models following instructions in README:

justinalsing$ python3 compile_stan_models.py

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_1769d29906593e8f6fa11e816b642cff NOW.
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_323f0530039bc4ac2c22bb5250e1d6c1 NOW.
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_c3ff00cf2253f51bed2b150f31119693 NOW.
INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_b9cb9e0eb2389c8a6e3078345a6a1dd4 NOW.

Execute the dlm_tutorial.ipynb to check everything worked correctly:

justinalsing$ jupyter-nbconvert --to notebook --execute --ExecutePreprocessor.timeout=100000 dlm_tutorial.ipynb
[NbConvertApp] Converting notebook dlm_tutorial.ipynb to notebook
[NbConvertApp] Executing notebook with kernel: python3

Gradient evaluation took 0.026314 seconds
1000 transitions using 10 leapfrog steps per transition would take 263.14 seconds.
Adjust your expectations accordingly!

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: multiply: B[12] is nan, but must not be nan!  (in 'unknown file name' at line 159)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: multiply: B[12] is nan, but must not be nan!  (in 'unknown file name' at line 159)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:    1 / 3000 [  0%]  (Warmup)
Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: multiply: B[12] is nan, but must not be nan!  (in 'unknown file name' at line 159)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Informational Message: The current Metropolis proposal is about to be rejected because of the following issue:
Exception: multiply: B[12] is nan, but must not be nan!  (in 'unknown file name' at line 159)

If this warning occurs sporadically, such as for highly constrained variable types like covariance matrices, then the sampler is fine,
but if this warning occurs often then your model may be either severely ill-conditioned or misspecified.

Iteration:  300 / 3000 [ 10%]  (Warmup)
Iteration:  600 / 3000 [ 20%]  (Warmup)
Iteration:  900 / 3000 [ 30%]  (Warmup)
Iteration: 1001 / 3000 [ 33%]  (Sampling)
Iteration: 1300 / 3000 [ 43%]  (Sampling)
Iteration: 1600 / 3000 [ 53%]  (Sampling)
Iteration: 1900 / 3000 [ 63%]  (Sampling)
Iteration: 2200 / 3000 [ 73%]  (Sampling)
Iteration: 2500 / 3000 [ 83%]  (Sampling)
Iteration: 2800 / 3000 [ 93%]  (Sampling)
Iteration: 3000 / 3000 [100%]  (Sampling)

 Elapsed Time: 180.043 seconds (Warm-up)
               372.797 seconds (Sampling)
               552.839 seconds (Total)

[NbConvertApp] Writing 687553 bytes to dlm_tutorial.nbconvert.ipynb
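To go one step beyond reading the log, the executed notebook written above can be scanned for cells that ended in a Python exception. The sketch below is an illustration only, with hypothetical function names; it assumes the notebook is in the version 4 JSON format, where code cells carry an "outputs" list and failed cells contain an output with "output_type" set to "error".

```python
# Sketch: list error outputs in an executed notebook (nbformat v4 assumed).
# Function names are hypothetical; this is not part of dlmmc.
import json


def find_error_outputs(notebook):
    """Return (cell_index, ename, evalue) for each error output in a notebook dict."""
    errors = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have no outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                errors.append((index, output.get("ename"), output.get("evalue")))
    return errors


def load_notebook(path):
    """Read an .ipynb file (plain JSON) into a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

For example, find_error_outputs(load_notebook("dlm_tutorial.nbconvert.ipynb")) returning an empty list would mean no cell raised. The Stan "Informational Message" lines in the log are sampler messages rather than Python exceptions, so they should not show up as error outputs.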