JOSS Review (@zbeekman): Tutorial part 4 (viz) is broken

zbeekman commented 5 years ago

The script recolor.py which is part of Tutorial Part 4 is broken in a number of ways:

There is not proper exception handling when searching for ALTI*0_t0.log files. When the script fails to find them it silently bails with zero status exit code (success).
The path to search for the ALTI*0_t0.log files is wrong; should be . not ...
The existence of PNG files checked into the repository, combined with 1 and 2 make it appear that everything worked as intended

Replacing the ALTI*)_t0.log files that come in that directory (after fixing the search path) with the files generated in tutorial step 3 results in the following python error:

python3 recolor.py
Traceback (most recent call last):
File "recolor.py", line 17, in <module>
data = np.loadtxt(os.path.join(dirname, filename))
File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1141, in loadtxt
for x in read_data(_loadtxt_chunksize):
File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1065, in read_data
% line_num)
ValueError: Wrong number of columns at line 161

Fixing the search path in recolor.py and re-running it with the ALTI*0_t0.log files results in an error after processing the first two files:

Recolored ALTI00050_t0.log
Recolored ALTI00150_t0.log
Traceback (most recent call last):
File "recolor.py", line 17, in <module>
data = np.loadtxt(os.path.join(dirname, filename))
File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1141, in loadtxt
for x in read_data(_loadtxt_chunksize):
File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1065, in read_data
% line_num)
ValueError: Wrong number of columns at line 161

Furthermore I recommend setting up continuous integration testing to ensure that:

The code always builds
Each example in the tutorial always runs
Any additional memory leak & debugging checks can also be run provided they are not too expensive

Short of implementing, CI you should have some sort of script to ensure that all examples in the tutorial run (if you're considering that your test suite). Once you have this, it's not that hard to setup some basic CI testing.

kellykochanski commented 5 years ago

@defaziogiancarlo Quick question - do you have a preferred function for visualizing an ALTI file?

I saw that your last commit, after I wrote the now-broken piece of the tutorial, included some new tools for reading and visualizing Rescal height maps / ALTI files. (At present, there are several ways to do this, including the recolor.py script that triggered this issue - it would be nice to direct everything to one working file).

kellykochanski commented 5 years ago

@zbeekman The tutorial will now work as shown; it now uses a mix of explicit python commands (to walk users through some of the output files + tools) and a re-usable tool from heightmap.py in the utilities folder instead of using recolor.py, which was a hacky one-off script just for this tutorial.

defaziogiancarlo commented 5 years ago

The scripts/utilities/heightmap.py script has a few ways to draw height maps. All the ways are just using matplotlib default settings. If you create a HeightMap object, using heightmap.HeightMap("Some ALTI file") There's a draw method and a plot_3d, both of which make a window pop up.

The CellSpace objects contain HeightMap objects, you can use the draw_height_map method.

kellykochanski commented 5 years ago

@zbeekman I've added a test script (test/test.sh) which runs the snow_cone*, visualization, and parameter space exploration setup examples.

It is not fully automatic - after it runs things, it repeatedly prompts the user to confirm that output looks OK - but since so much of the output of Rescal-snow is visual, I believe that keeping human eyes on the output is useful, and it means this test will stay relevant even if the details of the example scripts or some of the code functionality is changed.

Is this the kind of test script you were thinking of? I agree that CI would be an ideal standard, but it looks like it would be more challenging to implement.

(I included only one of the Rescal-snow examples, the snow cone, because in my experience all 3 crash or succeed together, and including only one makes the test script quicker to run and easier to use.)

r-barnes commented 5 years ago

@kellykochanski : I can help with this, but it would be good if your could merge #42 first.

kellykochanski commented 5 years ago

Merged.

r-barnes commented 5 years ago

See #43.

kellykochanski commented 5 years ago

@zbeekman Rescal-snow now has a test directory, containing scripts that test:

We haven't set up CI (partly due to time, partly because the visualization example isn't quite automated and requires user input), but these changes have fixed the bug you found, and created a more organized testing process to lower the risk of another one.

zbeekman commented 5 years ago

@kellykochanski I'm still hitting an error in heightmap.py when following the updated tutorial in the (now-manual) python section:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "utilities/heightmap.py", line 242, in __init__
    self.height_map = np.loadtxt(height_input).astype(np.uint8)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1071, in read_data
    % line_num)
ValueError: Wrong number of columns at line 161

Here are my complete commands.

$ python3
Python 3.7.4 (default, Sep  7 2019, 18:27:02)
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path.append('utilities')    # Add utilities to Python path
>>> import os
>>> import matplotlib.pyplot as plt # Plotting
>>> import heightmap                # utilities/heightmap.py contains tools for reading and visualizing height maps (ALTI files)
>>> # Find the heightmap files to visualize. In this tutorial, they are named out/ALTIxxxxx.log.
... filenames = ['out/'+f for f in os.listdir('out') if f[:4]=='ALTI']
>>> filenames = [f for f in filenames if f[-4:]=='.log']
>>> filenames.sort()
>>> print(filenames)
['out/ALTI00000_t0.log', 'out/ALTI00001_t0.log', 'out/ALTI00002_t0.log', 'out/ALTI00003_t0.log', 'out/ALTI00004_t0.log', 'out/ALTI00005_t0.log', 'out/ALTI00006_t0.log', 'out/ALTI00007_t0.log', 'out/ALTI00008_t0.log', 'out/ALTI00009_t0.log', 'out/ALTI00010_t0.log', 'out/ALTI00011_t0.log', 'out/ALTI00012_t0.log', 'out/ALTI00013_t0.log', 'out/ALTI00014_t0.log', 'out/ALTI00015_t0.log', 'out/ALTI00016_t0.log', 'out/ALTI00017_t0.log', 'out/ALTI00018_t0.log', 'out/ALTI00019_t0.log', 'out/ALTI00020_t0.log', 'out/ALTI00021_t0.log', 'out/ALTI00022_t0.log', 'out/ALTI00023_t0.log', 'out/ALTI00024_t0.log', 'out/ALTI00025_t0.log', 'out/ALTI00026_t0.log', 'out/ALTI00027_t0.log', 'out/ALTI00028_t0.log', 'out/ALTI00029_t0.log', 'out/ALTI00030_t0.log', 'out/ALTI00031_t0.log', 'out/ALTI00032_t0.log', 'out/ALTI00033_t0.log', 'out/ALTI00034_t0.log', 'out/ALTI00035_t0.log', 'out/ALTI00036_t0.log', 'out/ALTI00037_t0.log', 'out/ALTI00038_t0.log', 'out/ALTI00039_t0.log', 'out/ALTI00040_t0.log', 'out/ALTI00041_t0.log', 'out/ALTI00042_t0.log', 'out/ALTI00043_t0.log', 'out/ALTI00044_t0.log', 'out/ALTI00045_t0.log', 'out/ALTI00046_t0.log', 'out/ALTI00047_t0.log', 'out/ALTI00048_t0.log', 'out/ALTI00049_t0.log', 'out/ALTI00050_t0.log', 'out/ALTI00051_t0.log', 'out/ALTI00052_t0.log', 'out/ALTI00053_t0.log', 'out/ALTI00054_t0.log', 'out/ALTI00055_t0.log', 'out/ALTI00056_t0.log', 'out/ALTI00057_t0.log', 'out/ALTI00058_t0.log', 'out/ALTI00059_t0.log', 'out/ALTI00060_t0.log']
>>> # Visualize the files. This may take a few seconds.
... for name in filenames:
...     # Read the file as a HeightMap object
...     hm  = heightmap.HeightMap(name)
...     # hm.draw                   # Uncomment to view each image as it is produced
...     hm.save_color_map(name[:-4] + '_image.png')
...
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "utilities/heightmap.py", line 242, in __init__
    self.height_map = np.loadtxt(height_input).astype(np.uint8)
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/usr/local/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1071, in read_data
    % line_num)
ValueError: Wrong number of columns at line 161

I can try this on a Linux box to see if this is some strange macOS behavior. I still think that if you have this script as part of your tutorial, there should be a way to test it. If you want it to be more interactive than just running a script I recommend exploring jupyter notebooks, and their ability to be run non-interactively for testing.

If I can't reproduce this issue on linux, then I may just consider this good enough, and a macOS specific bug (from the perspective of my JOSS reviewer duties) but in the long run it would be good to see testing on other platforms of this script.

Also, I'll comment in the PR, but I think you should re-word the python section of the tutorial a little bit: you're no longer including a script, but rather giving folks a set of commands to run in an interactive interpreter.

zbeekman commented 5 years ago

I wonder if it might make sense to use pipenv or conda's environment.yml to specify python versions and lock required module versions? This would increase re-producibility and headaches caused by wacky python environments.

defaziogiancarlo commented 5 years ago

This looks like one of the ALTI files might be malformed. Using np.loadtxt, I get a
ValueError: Wrong number of columns at line <number> when I try load in a file in which I purposely removed a value from on of the lines.

Do any of the _image.png files get created, or does it fail on the first ALTI file?

zbeekman commented 5 years ago

@defaziogiancarlo could this be a line ending error/issue? People often setup git to checkout text files with native endings, etc. So I wonder if there's logic somewhere that assumes a certain number of characters/bytes per line? Then if you convert the line endings from \n\r to \n (or vise versa) something blows up?

I have no png files in ./script or in ./script/out.

zbeekman commented 5 years ago

EDIT: It could still be line-ending assumption related, but not git related.

Hmmmm, ~~never mind,~~ I see that the ALTI files were generated by running the code, and are not checked in.

Here is a tarball of my ALTI files, if it helps. I'm following along the tutorial.

ALTI_t0-logs.tar.gz

defaziogiancarlo commented 5 years ago

Something is wrong with the ALTI file generation. The number of columns doubles for some reason at line 161. I think the actual data ends at line 160 and the rest is all 6 and 7, which is neutral and sand input cell types.

kellykochanski commented 5 years ago

@defaziogiancarlo @zbeekman This is a file management problem.

When you run all of the tutorials in sequence, not all of the output from old runs (snow_cone, sintering) is deleted before hitting the snowfall tutorial that produce the output used in the visualization tutorial.

Thus, the visualization tutorial is trying to read ALTI files from a mix of runs. This triggers an error, as heightmap expects to read files with a consistent shape, size, and layout. Presumably I missed this earlier by testing the tutorials out of order.

I've updated the tutorial scripts to clean the entirety of the old output before generating new output, which should remove the problem.

I'm also updating heightmap to produce a more useful error message in the case of irregular file shapes and sizes.

zbeekman commented 5 years ago

@kellykochanski Ok, I've gotten through the entire tutorial. There may be an additional issue with the parameter space exploration example. I'm on a slurm system, and using your scripts, as outlined in the tutorial, appears to only run the first example. Furthermore, the sbatch script doesn't look like a typical slurm array job, I suspect this is the source of the issue.

kellykochanski commented 5 years ago

I'm glad to hear it, and sorry that this run through wasn't as smooth as I would have liked.

kellykochanski commented 5 years ago

I'm inclined to handle the sbatch issue by removing the slurm script (as my own work is all on moab, making the slurm script is harder for me to test), and encouraging users to use the moab script or follow their cluster's guidelines.

zbeekman commented 5 years ago

@kellykochanski I'm responding over here so as to not distract the JOSS folks:

I'm trying to avoid relying on Jupyter notebooks, unfortunately - I do production runs in an hpc environment without access to IPython, and the tutorial is used by students learning to work in the same environment.

Yes, BUT you can export the jupyter document in other formats; markdown, pdf, etc. so you could test jupyter on CI and let students follow along from markdown or rendered notebook. At any rate, if I take a stab at CI support once you've minted a release for JOSS, I'll look into this. It should be straightforward to copy the example to a notebook.

kellykochanski commented 5 years ago

@zbeekman I like the idea of a regularly teted notebook that, when exported as markdown, forms something very much like the existing tutorial. I'll let JOSS go through, then move the existing tutorial into a notebook.

kellykochanski / rescal-snow

JOSS Review (@zbeekman): Tutorial part 4 (viz) is broken #41