pvlib / pvlib-python

A set of documented functions for simulating the performance of photovoltaic energy systems.
https://pvlib-python.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Missing Module Dependency - Tables #1252

Closed zhammond147 closed 3 years ago

zhammond147 commented 3 years ago

In the pvlib.clearsky.lookup_linke_turbidity() function, you have the following error handling:

try:
    import tables
except ImportError:
    raise ImportError('The Linke turbidity lookup table requires tables. '
                      'You can still use clearsky.ineichen if you '
                      'supply your own turbidities.')

Unfortunately, the tables module has not been included as a dependency of the pvlib package.

kandersolar commented 3 years ago

It is an optional dependency: https://github.com/pvlib/pvlib-python/blob/master/setup.py#L56 See also https://pvlib-python.readthedocs.io/en/stable/installation.html#compatibility

Should this be explained better in the online docs? Or maybe mention something about pvlib[optional] in that error message?
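For reference, installing the optional dependencies through the extras group mentioned above would look something like this (a sketch; see the linked installation docs for the authoritative instructions):

```shell
# Install pvlib together with its optional dependencies (including tables).
# Quotes guard against shell globbing on the brackets.
pip install "pvlib[optional]"
```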

cwhanse commented 3 years ago

Is there a good reason that tables is still optional? This is a pothole frequently hit by users.

kahemker commented 3 years ago

I ran into many problems yesterday building a new environment on Windows, and they all seemed to revolve around getting tables installed properly. I eventually caved, installed Ubuntu via WSL, and ran through the environment creation on Linux.

I'll eventually figure out how to build the environment on Windows again since I prefer working with PyCharm and its debugger, but any advice you all can provide on the Windows environment build process would be great.

kandersolar commented 3 years ago

@kahemker can you post details about the commands you're running and the error messages you got? I'm happy to help figure out a solution, but I can't seem to reproduce the issue myself -- I went to some effort in #1287 to make sure tables wouldn't be a barrier to installation (except on OSX and py3.9), and setting up a new environment and installing pvlib/master completes successfully on both our Windows CI environments and my Windows computer:

conda create -n pvlib-windows python=3.8
conda activate pvlib-windows
pip install .

It's just grabbing the wheel off PyPI:

Collecting tables
  Downloading tables-3.6.1-2-cp38-cp38-win_amd64.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 2.2 MB/s
kahemker commented 3 years ago

Wow. Thank you @kanderso-nrel. I think the problem was trying to install the environment on python=3.9 while following these instructions in the documentation for setting up a virtual environment.

The primary errors in building wheel for tables-3.6.1 on Python 3.9 seem to revolve around the lzo decompression libraries. There are a ton of link errors like this:

LINK : fatal error LNK1181: cannot open input file 'lzo2.lib'
LINK : fatal error LNK1181: cannot open input file 'liblzo.lib'
LINK : fatal error LNK1181: cannot open input file 'bzip2.lib'
LINK : fatal error LNK1181: cannot open input file 'blosc.lib'
LINK : fatal error LNK1181: cannot open input file 'hdf5.lib'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\BuildTools\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\link.exe' failed with exit code 1181

The error list gets very long and eventually shifts to errors from the Microsoft Visual Studio 2019 build tools.

I'll stick with Python 3.8 for now. I wish I had found #1287 yesterday! I did learn a lot about WSL, and it seems like a pretty solid option for developing on Linux without the overhead of a VM.

kandersolar commented 3 years ago

Oh right, I should have asked if you were using python 3.9. At the moment tables has only released 3.9 wheels for linux (ctrl-F cp39 here) and not Windows or OS X: https://github.com/PyTables/PyTables/issues/823. So pip install tables on py39 can't install a pre-built binary on any platform except linux and has to attempt building one from source, which is very likely to fail on a normal Windows installation.

For anyone running into an issue installing tables on python 3.9, here are some options to avoid the hassle of trying to build it from source, ordered from most to least recommended:

  1. If you have conda available, you can run conda install pytables before installing pvlib. That way it is already available and pip can skip it instead of resorting to trying to build it from source. Note that the package is called pytables by conda but tables by PyPI. You could also install a complete environment using conda with the environment files listed here: https://github.com/pvlib/pvlib-python/tree/master/ci
  2. Use an older python version like 3.7 or 3.8 instead of 3.9, since PyPI has tables wheels available for those versions. This is easy to do with conda, though I strongly recommend creating a new environment for this instead of replacing an existing python installation.
  3. Windows users could try using Christoph Gohlke's wheels: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pytables
  4. Use WSL like @kahemker did, or even a VM (see this semi-relevant quote). In case it's not obvious: this option is significantly more complex than the previous options and not something you should attempt without having done some reading to know what you're doing.
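A minimal sketch of option 1 (the environment name pvlib-env is arbitrary, and the default conda channels are assumed):

```shell
# Create and activate a fresh conda environment.
conda create -n pvlib-env python=3.8
conda activate pvlib-env
# "pytables" is the conda name for the package PyPI calls "tables".
conda install pytables
# pip now sees tables already installed and skips building it from source.
pip install pvlib
```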
mikofski commented 3 years ago

I think we're likely to run into this many, many times. We should probably do a Stack Overflow search for pvlib + tables or install. One constant source of confusion is that the package is called "pytables" in conda but just "tables" on PyPI.

wholmgren commented 3 years ago

All great ideas. For the sphinx docs, I think adding a short note with a link to this would be fine. It would make sense to do that before 0.9 is tagged.

At the risk of issue scope creep, it's also worth considering if we should use a different format since this seems likely to repeat with python 3.10.

mikofski commented 3 years ago

I prefer h5py: it's a lot easier to use, more stable, and better maintained IMHO; it reuses the NumPy API for structured arrays; and it's from the actual makers of HDF5. Unfortunately, pandas chose to build on tables instead. It wouldn't be too hard to switch the raw Linke turbidity data over to h5py. Once extracted, the NumPy API makes it super easy to create a DataFrame by passing the structured array directly to pd.DataFrame(). Another advantage of h5py over pytables is that the .h5 archive can be read from MATLAB, R, or any other codebase, not just Python.
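The structured-array round trip described above can be sketched like this (the file name, dataset name, and data here are made up for illustration; they are not pvlib's actual Linke turbidity layout):

```python
import numpy as np
import pandas as pd
import h5py

# A small structured array standing in for the real tabular data.
records = np.array(
    [("2021-01-01", 3.2), ("2021-02-01", 3.5)],
    dtype=[("month", "S10"), ("turbidity", "f8")],
)

# Write it to an HDF5 file with h5py...
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("turbidity_table", data=records)

# ...read the structured array back and pass it straight to pandas.
with h5py.File("demo.h5", "r") as f:
    arr = f["turbidity_table"][:]
df = pd.DataFrame(arr)
print(df["turbidity"].tolist())  # [3.2, 3.5]
```

The field names of the structured array become the DataFrame columns, so no column mapping is needed.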

Another option is parquet, which is built into pandas and is quite popular.

And of course, we're already using netcdf4, which seems like the obvious choice, but I don't believe pandas has a read_nc function either. And parsing and usage are, IMO, a lot more difficult than with HDF5.

adriesse commented 3 years ago

I used to use pandas.to_hdf() all the time, and when I'm lazy I still do for short-term storage. For a brief time I thought h5py was the way to go for better sharing, but it's really pretty low level (e.g. you have to encode and parse timestamps and transpose 2d structures in matlab). Currently I'm a big fan of netcdf4 made easy by xarray.

kandersolar commented 3 years ago

Here is some basic exploration comparing packages for reading the TL data stored as .h5 and .nc files: https://gist.github.com/kanderso-nrel/09c320d08ef8daac80f3302e4b11b1ac

To summarize:

I did not try parquet. Does it support lazy loading/indexing like h5 and nc? I think pd.read_parquet requires pyarrow or fastparquet, so we'd still need a dependency for this.

mikofski commented 3 years ago

OK, I had just assumed we were using tables because it works with pandas, but it turns out that the LinkeTurbidity.h5 file is completely sane and pvlib doesn't use pandas with it at all, so there is ZERO reason to use tables here. I've just tested it and it works absolutely fine with h5py. Thanks @kanderso-nrel for testing it with netcdf4 and xarray, where it also works totally fine out of the box.

Since we already import netcdf4, and I believe we're about to start using xarray, I'm in favor of using one of those two. I'm slightly more in favor of xarray, and I wonder if we could replace netcdf4 everywhere with xarray, even though it comes with extra latency, because then it's just one package and it plays well with pandas. I'd also be happy to use h5py. Just not tables.
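For the gridded lookup use case, h5py's lazy indexing is the main draw: you can pull a single cell's monthly values without loading the whole grid. A sketch with a made-up miniature (lat, lon, month) array (pvlib's real file is much larger and the dataset name here is hypothetical):

```python
import numpy as np
import h5py

# Tiny stand-in for a gridded monthly-turbidity file:
# shape (lat, lon, month), stored as uint8.
grid = np.arange(2 * 3 * 12, dtype=np.uint8).reshape(2, 3, 12)
with h5py.File("turbidity_demo.h5", "w") as f:
    f.create_dataset("grid", data=grid)

# h5py datasets index lazily: only the requested cell's 12 monthly
# values are read from disk, not the whole array.
with h5py.File("turbidity_demo.h5", "r") as f:
    monthly = f["grid"][1, 2, :]  # one (lat, lon) cell, all 12 months

print(monthly.shape)  # (12,)
```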

Let's remove the tables dependency before we ship v0.9 so we don't have to make further changes later. tables/PyTables is an unnecessary headache in my book.

mikofski commented 3 years ago

BTW: I thought the TL data was originally available for download as an .nc file -- am I wrong? I guess it's not available from soda pro anymore?

Does anyone know whether we're using the 2003 or the 2010 values? What's the difference?

wholmgren commented 3 years ago

Thanks @kanderso-nrel for the careful comparisons!

I think we should switch to h5py before releasing 0.9. I don't see any value in adding the netcdf4 layer on top of the hdf5 file for this data set. I also don't see any reason to use xarray for this simple file and read operation.

Two things that people probably already know but I feel like are not really addressed in some of the discussion above:

  1. netcdf4 uses hdf5 -- a netcdf4 file is a hdf5 file organized in a particular way plus extra metadata
  2. pip install xarray[io] brings in h5py along with netcdf4 and h5netcdf; see details below

Looks like we only went with pytables because that's what pandas used and we didn't put much more thought into it: https://github.com/pvlib/pvlib-python/issues/437#issuecomment-376657958.

```
$ pip install xarray[io]
$ pip list
Package             Version
------------------- -------------------
affine              2.3.0
appdirs             1.4.4
asciitree           0.3.3
attrs               21.2.0
beautifulsoup4      4.9.3
certifi             2021.5.30
cffi                1.14.6
cfgrib              0.9.9.0
cftime              1.5.0
charset-normalizer  2.0.4
click               8.0.1
click-plugins       1.1.1
cligj               0.7.2
docopt              0.6.2
eccodes             1.3.3
fasteners           0.16.3
findlibs            0.0.2
fsspec              2021.7.0
h5netcdf            0.11.0
h5py                3.4.0
idna                3.2
Jinja2              3.0.1
MarkupSafe          2.0.1
netCDF4             1.5.7
numcodecs           0.9.0
numpy               1.21.2
packaging           21.0
pandas              1.3.2
pip                 21.2.4
pooch               1.5.1
pycparser           2.20
Pydap               3.2.2
pyparsing           2.4.7
python-dateutil     2.8.2
pytz                2021.1
rasterio            1.2.6
requests            2.26.0
scipy               1.7.1
setuptools          52.0.0.post20210125
six                 1.16.0
snuggs              1.4.7
soupsieve           2.2.1
urllib3             1.26.6
WebOb               1.8.7
wheel               0.37.0
xarray              0.19.0
zarr                2.9.3
```