Closed michaelaye closed 2 years ago
Ah, doh, it's not the import that fails, but the code after the import:
24 for _c in catalogue:
---> 25 globals()[_c] = catalogue[_c]
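The small self-contained sketch below shows why a guard around the import alone cannot catch this. FakeCatalog and its error message are hypothetical stand-ins, not intake's API. They only mimic a catalog whose entries fail lazily, on access, when a driver plugin is missing. The loop mirrors the two lines above and sits outside the try block, so its ValueError propagates.

```python
"""Sketch: a try/except around an import does not cover code that runs after it."""
from typing import Dict, Iterator, Set

# Hypothetical: which driver plugins are "installed" in this simulated environment.
INSTALLED_DRIVERS: Set[str] = {"csv"}


class FakeCatalog:
    """Hypothetical stand-in for a data catalog whose entries are resolved lazily."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = entries  # entry name -> driver name

    def __iter__(self) -> Iterator[str]:
        return iter(self._entries)

    def __getitem__(self, name: str) -> str:
        driver = self._entries[name]
        if driver not in INSTALLED_DRIVERS:
            # The failure happens here, long after the guarded import succeeded.
            raise ValueError(f"no plugin found for driver {driver!r}")
        return f"<source {name} via {driver}>"


def load_namespace(catalogue: FakeCatalog) -> Dict[str, str]:
    try:
        import json  # noqa: F401  (stand-in for the guarded import; it succeeds)
    except ImportError:
        raise ImportError("sample data requires extra packages") from None
    namespace: Dict[str, str] = {}
    for _c in catalogue:  # outside the try: a ValueError here is not caught
        namespace[_c] = catalogue[_c]
    return namespace


def load_namespace_checked(catalogue: FakeCatalog) -> Dict[str, str]:
    """Variant that turns the lazy failure into an explicit ImportError."""
    namespace: Dict[str, str] = {}
    for _c in catalogue:
        try:
            namespace[_c] = catalogue[_c]
        except ValueError as exc:
            raise ImportError(f"cannot load sample dataset {_c!r}: {exc}") from exc
    return namespace
```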
I'm having a hard time reproducing this in a notebook due to the use of __file__. Would it be okay for a PR to use importlib.resources to find the path to the datasets.yaml file?
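Here is a minimal sketch of the importlib.resources approach (Python 3.9+). It resolves the file through the import system instead of through the __file__ of the calling code. The helper name is hypothetical. So is the location of datasets.yaml inside the installed package: if the file were shipped at the package root, the call would look like packaged_file("hvplot", "datasets.yaml").

```python
"""Sketch: locate a data file shipped inside a package without using __file__."""
from contextlib import contextmanager
from importlib import resources
from pathlib import Path
from typing import Iterator


@contextmanager
def packaged_file(package: str, resource: str) -> Iterator[Path]:
    """Yield a real filesystem path to `resource` inside `package`.

    resources.files() finds the package via the import system, so this does not
    depend on the caller's __file__; as_file() extracts the resource to a
    temporary file if the package is not a plain directory (e.g. a zip import).
    """
    ref = resources.files(package).joinpath(resource)
    with resources.as_file(ref) as path:
        yield path
```

With a regular install, as_file simply yields the existing path. The catalog can then be opened inside the with block, for example with intake.open_catalog(str(path)).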
Maybe add import intake_parquet within the try section, to be sure the exception is raised when that plugin is not installed?
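Here is a minimal sketch of the suggested guard, wrapped in a function so it can be called before the catalog is loaded. The function name and the error message are hypothetical. Catching ImportError specifically, rather than using a bare except, keeps unrelated errors visible.

```python
"""Sketch of the suggested guard: import the optional plugin inside the try."""


def require_sample_data_deps() -> None:
    """Raise a helpful ImportError if the packages behind the sample data are missing."""
    try:
        import intake  # noqa: F401
        import intake_parquet  # noqa: F401  (the plugin the catalog entries need)
    except ImportError as exc:
        # Hypothetical wording; chaining keeps the original missing-module name.
        raise ImportError(
            "Loading hvPlot sample data requires intake and intake-parquet. "
            "Install them using conda or pip before loading data."
        ) from exc
```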
Yes, but if that trial import is acceptable (I wasn't sure about performance), then I'd add intake-xarray and s3fs as well, as those are also required? (Which makes this bug annoying, as one needs to try it 3 times before learning about all 3 missing packages... ;)
I would have thought those would be recursive subdependencies, but if not, then yes, import all of those in the try block as well. To make it fail more quickly when it is going to fail, the first import should be the one most likely to fail (i.e. least likely to be installed in a typical environment), which I'd guess here would be intake_parquet.
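Instead of importing the packages one at a time, you can check all of them up front and report them in a single error. This also avoids having to try three times to discover three missing packages. The sketch assumes the import-name to install-name mapping in SAMPLE_DATA_DEPS, which is taken from the packages named in this thread. The helper names are hypothetical. importlib.util.find_spec only locates a top-level module without executing it, so the check should be cheaper than real imports.

```python
"""Sketch: report every missing optional dependency in one ImportError."""
import importlib.util
from typing import Dict, List

# Assumed mapping: import name -> name to install with conda/pip.
SAMPLE_DATA_DEPS: Dict[str, str] = {
    "intake": "intake",
    "intake_parquet": "intake-parquet",
    "intake_xarray": "intake-xarray",
    "s3fs": "s3fs",
}


def missing_packages(deps: Dict[str, str] = SAMPLE_DATA_DEPS) -> List[str]:
    """Return the install names of all deps whose top-level module cannot be found."""
    # find_spec locates a top-level module without importing it.
    return [pkg for mod, pkg in deps.items() if importlib.util.find_spec(mod) is None]


def require_packages(deps: Dict[str, str] = SAMPLE_DATA_DEPS) -> None:
    """Raise one ImportError that lists every missing package at once."""
    missing = missing_packages(deps)
    if missing:
        raise ImportError(
            "hvPlot sample data needs these missing packages: "
            + ", ".join(missing)
            + ". Install them all at once, e.g. with: conda install -c conda-forge "
            + " ".join(missing)
        )
```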
They seem to be independent packages:
❯ mamba info intake-xarray=0.5
intake-xarray 0.5.0 pyhd8ed1ab_0
--------------------------------
file name : intake-xarray-0.5.0-pyhd8ed1ab_0.tar.bz2
name : intake-xarray
version : 0.5.0
build string: pyhd8ed1ab_0
build number: 0
channel : https://conda.anaconda.org/conda-forge/noarch
size : 1.4 MB
arch : None
constrains : ()
license : BSD-2-Clause
license_family: BSD
md5 : 43d9d1c90da0b2b28cc16e58a52a0f2b
noarch : python
package_type: noarch_python
platform : None
sha256 : 91a388e5eb015b192bc17de04c55b102576d1c1b08571a80a1a9a1bc6c878f91
subdir : noarch
timestamp : 1616085245631
url : https://conda.anaconda.org/conda-forge/noarch/intake-xarray-0.5.0-pyhd8ed1ab_0.tar.bz2
dependencies:
dask >=2.2
intake >=0.5.2
netcdf4
python >=3.5
xarray >=0.12.0
zarr
WARNING: 'conda info package_name' is deprecated.
Use 'conda search package_name --info'.
❯ mamba search intake-parquet
Loading channels: done
# Name Version Build Channel
intake-parquet 0.2.1 py_0 conda-forge
intake-parquet 0.2.2 py_0 conda-forge
intake-parquet 0.2.3 py_0 conda-forge
❯ mamba info intake-parquet=0.2.3
intake-parquet 0.2.3 py_0
-------------------------
file name : intake-parquet-0.2.3-py_0.tar.bz2
name : intake-parquet
version : 0.2.3
build string: py_0
build number: 0
channel : https://conda.anaconda.org/conda-forge/noarch
size : 10 KB
arch : None
constrains : ()
license : BSD-2-Clause
license_family: BSD
md5 : b7d04be2fb7b43946cf06dc5f7f04ad1
noarch : python
package_type: noarch_python
platform : None
sha256 : 2981d0998aa3e30713c6b2012a4557e77b70ed6e04778f9365c4fdeb593576ca
subdir : noarch
timestamp : 1573509119874
url : https://conda.anaconda.org/conda-forge/noarch/intake-parquet-0.2.3-py_0.tar.bz2
dependencies:
dask
fastparquet
intake >=0.3
jinja2
pandas
pyarrow
python >=3.5
WARNING: 'conda info package_name' is deprecated.
Use 'conda search package_name --info'.
And s3fs is obviously unrelated to both. I will play with it and then submit a PR.
Saw the same thing in #562.
I still feel like it is a lot to install just to run the second page of the user guide. Why not just download the data with requests or urllib, as e.g. bokeh does?
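For comparison, the plain-download approach can be sketched with the standard library alone. The sketch fetches a URL once into a local cache directory and reuses the file afterwards. It is not bokeh's actual implementation, and the function name and cache layout are hypothetical. It writes to a temporary .part file and renames it only after the download finishes, so an interrupted download does not leave a truncated file that later looks complete.

```python
"""Sketch: download a file once into a cache directory using only the stdlib."""
import os
import shutil
import urllib.request
from pathlib import Path
from typing import Optional, Union


def fetch(url: str, cache_dir: Union[str, Path], filename: Optional[str] = None) -> Path:
    """Return a local path for `url`, downloading it only if it is not cached yet."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / (filename or url.rstrip("/").rsplit("/", 1)[-1])
    if target.exists():
        return target  # already downloaded by an earlier call
    partial = target.with_name(target.name + ".part")
    try:
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # Publish atomically: the final name only appears once the data is complete.
        os.replace(partial, target)
    finally:
        if partial.exists():
            partial.unlink()  # clean up after a failed or interrupted download
    return target
```

This handles single files only. The cache location is up to the caller, e.g. a hypothetical Path.home() / ".cache" / "hvplot-sample-data".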
Well, because in the old way basically everybody is writing a mini download manager, as one can see from your link. I think relying on intake for data management is a good thing that should be pushed further. However, I agree this needs to be carefully balanced against tutorial hurdles, which should always be minimized; that is why I reported this as a bug. A user guide step should never fail 3 times. Possibly we should simply add the above 4 packages to a user guide preparation section?
I just think it is a lot to ask new users to install 4 packages just to get access to an 8 KB file (us_crime) and a 15 MB file (airline_flights). I just tried to see whether I could run the Plotting page from a clean environment:
Created the environment with mamba create -n hvplot_example python=3.8 hvplot jupyterlab.
The first cell needed dask to be installed.
The second cell needed intake, intake-parquet, intake-xarray and s3fs to be installed.
The third cell needed IProgress to be installed, and afterwards it raised a FileNotFoundError. I then tried changing the cell to:
import dask.dataframe as dd
flights = dd.read_parquet("s3://assets.holoviews.org/data/airline_flights.parq").persist()
print(type(flights))
flights.head()
But this gives a NoCredentialsError: Unable to locate credentials. I got it to work by changing s3 to http.
To run the bivariate plot I needed to install scipy.
To run the Large Data section I needed datashader.
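Instead of switching the URL from s3 to http, you can ask s3fs to make anonymous (unsigned) requests with anon=True, which dask forwards through storage_options. This sketch assumes the bucket allows public reads; the successful http access suggests this but does not prove it. The helper names are hypothetical.

```python
"""Sketch: read a public S3 parquet dataset without AWS credentials."""
from typing import Any, Dict, Optional


def storage_options_for(url: str) -> Optional[Dict[str, Any]]:
    """Anonymous (unsigned) access for s3:// URLs; nothing extra for other schemes."""
    # s3fs accepts anon=True; dask passes storage_options on to the filesystem.
    return {"anon": True} if url.startswith("s3://") else None


def read_public_parquet(url: str):
    import dask.dataframe as dd  # imported lazily so the helper above has no deps

    return dd.read_parquet(url, storage_options=storage_options_for(url))
```

read_public_parquet("s3://assets.holoviews.org/data/airline_flights.parq").persist() would then stand in for the dd.read_parquet call in the cell above.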
Other things I noticed when trying to get the notebook to work:
.compute on a dask dataframe anymore to use hvplot (but I could be wrong).
I will probably make a PR for 1 and 3 today.
@michaelaye could you get this to work?
from hvplot.sample_data import airline_flights
flights = airline_flights.to_dask().persist()
print(type(flights))
flights.head()
Yes. Which step fails for you? Did you install the missing libraries?
Installed all of them. It just fails with the following message:
Created with mamba create -n tmp python=3.8 hvplot jupyterlab dask intake intake-parquet intake-xarray s3fs IProgress scipy datashader
Your information is inconsistent. An environment created with that mamba command does not end up with pyviz/holoviews packages from the pyviz channel, unless you put the pyviz channel in your base conda config at a higher priority than conda-forge? I wouldn't do that, as I believe packages in conda-forge receive a better fit-to-all check than packages from outside entities.
What is the output of conda config --show-sources?
What is your OS?
I just ran mamba create -n tmp python=3.8 hvplot jupyterlab dask intake intake-parquet intake-xarray s3fs IProgress scipy datashader on my Linux machine, then conda activate tmp, and then
from hvplot.sample_data import airline_flights
flights = airline_flights.to_dask().persist()
print(flights.compute().head())
without any issue.
I noticed that the environment created by the above mamba command had several newer packages than yours (libthrift and grpc-cpp) but also some older ones (pyct and pyct-core are 0.4.6 in my tmp env).
I'll try on my Mac now to see what I get there.
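Comparing two environments by eye is error-prone. One alternative is to export both, e.g. with conda list -n tmp --export > env.txt on each machine, and diff the files. This sketch assumes the name=version=build line format of that export; the helper names are hypothetical.

```python
"""Sketch: compare two environments from `conda list --export` output."""
from typing import Dict, List, Tuple


def parse_export(text: str) -> Dict[str, str]:
    """Map package name -> version from lines like 'name=version=build'."""
    packages: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comments conda writes
        parts = line.split("=")
        if len(parts) < 2:
            continue  # not a pinned spec; ignore it
        packages[parts[0]] = parts[1]
    return packages


def diff_envs(a: str, b: str) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Return (only in a, only in b, [(name, version in a, version in b)])."""
    pa, pb = parse_export(a), parse_export(b)
    only_a = sorted(pa.keys() - pb.keys())
    only_b = sorted(pb.keys() - pa.keys())
    changed = sorted((n, pa[n], pb[n]) for n in pa.keys() & pb.keys() if pa[n] != pb[n])
    return only_a, only_b, changed
```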
Wow, just adding the pyviz channel with conda config --add channels pyviz makes mamba unable to even find a working set; that's how bad it is to add the pyviz channel. I work cleanly from conda-forge only, with very, very few exceptions for things that are only on PyPI, and have never (or very rarely) seen mamba fail to resolve things.
No issues on my Mac either. Here's what I get when I do:
conda config --show-sources
==> /home/maye/miniconda3/.condarc <==
channels:
- conda-forge
==> /home/maye/.condarc <==
channel_priority: strict
channels: []
report_errors: True
OK, I managed to create an environment very similar to yours by adding pyviz as a channel and setting my channel_priority to flexible. However, everything still works fine with the hvplot airline data.
It might be time to wipe your whole miniconda folder, in case some rotten libraries are lurking somewhere. I need to do that myself maybe once a year, when something gets corrupted because I was too adventurous installing stuff.
Here's my env, for cross-checking:
Some observations:
abseil-cpp package and yours didn't, with apparently the same mamba command? (There might be other differences; I didn't compare the whole list.)
Thank you for the investigation.
Yes, you are correct that I was not "telling the truth" about my setup: I forgot to include my .condarc file. As you concluded, it did list pyviz in my channels.
I have removed pyviz and defaults from my .condarc and have removed and reinstalled miniconda, but I still get a FileNotFoundError (as I originally did).
conda config --show-sources
==> /home/shh/miniconda3/.condarc <==
channels:
- conda-forge
==> /home/shh/.condarc <==
changeps1: False
ssl_verify: True
channel_priority: strict
channels:
- conda-forge
I could see that airline_flights.cache_dirs outputs /home/shh/.intake/cache for me. When I deleted that folder, it gave the error message ImportError: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html, and after installing ipywidgets and removing the .intake folder again, I could get this example to work.
@michaelaye can you try renaming your .intake cache folder and see whether you also get a FileNotFoundError?
Yes, I do. But I think by renaming only the cache folder you corrupt the intake data management system, because the persisted folder is still there, with partial records that need to access the cache folder whenever required. So when you rename only the cache folder, you create an impossible state for intake. If you had renamed the whole .intake folder, things would have worked; I just confirmed that.
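You can script the reset confirmed here, which moves the whole .intake folder rather than only its cache subfolder. The function name in this sketch is hypothetical. The default location ~/.intake comes from the cache_dirs output quoted earlier. The folder is renamed to a timestamped backup rather than deleted, so nothing is lost if it is needed later.

```python
"""Sketch: move the whole ~/.intake folder aside instead of only its cache subfolder."""
import time
from pathlib import Path
from typing import Optional, Union


def reset_intake_dir(root: Optional[Union[str, Path]] = None) -> Optional[Path]:
    """Rename `root` (default ~/.intake) to a timestamped backup and return its path.

    Moving the whole folder keeps the persisted records and the cached data
    together, so they are never left pointing at each other inconsistently.
    Returns None when there is nothing to move.
    """
    root = Path(root) if root is not None else Path.home() / ".intake"
    if not root.exists():
        return None  # nothing to reset
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = root.with_name(f"{root.name}.bak-{stamp}")
    n = 1
    while backup.exists():  # never clobber an earlier backup from the same second
        backup = root.with_name(f"{root.name}.bak-{stamp}-{n}")
        n += 1
    root.rename(backup)
    return backup
```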
I agree. I was not very clear in my last post. If you have time, can you try the following steps:
1) Rename or delete the .intake folder and uninstall ipywidgets and IProgress from the tmp environment.
2) Run the notebook. For me this creates the .intake folder but raises an ImportError and does not download the data.
3) Install ipywidgets.
4) Run the notebook again. This time I get a FileNotFoundError, because the data was not downloaded but all the folders were created.
To get this to work, I have to delete the .intake folder and run the notebook again.
The above mamba command doesn't install ipywidgets, so I will only deal with iprogress.
I can confirm that your scenario fails in the notebook (it doesn't fail in the IPython console, which doesn't use progress bars from the library you uninstalled).
Question: why would you uninstall iprogress, a progress library that supports the notebook, when you want to work in the notebook?
What you have identified, though, are two bugs that are worth reporting:
iprogress, as it obviously crashes when it's silently removed. That silent removal is only possible because mamba/conda can uninstall it without uninstalling tqdm (the progress bar package).
intake system does not clean up the file handles after the download process was interrupted by the missing iprogress package. This is definitely worth reporting, to prevent others from running into this edge case (although it is only caused by uninstalling iprogress; again, why would you do that?).
This is as much time as I can invest in this. Please create 2 issues in the respective GitHub repositories.
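The second bug is a half-populated cache left behind after an interrupted download. A common way to avoid it is to write into a scratch directory and rename that directory into place only once it is complete. The sketch below shows this pattern in isolation. The names are hypothetical, and it is not how intake implements its cache.

```python
"""Sketch: populate a cache directory atomically so failures leave nothing behind."""
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def cached_dir(target: Path, populate: Callable[[Path], None]) -> Path:
    """Return `target`, creating it via `populate(scratch_dir)` if it does not exist.

    `populate` writes the downloaded files into a scratch directory next to
    `target`. Only after it returns is the scratch directory renamed to
    `target`. If `populate` raises (e.g. a missing progress-bar dependency or
    Ctrl-C), the scratch directory is removed and `target` never appears.
    """
    target = Path(target)
    if target.exists():
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Same parent directory, hence same filesystem, so os.replace is a plain rename.
    scratch = Path(tempfile.mkdtemp(prefix=target.name + ".", dir=target.parent))
    try:
        populate(scratch)
        os.replace(scratch, target)
    except BaseException:
        shutil.rmtree(scratch, ignore_errors=True)
        raise
    return target
```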
I uninstalled iprogress because it was explicitly installed when creating the environment. iprogress is no longer supported, and from what I can see it has been replaced by something similar in ipywidgets. I can only get the download to work with ipywidgets, not with iprogress. I don't understand why yours works with iprogress and mine doesn't.
I will file some bug reports later, so that hopefully new users (and I...) will be able to run the example without all these problems.
Thank you for helping me find the root of the problem; I really appreciate it!
Ah, I didn't even see that deliberate IProgress install in the mamba command.
I
tqdm again fails (which is a bug: if it needs something like that to run, it should be added to the conda-forge package's dependencies), ipywidgets only this time, and that seems to cover whatever tqdm needs to run properly.
~/.intake again, due to the 2nd bug with intake's file management.
Just to clarify: it sounds like there are some upstream issues to report, but is it OK that this issue was auto-closed when I merged #693? If so, fine, but if there are remaining issues above with hvplot for us to address, please reopen this issue and summarize what we need to do in hvplot. Thanks!
For reference, the problem with intake should be solved by https://github.com/intake/intake/pull/655
My input regarding package dependencies: I think we should minimize them. Many users who could benefit from hvPlot etc. would not know about intake, parquet, s3fs etc., so they suddenly have to deal with 4 new packages instead of one.
make the introduction and first half of the tutorial simple and something familiar to most users.
ALL software version info
hvplot: 0.7.3
Description of expected behavior and the observed behavior
The following import fails, despite the catch-all except in the code?? (Honestly stumped.)
For reference, this is the code in 0.7.3:
How can intake throw a ValueError??
Complete, minimal, self-contained example code that reproduces the issue
intake installed, no other intake subpackages.
from hvplot.sample_data import us_crime, airline_flights
Stack traceback and/or browser JavaScript console output
Additional info
The list of required packages is now this: