yt-project / widgyts

Widgets for yt
https://widgyts.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
10 stars 10 forks source link

Example notebook doesn't run - data file not found #32

Open psychemedia opened 5 years ago

psychemedia commented 5 years ago

The example notebook tries to load a data file that is not available. I found a copy here via here.

The python package h5py is also required to run the demo.

munkm commented 5 years ago

Hi @psychemedia! Thank you so much for the issue! I've created a PR to address the first point of your issue here at data-exp-lab/widgyts#34. Do you think this extra cell would have been descriptive enough for your experience?

Did you get an h5py dependency error with the notebook in this repository? I know I've made an notebook out there with a larger dataset (the Pop II Prime example) that uses h5py, but that's not in the examples folder of this repository. I'd like to make sure we properly document needing to use h5py, but I can't seem to find it in here.

psychemedia commented 5 years ago

Hi @munkm

My issue was mainly about just being able to quickly run the demos using MyBinder. eg https://mybinder.org/v2/gh/data-exp-lab/widgyts/master

To make things more reproducible in the notebook, you could include an example of how to download and unzip the file.

#Download the file
from urllib.request import urlretrieve

data_url = 'http://yt-project.org/data/IsolatedGalaxy.tar.gz'
savefile = 'IsolatedGalaxy.tar.gz'

urlretrieve(url, filename=savefile)

#Extract the file
import tarfile
tar = tarfile.open(savefile, "r:gz")
tar.extractall()
tar.close()

And yes, it seems to work fine with that demo file without the h5py package in MyBinder.

By the by, this is overkill but it gives a progress meter which may be useful when downloading large files:

!pip install tqdm
#https://github.com/tqdm/tqdm#hooks-and-callbacks

from tqdm import tqdm

class TqdmUpTo(tqdm):
    """Provides `update_to(n)` which uses `tqdm.update(delta_n)`."""
    def update_to(self, b=1, bsize=1, tsize=None):
        """
        b  : int, optional
            Number of blocks transferred so far [default: 1].
        bsize  : int, optional
            Size of each block (in tqdm units) [default: 1].
        tsize  : int, optional
            Total size (in tqdm units). If [default: None] remains unchanged.
        """
        if tsize is not None:
            self.total = tsize
        self.update(b * bsize - self.n)  # will also set self.n = b * bsize

def get_data_file(url, savefile=None):
    fn = url.split('/')[-1]
    savefile = fn if savefile is None else savefile

    with TqdmUpTo(desc=fn) as t:
        urlretrieve(url, filename=savefile, reporthook=t.update_to, data=None)

    return savefile

data_url = 'http://yt-project.org/data/IsolatedGalaxy.tar.gz'
datazip = get_data_file(data_url )

By the by, there's a typo in galaxy_display.ipynb: If you do not have IsoaltedGalaxy -> If you do not have IsolatedGalaxy

munkm commented 5 years ago

Ah, thanks for pointing out the typo! I'll do a quick fix for that.

I'm a little hesitant to add a cell execution that downloads and extracts our files here. I really like that your example would make things more reproducible for binder, but I do think it forces a user running the notebook locally to download data to the folder containing the notebook, which might not be their normal process. Though widgyts is a separate project from yt, I do think it's a positive in this example that we're maintaining continuity stylistically with other examples in the yt project. I realize these are my personal opinions so I'll leave this issue open for a while to get more feedback from the community!

matthewturk commented 5 years ago

@xarthisius may have some thoughts, too!