ioos / erddapy

Python interface for ERDDAP
https://ioos.github.io/erddapy/
BSD 3-Clause "New" or "Revised" License

Download file utility function #322

ocefpaf closed 8 months ago

ocefpaf commented 11 months ago

Watching @callumrollo's awesome ERDDAP presentation today made us realize that we don't have an easy way for users to "just download" data for unusual formats. Say, a Matlab user that wants to get a .mat file without the need to write more Python code than they are comfortable with.

Most Python users are OK handling the download themselves, but we can make it easier for non-Python users who are using erddapy to fetch data.

We can provide a download_url function for this that uses httpx and saves the file locally.

ocefpaf commented 8 months ago

Something like:

from pathlib import Path
from urllib.parse import urlparse

import httpx
import rich.progress


def download_file_from_erddap(url):
    # Name the local file after the last component of the URL path.
    fname = Path(urlparse(url).path).name
    fname = Path(fname)
    if fname.exists():
        raise FileExistsError(f"File {fname} exists, refusing to overwrite.")
    with httpx.stream("GET", url) as response:
        # Fail before creating the file if the server returns an error status.
        response.raise_for_status()
        # Falls back to 0 when the server sends no Content-Length.
        total = int(response.headers.get("Content-Length", 0))
        with open(fname, "wb") as download_file, rich.progress.Progress(
            "[progress.percentage]{task.percentage:>3.0f}%",
            rich.progress.BarColumn(bar_width=None),
            rich.progress.DownloadColumn(),
            rich.progress.TransferSpeedColumn(),
        ) as progress:
            download_task = progress.add_task("Download", total=total)
            for chunk in response.iter_bytes():
                download_file.write(chunk)
                progress.update(download_task, completed=response.num_bytes_downloaded)


url = "https://erddap.ioos.us/erddap/tabledap/gts_non_ndbc_statistics.mat"
download_file_from_erddap(url)

would do. I'm not sold on adding another dependency just for the sake of a progress bar. Also, most ERDDAP files don't provide a Content-Length anyway.
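
For comparison, here is a minimal sketch of the same write loop without the rich dependency. It is only an illustration under stated assumptions: save_chunks is a hypothetical helper (not part of erddapy or httpx), it takes any iterable of byte chunks (for example response.iter_bytes() from httpx), and it falls back to a running byte count when no Content-Length is available. In-memory chunks stand in for a real streamed response so the example runs offline.

```python
import tempfile
from pathlib import Path


def save_chunks(chunks, fname, total=None, report=print):
    """Write an iterable of byte chunks to fname, reporting progress as text.

    Hypothetical helper: `chunks` could be `response.iter_bytes()` from httpx.
    `total` is the Content-Length if the server sent one, else None.
    """
    fname = Path(fname)
    if fname.exists():
        raise FileExistsError(f"File {fname} exists, refusing to overwrite.")
    written = 0
    with open(fname, "wb") as download_file:
        for chunk in chunks:
            download_file.write(chunk)
            written += len(chunk)
            if total:
                report(f"{written}/{total} bytes ({written / total:.0%})")
            else:
                # No Content-Length: only a running byte count is possible.
                report(f"{written} bytes")
    return written


# Usage with in-memory chunks standing in for a streamed response.
with tempfile.TemporaryDirectory() as tmp:
    messages = []
    target = Path(tmp) / "example.mat"
    n_bytes = save_chunks([b"abc", b"de"], target, report=messages.append)
    saved = target.read_bytes()
```

Passing report=print (the default) would give plain console output; the trade-off is a less polished display in exchange for no extra dependency.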

callumrollo commented 8 months ago

Looks good, I'll put in a PR. Do you envisage this as a standalone function or as part of the ERDDAP class?

ocefpaf commented 8 months ago

I think a standalone function, but I have no strong opinion about it. What do you think?

callumrollo commented 8 months ago

Having thought on this, I think it could work nicely as part of the ERDDAP class, see draft PR #330

The main reason for this is so that the user can leverage the existing functionality to subset variables and apply constraints. What do you think? Otherwise happy to redo it as a standalone function
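
To make the subsetting point concrete, here is a rough sketch of how variables and constraints end up in a tabledap request URL, and why the file name logic from the snippet above still works. This is not erddapy's implementation: build_tabledap_url is a hypothetical helper, the constraints dict only mimics the {"time>=": value} style, and the variable names and time constraint are placeholders that may not exist in this dataset.

```python
from pathlib import Path
from urllib.parse import quote, urlparse


def build_tabledap_url(server, dataset_id, file_type, variables=(), constraints=None):
    """Hypothetical helper, not erddapy's API: assemble a tabledap request URL."""
    url = f"{server}/tabledap/{dataset_id}.{file_type}"
    # The variable list comes first in the query, comma separated.
    var_part = ",".join(quote(v, safe="") for v in variables)
    # Each constraint follows, prefixed by "&" and percent-encoded.
    constraint_part = "".join(
        "&" + quote(f"{key}{value}", safe="")
        for key, value in (constraints or {}).items()
    )
    if var_part or constraint_part:
        url += "?" + var_part + constraint_part
    return url


url = build_tabledap_url(
    "https://erddap.ioos.us/erddap",
    "gts_non_ndbc_statistics",
    "mat",
    variables=["time", "latitude"],  # placeholder variable names
    constraints={"time>=": "2020-01-01T00:00:00Z"},  # placeholder constraint
)
# The query string does not affect the name derived from the URL path.
fname = Path(urlparse(url).path).name
```

Because the file name comes from the URL path only, a subset request would still be saved as gts_non_ndbc_statistics.mat.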

ocefpaf commented 8 months ago

> The main reason for this is so that the user can leverage the existing functionality to subset variables and apply constraints. What do you think?

Looks great. Thanks for taking a stab at this one.

callumrollo commented 8 months ago

Resolved in #330