nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License
414 stars 81 forks source link

`earthaccess.download()` ignores errors #581

Open mfisher87 opened 5 months ago

mfisher87 commented 5 months ago

We found the pqdm library is the thing swallowing all the error messages. pqdm has an exception_behavior option, which defaults to "ignore", surprisingly. The other two settings, "deferred" and "immediate", are both good options for earthaccess.

@chuckwondo and I agree "immediate" is the most intuitive default, and we can establish a new flag fail_fast: bool = True to override it to "deferred" by passing download(..., fail_fast=False). Prior discussion in #36. Please continue discussion here :)

I feel exposing the "ignore" behavior isn't necessary because the user can do their own exception handling with try.

Thoughts?

Sherwin-14 commented 2 months ago

@mfisher87 This seems interesting, what should we proceed immediateor deferred?

chuckwondo commented 2 months ago

@mfisher87 This seems interesting, what should we proceed immediate or deferred?

@Sherwin-14, we want to proceed with both, meaning that we want both options available to the user by doing the following:

mfisher87 commented 2 months ago

@Sherwin-14 don't forget to take a look at #36 for more context! Do you want to be assigned this issue?

Sherwin-14 commented 2 months ago

def get(
        self,
        granules: Union[List[DataGranule], List[str]],
        local_path: Union[Path, str, None] = None,
        provider: Optional[str] = None,
        threads: int = 8,
    ) -> List[str]:

        if local_path is None:
            today = datetime.datetime.today().strftime("%Y-%m-%d")
            uuid = uuid4().hex[:6]
            local_path = Path.cwd() / "data" / f"{today}-{uuid}"
        elif isinstance(local_path, str):
            local_path = Path(local_path)

        if len(granules):
            files = self._get(granules, local_path, provider, threads)
            return files
        else:
            raise ValueError("List of URLs or DataGranule instances expected")

 @singledispatchmethod
 def _get(
        self,
        granules: Union[List[DataGranule], List[str]],
        local_path: Path,
        provider: Optional[str] = None,
        threads: int = 8,
    ) -> List[str]:

        raise NotImplementedError(f"Cannot _get {granules}")

@mfisher87 @chuckwondo I had been looking into this and it seems like pqdm isn't used in the get and _get functions which are used in download function. But as chuck mentioned in #36 the 'pretty progress bars" are a result of pqdm which I too had when I tried to recreate the error as described in the linked post above.

chuckwondo commented 2 months ago

@Sherwin-14, just search the code base for pqdm.

Whichever functions you find it in, do the following:

  1. Add a fail_fast: bool = True parameter to the end of the parameter list of the function that is calling pqdm
  2. Immediately before the call to pqdm, add the following line:
    exception_behavior = "immediate" if fail_fast else "deferred"
  3. In the call to pqdm, add the following to the end of the args passed to pqdm: exception_behavior=exception_behavior

Then, you have to find the places where those functions are called (i.e., where the functions that call pqdm are called), and do the following:

  1. add a fail_fast param, just like described above
  2. pass the fail_fast arg to the calls to the functions modified above
  3. repeat this process up to the point where the call chain is initiated from any of the "download" or "open" functions

Also, in each modified function, update the docstring appropriately to describe the added fail_fast parameter.

Writing tests for these changes will likely be tricky, so you can ignore this for the moment and focus first on the steps above and make sure that existing tests pass (i.e., make sure you didn't break existing stuff before worrying about writing tests for the new stuff).