nsidc / earthaccess

Python Library for NASA Earthdata APIs
https://earthaccess.readthedocs.io/
MIT License

Bad gateway error when trying to access TROPOMI files #594

Closed zfasnacht closed 3 months ago

zfasnacht commented 3 months ago

I'm trying to use earthaccess to read TROPOMI files that are in the GES DISC cloud, but I'm getting the following error quite frequently:

```
Traceback (most recent call last):
  File "/panfs/ccds02/home/zfasnach/pace_no2_nn_train.py", line 16, in <module>
    pace_data = grab_pace_data(start_date,end_date)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/panfs/ccds02/home/zfasnach/grab_pace_l1b.py", line 44, in grab_pace_data
    f = h5py.File(filename,'r')
        ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/h5py/_hl/files.py", line 562, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, fcpl, swmr=swmr)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/h5py/_hl/files.py", line 235, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 102, in h5py.h5f.open
  File "h5py/h5fd.pyx", line 163, in h5py.h5fd.H5FD_fileobj_read
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/spec.py", line 1915, in readinto
    data = self.read(out.nbytes)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/spec.py", line 1897, in read
    out = self.cache._fetch(self.loc, self.loc + length)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/caching.py", line 481, in _fetch
    self.cache = self.fetcher(start, bend)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/fsspec/implementations/http.py", line 653, in async_fetch_range
    r.raise_for_status()
  File "/home/zfasnach/.conda/envs/example_tf/lib/python3.11/site-packages/aiohttp/client_reqrep.py", line 1060, in raise_for_status
    raise ClientResponseError(
aiohttp.client_exceptions.ClientResponseError: 502, message='Bad Gateway', url=URL('https://data.gesdisc.earthdata.nasa.gov/data/S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.2/2024/143/S5P_OFFL_L2__NO2____20240522T000826_20240522T014956_34229_03_020600_20240523T161152.nc')
```

Any idea how to avoid this, other than a simple try/except?

chuckwondo commented 3 months ago

@zfasnacht, have you accepted the EULA? If not, that might be the problem. To accept it, open the URL from the error message in a browser. You should be redirected to Earthdata Login and then to the EULA (End User License Agreement) page, where you can check the box at the bottom of the page and click the Agree button. That should get you past this problem, assuming you haven't already accepted the EULA and that earthaccess is using the same credentials you use when accepting it.

zfasnacht commented 3 months ago

Well, it doesn't happen on the first file, so I'm not sure that's the cause. It reads a few files, then the error occurs. It might read 2 files OK, or it might read 7; it seems to be random.

zfasnacht commented 3 months ago

I did go to that link, logged in, and the file downloaded fine, which I think suggests I've already accepted the EULA.

chuckwondo commented 3 months ago

Can you share your code? Just enough to show how you're using earthaccess.

zfasnacht commented 3 months ago

Of course, thanks for the help!

```python
import earthaccess
import h5py

start_date = '2024-05-22 00:00:00'
end_date = '2024-05-22 23:59:59'

def grab_pace_data(start_date,end_date):
    earthaccess.login(persist=True)

    results = earthaccess.search_data(short_name = 'S5P_L2__NO2____HiR',cloud_hosted=True,temporal=(start_date,end_date),count=20,bounding_box=(-180,-90,180,90))
    trop_no2_files = earthaccess.open(results)

    for filename in trop_no2_files:
        print(filename.full_name)
        f = h5py.File(filename,'r')

        data_group = '/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/'
        product_group = '/PRODUCT/'

        no2_scd = f[data_group+'nitrogendioxide_slant_column_density'][0]
        no2_strat = f[data_group+'nitrogendioxide_stratospheric_column'][0]
```

chuckwondo commented 3 months ago

This might be related to the async I/O and multithreading that the fsspec library performs under the covers. Unfortunately, given how earthaccess.open is currently implemented, using it the way you are (which, I suspect, is how most people use it) might be causing this issue.

To see if my hunch is correct, try doing the following instead, and let me know if this avoids the issue.

```python
import earthaccess
import h5py

start_date = '2024-05-22 00:00:00'
end_date = '2024-05-22 23:59:59'

def grab_pace_data(start_date, end_date):
    data_group = '/PRODUCT/SUPPORT_DATA/DETAILED_RESULTS/'
    product_group = '/PRODUCT/'

    earthaccess.login(persist=True)

    results = earthaccess.search_data(
        short_name='S5P_L2__NO2____HiR',
        cloud_hosted=True,
        temporal=(start_date, end_date),
        count=20,
        bounding_box=(-180, -90, 180, 90),
    )

    for result in results:
        # Open one granule at a time. Both the remote file object and the
        # HDF5 handle are closed when the `with` block exits.
        # (Parenthesized context managers require Python 3.10+.)
        with (
            earthaccess.open([result])[0] as trop_no2_file,
            h5py.File(trop_no2_file, 'r') as f,
        ):
            print(trop_no2_file.full_name)

            no2_scd = f[data_group + 'nitrogendioxide_slant_column_density'][0]
            no2_strat = f[data_group + 'nitrogendioxide_stratospheric_column'][0]
```

This causes each file to be opened and closed in sequence. The way most people use earthaccess.open with multiple files, the files are opened concurrently across multiple threads and are never closed, which leaks resources. Further, given some potential issues with the combination of fsspec caching, multithreading, and h5py, opening (and closing) each file in sequence might just address this issue.

Although I wouldn't normally expect a "Bad Gateway" error to result from such potential caching/threading conflicts, I've certainly seen misleading error messages before.

Alternatively, it might literally be a flaky server causing intermittent "Bad Gateway" errors.

Regardless, I still recommend the "safer" file handling approach I gave above. If it doesn't fix this specific problem, it should at least avoid other potentially gnarly behavior.
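The open-one-file-at-a-time pattern described above can also be packaged as a small reusable generator. This is a minimal sketch: `open_one_at_a_time` is a hypothetical helper, not part of earthaccess, and it assumes only that whatever `opener` returns works as a context manager, as the file objects from `earthaccess.open` do in the example above.

```python
from typing import Callable, ContextManager, Iterable, Iterator, TypeVar

T = TypeVar("T")
F = TypeVar("F")


def open_one_at_a_time(
    items: Iterable[T], opener: Callable[[T], ContextManager[F]]
) -> Iterator[F]:
    """Yield an open handle for each item, closing it before the next one is opened.

    Unlike opening a whole list of files up front, at most one handle is open
    at any moment. (Hypothetical helper for illustration.)
    """
    for item in items:
        # The `with` block closes the handle as soon as the caller asks for
        # the next item, or when the generator itself is closed.
        with opener(item) as handle:
            yield handle
```

With earthaccess, the loop could read `for trop_no2_file in open_one_at_a_time(results, lambda r: earthaccess.open([r])[0]):` followed by `with h5py.File(trop_no2_file, 'r') as f:`. Because the `with` lives inside the generator, breaking out of the loop early still closes the current file once the generator is closed.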

zfasnacht commented 3 months ago

Oh geez, that's a great point. I'm normally careful about closing files, but it looks like I missed that here, so you might be exactly right. I'll give that a try.

Thanks for the help!

zfasnacht commented 3 months ago

So I'm making sure I close each file now, but I still get the Bad Gateway error after reading 1-2 files.

Any other possible suggestions to improve this?

mfisher87 commented 3 months ago

It's a different file every time, right? You mentioned that this is random. A retry mechanism that "backs off" by waiting an increasing number of seconds (up to a limit) before each retry might help work around this. It's possible this explanation from @chuckwondo is the issue:

> it might literally be a flaky server causing intermittent "Bad Gateway" errors.

GES DISC may appreciate a heads-up about this, and may be better able to help troubleshoot.
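A retry mechanism with back-off, as suggested above, might look like the following minimal sketch. `retry_with_backoff` and `is_transient` are hypothetical helpers, not earthaccess API, and treating 502/503/504 as retryable is an assumption. The sketch relies on `aiohttp.ClientResponseError`, the exception in the traceback, exposing the HTTP code as its `status` attribute; `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes treated as transient server-side failures (an assumption; adjust as needed).
TRANSIENT_STATUSES = {502, 503, 504}


def is_transient(exc: BaseException) -> bool:
    """Return True if the exception carries a transient HTTP status code.

    aiohttp.ClientResponseError stores the code in `status`; exceptions
    without that attribute are treated as non-retryable.
    """
    return getattr(exc, "status", None) in TRANSIENT_STATUSES


def retry_with_backoff(
    func: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    jitter: bool = True,
    should_retry: Callable[[BaseException], bool] = is_transient,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying retryable failures with capped exponential back-off.

    Before retry number k (k = 0, 1, ...) it waits min(max_delay, base_delay * 2**k)
    seconds, or a random fraction of that when `jitter` is on ("full jitter").
    Non-retryable errors, and the error from the final attempt, are re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == retries or not should_retry(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            if jitter:
                # Spread retries out so many clients don't hit the server in lockstep.
                delay = random.uniform(0, delay)
            sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

For the loop in this thread, one option is to move the body of the per-granule `with` block into a function (say, a hypothetical `read_granule(result)`) and call `retry_with_backoff(lambda: read_granule(result))`. Retrying the whole open-and-read, rather than only the failed read, means each attempt starts from a fresh file object instead of one whose cache may be in a bad state after the error.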

zfasnacht commented 3 months ago

I'll give retries a try. The problem is that it's happening so frequently that I'm not sure how much they will help. Today I went 30 minutes to an hour without being able to access a single file.
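The concern about long outages can be put in numbers: with capped exponential back-off, the total time a single file spends waiting is the sum of its individual delays. This sketch assumes hypothetical settings (1 s initial delay, doubling on each retry, 60 s cap, no jitter); they are illustrative, not earthaccess defaults.

```python
def total_backoff_wait(retries: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Total seconds slept across `retries` waits of min(cap, base * 2**k), k = 0..retries-1."""
    return sum(min(cap, base * 2 ** k) for k in range(retries))


# Hypothetical settings: 1 s base delay, doubling, 60 s cap.
print(total_backoff_wait(10))  # 303.0 (about 5 minutes)
print(total_backoff_wait(35))  # 1803.0 (just over 30 minutes)
```

Under these assumptions, riding out a 30-minute outage would take about 35 retries for one file, so back-off mainly smooths over short, intermittent failures rather than outages like the one described here.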

I sent a message to the Earthdata support contact, but as you suggest, I'll also reach out to the GES DISC folks.

Thanks again for all the help!

mfisher87 commented 3 months ago

We're happy to help any time! I'm going to close this issue since we have a new issue to track the need for us to implement retries internally, but if you feel there's more to talk about or that the issue should be re-opened, please feel free to continue to post here.

goodwilj commented 3 months ago

@zfasnacht I was also having this issue downloading large amounts of TEMPO data (though that data is probably held on different servers than the TROPOMI data), and I came across this issue. The 502 Bad Gateway error occurred randomly with or without earthaccess (e.g., with curl as well), so it seems to be a server issue. I reduced the number of threads used by earthaccess.download() in case server overload was contributing. The 502 Bad Gateway errors persisted but were less frequent.
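earthaccess.download() takes a `threads` argument for this. When you manage concurrency yourself (for example, when reading granules through earthaccess.open), the same idea of capping how many requests are in flight can be sketched with a bounded thread pool. `fetch_all` and the `fetch` callable are hypothetical placeholders, not earthaccess API, and whether fewer workers actually reduces 502s depends on the server.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fetch_all(items: Iterable[T], fetch: Callable[[T], R], max_workers: int = 2) -> List[R]:
    """Run fetch(item) for every item with at most `max_workers` calls in flight.

    Results are returned in input order. If any call raises, that exception is
    re-raised when its result is reached (ThreadPoolExecutor.map semantics).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, items))
```

A small `max_workers` trades throughput for fewer simultaneous requests against the same server; `max_workers=1` degenerates to sequential processing.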

mfisher87 commented 3 months ago

It's clear there's more to discuss here! I'm going to re-open and convert this to a discussion.