Looks like I'm getting the HTML response for the EDL login page; maybe I'm not doing something right with auth?
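One quick way to confirm that is to look at the first bytes of a downloaded file: a real granule starts with the HDF5 signature, while the failure case is an HTML page. A rough sketch (the filename is taken from the example later in this thread):

```python
# Rough check: is the downloaded file a real HDF-EOS5 granule or an HTML login
# page? The filename is one from the example further down; adjust as needed.
with open("/tmp/test/AMSR_U2_L3_SeaIce12km_P04_20230926.he5", "rb") as f:
    head = f.read(8)

print(head == b"\x89HDF\r\n\x1a\n")  # True for an HDF5 file, False for an HTML page
```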
I think there's an issue with the auth endpoint for these granules? After clicking a data link in my browser and being redirected to EDL, I entered my credentials and was then redirected to https://lance.nsstc.nasa.gov/urs-redirect, which gave a 403. After doing that, I'm able to go back to the CMR search results, click the data links, and see the files.
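For reference, the redirect itself can be inspected without following it. A rough sketch with plain requests (not earthaccess), using one of the data links from the search results:

```python
# Sketch: check where a data link redirects, without following the redirect.
# The URL is one of the data links shown later in this thread.
import requests

url = (
    "https://lance.nsstc.nasa.gov/amsr2-science/data/level3/seaice12/"
    "R04/hdfeos5/AMSR_U2_L3_SeaIce12km_R04_20241016.he5"
)
resp = requests.get(url, allow_redirects=False, stream=True)
print(resp.status_code, resp.headers.get("Location"))  # expect 302 -> urs.earthdata.nasa.gov
resp.close()
```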
After logging in once, I get a message like "so and so has been added to your authorized EDL applications". Can you try logging in as the account in question and then clicking the data links in your browser? I hope that once the authorization step is done, you'll have different results.
I have manually downloaded the files with the Earthdata account I'm using to authenticate with earthaccess. The results are the same from earthaccess's side.
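For context, the earthaccess-side flow I'm running is roughly the following sketch (the collection and output directory match the original report in this issue):

```python
# Sketch of the earthaccess flow being tested: login, search the NRT
# collection, and download to /tmp/test.
import earthaccess

earthaccess.login()  # picks up EARTHDATA_USERNAME / EARTHDATA_PASSWORD, ~/.netrc, or prompts
results = earthaccess.search_data(short_name="AU_SI12_NRT_R04")
files = earthaccess.download(results, "/tmp/test")
print(files)
```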
I'm able to download the granules with some code adapted from qgreenland:
```python
import os

import earthaccess
import requests

_URS_COOKIE = "urs_user_already_logged"
_CHUNK_SIZE = 8 * 1024


def _get_earthdata_creds():
    if not os.environ.get("EARTHDATA_USERNAME"):
        raise RuntimeError("Environment variable EARTHDATA_USERNAME must be defined.")
    if not os.environ.get("EARTHDATA_PASSWORD"):
        raise RuntimeError("Environment variable EARTHDATA_PASSWORD must be defined.")

    return (
        os.environ["EARTHDATA_USERNAME"],
        os.environ["EARTHDATA_PASSWORD"],
    )


def _create_earthdata_authenticated_session(s=None, *, hosts: list[str], verify):
    if not s:
        s = requests.session()

    for host in hosts:
        resp = s.get(
            host,
            # We only want to inspect the redirect, not follow it yet:
            allow_redirects=False,
            # We don't want to accidentally fetch any data:
            stream=True,
            verify=verify,
        )
        # Copy the headers so they can be used case-insensitively after the
        # response is closed.
        headers = {k.lower(): v for k, v in resp.headers.items()}
        resp.close()

        redirected = resp.status_code == 302
        redirected_to_urs = (
            redirected and "urs.earthdata.nasa.gov" in headers["location"]
        )
        if not redirected_to_urs:
            print(f"Host {host} did not redirect to URS -- continuing without auth.")
            return s

        auth_resp = s.get(
            headers["location"],
            # Don't download data!
            stream=True,
            auth=_get_earthdata_creds(),
        )
        resp.close()

        if not (auth_resp.ok and s.cookies.get(_URS_COOKIE) == "yes"):
            msg = f"Authentication with Earthdata Login failed with:\n{auth_resp.text}"
            raise RuntimeError(msg)

        print(f"Authenticated for {host} with Earthdata Login.")

    return s


def _download_lance_files():
    # Ensure the output directory exists before writing granules into it.
    os.makedirs("/tmp/test", exist_ok=True)

    results = earthaccess.search_data(short_name="AU_SI12_NRT_R04")
    for granule in results:
        # There are two links for each granule: one for lance.nsstc.nasa.gov and
        # the other for lance.itsc.uah.edu. The first one is fine.
        url = granule.data_links(access="external")[0]
        session = _create_earthdata_authenticated_session(hosts=[url], verify=True)
        with session.get(
            url,
            timeout=60,
            stream=True,
            headers={"User-Agent": "NSIDC-dev-trst2284"},
        ) as resp:
            # e.g., https://lance.nsstc.nasa.gov/.../AMSR_U2_L3_SeaIce12km_P04_20230926.he5
            # -> AMSR_U2_L3_SeaIce12km_P04_20230926.he5
            fn = url.split("/")[-1]
            with open(f"/tmp/test/{fn}", "wb") as f:
                for chunk in resp.iter_content(chunk_size=_CHUNK_SIZE):
                    f.write(chunk)

            print(f"wrote {fn}")


if __name__ == "__main__":
    _download_lance_files()
```
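After the script finishes, a quick size check on the output directory makes it easy to tell real granules (~126M) apart from the ~4.1K HTML files described in this issue. A small sketch:

```python
# Quick sanity check on the downloaded files: real granules are ~126 MB, while
# the failure mode described in this issue produces ~4.1 KB HTML files.
from pathlib import Path

for p in sorted(Path("/tmp/test").glob("*.he5")):
    print(f"{p.name}: {p.stat().st_size / 1_000_000:.1f} MB")
```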
@trey-stafford @MattF-NSIDC We're unsure whether this issue was resolved by #308 or not. Or is this potentially an issue outside of earthaccess, and more of an issue with the collection's auth endpoint?
@asteiker, no, #308 didn't resolve this. It contained some fixups I found while debugging the problem, but I never found a solution.
I'm testing with the latest version of earthaccess right now, though, and it seems like it might be working. I'm running into another issue: these granules have two data links that point to the same data. One is pretty fast, but the other is slow to download.
I'll wait until my test is completed to inspect the files and verify they look correct. If they do, we can probably close this ticket and open another to address duplicate data links from different mirrors.
E.g., here are the "duplicate" data links for one of the results:
```python
>>> results[-1].data_links()
[
    'https://lance.nsstc.nasa.gov/amsr2-science/data/level3/seaice12/R04/hdfeos5/AMSR_U2_L3_SeaIce12km_R04_20241016.he5',
    'https://lance.itsc.uah.edu/amsr2-science/data/level3/seaice12/R04/hdfeos5/AMSR_U2_L3_SeaIce12km_R04_20241016.he5'
]
```
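A possible caller-side workaround (not something earthaccess does itself) would be to keep a single mirror per granule by de-duplicating the data links on filename before downloading, roughly:

```python
# Possible caller-side workaround sketch: keep one mirror per granule by
# de-duplicating the data links on filename before downloading.
import earthaccess

results = earthaccess.search_data(short_name="AU_SI12_NRT_R04")
unique_links: dict[str, str] = {}
for granule in results:
    for url in granule.data_links(access="external"):
        unique_links.setdefault(url.split("/")[-1], url)  # first mirror wins

print(f"{len(unique_links)} unique files across {len(results)} granules")
```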
I've confirmed that earthaccess v0.11.0 now downloads the data!
Just further confirming that earthaccess is getting and processing multiple links for the same granule:

```python
>>> len(files)
28
>>> len(set(files))
14
```
The list of downloaded files returned by earthaccess in my original example given above contains duplicates.
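If needed, those duplicates can be collapsed while preserving order; a minimal sketch, assuming `files` is the list from the session above:

```python
# Collapse duplicate entries in the list returned by earthaccess.download,
# preserving order. `files` is the variable from the REPL session above.
unique_files = list(dict.fromkeys(files))
print(len(files), "->", len(unique_files))  # e.g., 28 -> 14
```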
Thanks for re-testing and confirming that this is now downloading. I'll open a new issue for the multiple links.
Hey @asteiker, gentle reminder that I no longer check the @MattF-NSIDC account :)
@mfisher87 yes! So sorry I grabbed the wrong handle yesterday. It pops up automatically for me, and it looks like I made a few mistakes.
I am trying to download granules of AU_SI12_NRT_R04 using earthaccess.download, but the results are incorrect. Files are created on disk, but they do not seem to contain the data. This results in files of ~4.1K in size in the indicated /tmp/test directory. I expect files ~126M in size:

I haven't dug into this very deeply yet, but I found the code in earthaccess.store that is responsible for downloading files and set a breakpoint here: