nsidc / NSIDC-Data-Access-Notebook

A Jupyter notebook exploring data coverage, size, and customization service availability along with direct data download utilizing the NSIDC DAAC's access and service API.

Download seems to be truncated #7

Closed lefsky closed 8 months ago

lefsky commented 1 year ago

I am trying to download a year's worth of the 8-day MOD10A2 data product (500 m snow cover) via this notebook. According to the Earthdata interface, I need 13367 granules. The script calculates that 7 zip files of up to 2000 granules each are needed, and 7 zip files are processed. However, when I count the files produced, there are only 4000. Is this a known problem?
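
For reference, the expected number of zip files follows from simple arithmetic. The snippet below is only a sketch of that calculation; the function name `expected_zip_count` is hypothetical and is not taken from the notebook.

```python
import math


def expected_zip_count(granule_count: int, page_size: int = 2000) -> int:
    """Number of zip files needed when each holds at most page_size granules."""
    return math.ceil(granule_count / page_size)


# 13367 / 2000 is about 6.68, which rounds up to 7 zip files.
print(expected_zip_count(13367))
```

Seven chunks should therefore contain all 13367 granules between them, whereas 4000 files is what two full chunks would hold.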

egreckase commented 1 year ago

Thank you for reaching out; I am happy to troubleshoot your issue. Can you provide your order number(s)? What time frame are you interested in? Would it be possible to break up the order into smaller intervals of time?
Thank you, Gail

lefsky commented 1 year ago

Gail,

I can break up the request into smaller chunks but before I do I'd like to get your feedback on what I'm trying to do.

I want to get the MODIS 8-day MOD10A2 product ingested into Earth Engine for further analysis. I plan to open up the dataset to anyone who wants to access it. GEE has the daily data, and although the calculation of the 8-day flag for snow presence/absence is straightforward, the assignment of the other cover codes is not, and I haven't seen a reference describing how it was done. I contacted Dr. Riggs about this but have not heard back from him.

I'm prepared to do format conversion and mosaicking myself if necessary, but if there were any shortcuts to downloading individual granules and mosaicking them, that would be great. What I could really use is global mosaics of the data. Do these exist anywhere? Also, if I could take a look at the code used to convert the daily data to 8-day data, that would save me the time and trouble of all these downloads.

Any suggestion will be much appreciated.

Michael

egreckase commented 1 year ago

Dear Michael,

Your GEE project sounds very interesting! Hold off breaking up your request into smaller pieces at the moment - we are troubleshooting the notebook. I will update you with more information when we have a potential solution for you.

Regarding mosaicking of the data, if you aren't already, you might try converting to GeoTIFF format first. I am not aware of a global mosaic of the data.

Kindly, Gail

lefsky commented 1 year ago

I spoke with an engineer at Earth Engine, and they ingest individual MODIS tiles in GeoTIFF. Is there a location to download these, where I can get a list of files with wget's mirroring function and then pull the files using lftp?

asteiker commented 1 year ago

@lefsky, thanks for reporting this issue. I just merged a fix for the case where a large (>2000 granule) request was only downloading the first "chunk". It should now download all orders as expected. You can also specify GeoTIFF output using this notebook (and the underlying API service), but perhaps @egreckase could speak more to the wget options, as I'm not as familiar with those. Please let us know if you're still experiencing any issues with the notebook.
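
As an illustration of the behavior described above, the following sketch loops over every chunk of a large request rather than stopping after the first. It is not the merged fix: `download_all_pages` and `make_fake_fetcher` are hypothetical names, and the fetcher only generates dummy file names so the example runs without network access.

```python
import math
from typing import Callable, List


def download_all_pages(total_granules: int, page_size: int,
                       fetch_page: Callable[[int], List[str]]) -> List[str]:
    """Call fetch_page once per page and collect every returned file name."""
    page_count = math.ceil(total_granules / page_size)
    files: List[str] = []
    for page_num in range(1, page_count + 1):  # pages are numbered from 1 here
        files.extend(fetch_page(page_num))
    return files


def make_fake_fetcher(total_granules: int,
                      page_size: int) -> Callable[[int], List[str]]:
    """Hypothetical stand-in for a real order request; returns dummy names."""
    def fetch_page(page_num: int) -> List[str]:
        start = (page_num - 1) * page_size
        stop = min(start + page_size, total_granules)
        return [f"granule_{i:05d}.hdf" for i in range(start, stop)]
    return fetch_page


fetcher = make_fake_fetcher(13367, 2000)
all_files = download_all_pages(13367, 2000, fetcher)
print(len(all_files))
```

Counting the collected files against the granule total reported by the search is a quick way to confirm that no chunk was skipped.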

egreckase commented 1 year ago

@lefsky You can use the wget commands to download MOD10A* granules in their native format (HDF). This page contains more information about programmatic access: https://nsidc.org/data/user-resources/help-center/programmatic-data-access-guide. To customize the data first (e.g., convert to GeoTIFF), you can use NASA Earthdata Search https://search.earthdata.nasa.gov/ or, as @asteiker suggests above, specify GeoTIFF output using the data access notebook.

lefsky commented 1 year ago

The notebook seems to be working fine now. Thanks for all your help.

lefsky commented 1 year ago

The script ran fine for a while, but since last night I have been getting an error message on the following line:

capability_url = f'https://n5eil02u.ecs.nsidc.org/egi/capabilities/{short_name}.{latest_version}.xml'

The error text follows. I'm not sure if this is a problem on your end or mine, but I thought I'd send this note to see if you know:

https://n5eil02u.ecs.nsidc.org/egi/capabilities/MOD10A2.61.xml


SSLCertVerificationError                  Traceback (most recent call last)

/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704 conn,

15 frames

/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386 self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:

/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py in _validate_conn(self, conn)
   1041 if not getattr(conn, "sock", None): # AppEngine might not have `.sock`
-> 1042 conn.connect()
   1043

/usr/local/lib/python3.8/dist-packages/urllib3/connection.py in connect(self)
    413
--> 414 self.sock = ssl_wrap_socket(
    415 sock=conn,

/usr/local/lib/python3.8/dist-packages/urllib3/util/ssl_.py in ssl_wrap_socket(sock, keyfile, certfile, cert_reqs, ca_certs, server_hostname, ssl_version, ciphers, ssl_context, ca_cert_dir, key_password, ca_cert_data, tls_in_tls)
    448 if send_sni:
--> 449 ssl_sock = _ssl_wrap_socket_impl(
    450 sock, context, tls_in_tls, server_hostname=server_hostname

/usr/local/lib/python3.8/dist-packages/urllib3/util/ssl_.py in _ssl_wrap_socket_impl(sock, ssl_context, tls_in_tls, server_hostname)
    492 if server_hostname:
--> 493 return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
    494 else:

/usr/lib/python3.8/ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
    499 # ctx._wrap_socket()
--> 500 return self.sslsocket_class._create(
    501 sock=sock,

/usr/lib/python3.8/ssl.py in _create(cls, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, context, session)
   1039 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
-> 1040 self.do_handshake()
   1041 except (OSError, ValueError):

/usr/lib/python3.8/ssl.py in do_handshake(self, block)
   1308 self.settimeout(None)
-> 1309 self._sslobj.do_handshake()
   1310 finally:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)

/usr/local/lib/python3.8/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    438 if not chunked:
--> 439 resp = conn.urlopen(
    440 method=request.method,

/usr/local/lib/python3.8/dist-packages/urllib3/connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    786
--> 787 retries = retries.increment(
    788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

/usr/local/lib/python3.8/dist-packages/urllib3/util/retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
    593

MaxRetryError: HTTPSConnectionPool(host='n5eil02u.ecs.nsidc.org', port=443): Max retries exceeded with url: /egi/capabilities/MOD10A2.61.xml (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)')))

During handling of the above exception, another exception occurred:

SSLError                                  Traceback (most recent call last)

in
      8
      9 session = requests.session()
---> 10 s = session.get(capability_url)
     11 response = session.get(s.url,auth=(uid,pswd))
     12

/usr/local/lib/python3.8/dist-packages/requests/sessions.py in get(self, url, **kwargs)
    553
    554 kwargs.setdefault('allow_redirects', True)
--> 555 return self.request('GET', url, **kwargs)
    556
    557 def options(self, url, **kwargs):

/usr/local/lib/python3.8/dist-packages/requests/sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    540 }
    541 send_kwargs.update(settings)
--> 542 resp = self.send(prep, **send_kwargs)
    543
    544 return resp

/usr/local/lib/python3.8/dist-packages/requests/sessions.py in send(self, request, **kwargs)
    653
    654 # Send the request
--> 655 r = adapter.send(request, **kwargs)
    656
    657 # Total elapsed time of the request (approximately)

/usr/local/lib/python3.8/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    512 if isinstance(e.reason, _SSLError):
    513 # This branch is for urllib3 v1.22 and later.
--> 514 raise SSLError(e, request=request)
    515
    516 raise ConnectionError(e, request=request)

SSLError: HTTPSConnectionPool(host='n5eil02u.ecs.nsidc.org', port=443): Max retries exceeded with url: /egi/capabilities/MOD10A2.61.xml (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1131)'))

egreckase commented 1 year ago

Good morning @lefsky, a security certificate expired on our end. We are currently working on resolving this issue. I will keep you updated, and I apologize for the inconvenience.
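
A quick way to tell a server-side certificate problem from a local one is to attempt a verified TLS handshake and look at the result. The sketch below uses only the Python standard library. The helper names `seconds_until_expiry` and `describe_certificate` are hypothetical, and the host passed in is whatever server you want to check.

```python
import socket
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Seconds between now and a certificate's notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return expires_at - current


def describe_certificate(host: str, port: int = 443,
                         timeout: float = 10.0) -> str:
    """Attempt a verified handshake and summarize the certificate status."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # An expired certificate fails here, before any data is exchanged.
        return f"verification failed: {exc.verify_message}"
    remaining = seconds_until_expiry(cert["notAfter"])
    return f"valid for another {remaining / 86400:.0f} days"
```

Calling `describe_certificate` with the API host reports either the verification failure message or the number of days remaining, which shows whether the server's certificate is the cause.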

asteiker commented 8 months ago

I believe this was an ephemeral issue due to the security certificate. We can reopen if this is still causing any issues, however.
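
For anyone who hits a similar traceback later, the sketch below builds the capabilities URL from this thread and turns a certificate failure into a one-line message. It is an assumption-laden illustration, not the notebook's code: it uses `urllib` from the standard library instead of `requests`, it omits Earthdata authentication, and the function names are hypothetical.

```python
import ssl
import urllib.error
import urllib.request

BASE_URL = "https://n5eil02u.ecs.nsidc.org/egi/capabilities"


def build_capability_url(short_name: str, version: str) -> str:
    """Join the pieces without stray whitespace inside the URL."""
    return f"{BASE_URL}/{short_name}.{version}.xml"


def explain_url_error(exc: urllib.error.URLError) -> str:
    """Summarize a urllib failure, calling out certificate problems."""
    reason = exc.reason
    if isinstance(reason, ssl.SSLCertVerificationError):
        detail = getattr(reason, "verify_message", str(reason))
        return f"server certificate rejected: {detail}"
    return f"request failed: {reason}"


def fetch_capabilities(short_name: str, version: str,
                       timeout: float = 30.0) -> bytes:
    """Fetch the capabilities XML (no authentication in this sketch)."""
    url = build_capability_url(short_name, version)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as exc:
        raise RuntimeError(explain_url_error(exc)) from exc


print(build_capability_url("MOD10A2", "61"))
```

A message such as "server certificate rejected" points to the server rather than the local environment, so retrying later is more appropriate than changing the notebook.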