leftfield-geospatial / geedim

Search, composite, and download Google Earth Engine imagery.
https://geedim.readthedocs.io
Apache License 2.0

Slow Download Speed and Frequent Retry Errors in geedim #26

Closed: mht2953658596 closed this issue 2 weeks ago.

mht2953658596 commented 2 weeks ago

My colleagues and I have been using the geedim library to download data, and we've recently run into issues we didn't have before. It seems the library has started facing some restrictions:

Slow download speed: The download process has become extremely slow, which wasn't an issue previously.

Frequent retry errors: We now see messages like "Tile downloaded failed, retry 3 of 5" very frequently. This didn't happen before.

Has anything changed recently that could cause these issues? Are there any solutions or workarounds?

Thank you for your assistance!

We have confirmed that the issue is not related to our network.

[Screenshot 2024-08-28 223424]

mht2953658596 commented 2 weeks ago
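import os

import ee
import geemap
import shapefile  # pyshp


def calculate_spm_by_gee(m: str, shp_path: str, output_dir: str):
    # Load the shapefile as a GeoJSON-like FeatureCollection to use as the region of interest
    roi = ee.FeatureCollection(shapefile.Reader(shp_path).__geo_interface__)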

    def rmCloud(image):
        # Mask out pixels flagged as cloud (QA_PIXEL bit 3) or cloud shadow (bit 4)
        cloudShadowBitMask = (1 << 4)
        cloudBitMask = (1 << 3)
        qa = image.select("QA_PIXEL")
        mask = (qa.bitwiseAnd(cloudShadowBitMask).eq(0)
                .And(qa.bitwiseAnd(cloudBitMask).eq(0)))
        return image.updateMask(mask)

    def apply_scale_factors57(image):
        # Landsat Collection 2 Level-2 scale factors; the Landsat 5/7 thermal band is ST_B6
        optical_bands = image.select('SR_B.').multiply(0.0000275).add(-0.2)
        thermal_bands = image.select('ST_B6').multiply(0.00341802).add(149.0)
        return image.addBands(optical_bands, None, True).addBands(thermal_bands, None, True)

    def apply_scale_factors89(image):
        # Same scale factors for Landsat 8/9, whose thermal band is ST_B10
        optical_bands = image.select('SR_B.').multiply(0.0000275).add(-0.2)
        thermal_bands = image.select('ST_B.*').multiply(0.00341802).add(149.0)
        return image.addBands(optical_bands, None, True).addBands(thermal_bands, None, True)

    def addNDWI57(image):
        # NDWI = (green - NIR) / (green + NIR); Landsat 5/7 green = SR_B2, NIR = SR_B4
        ndwi = image.normalizedDifference(['SR_B2', 'SR_B4']).rename('addNDWI57')
        return ndwi

    def addNDWI89(image):
        # NDWI for Landsat 8/9: green = SR_B3, NIR = SR_B5
        ndwi = image.normalizedDifference(['SR_B3', 'SR_B5']).rename('addNDWI89')
        return ndwi

    if int(m) < 2012:
        l5 = (ee.ImageCollection("LANDSAT/LT05/C02/T1_L2")
              .filterDate(f"{m}-01-01", f"{m}-12-31")
              .filterBounds(roi)
              .filter(ee.Filter.lte('CLOUD_COVER', 30))
              .map(rmCloud)
              .map(apply_scale_factors57)
              .mean()
              .select(['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4']))

        ndwi = addNDWI57(l5)
        water = ndwi.gt(-0.005).updateMask(ndwi.lt(10))
        water = water.updateMask(water.gt(0.5))

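        # Note: no filterDate/filterBounds/CLOUD_COVER filter here, so mean() spans every image in the archive over each pixel
        l5elicloud = ee.ImageCollection("LANDSAT/LT05/C02/T1_L2").map(rmCloud).map(apply_scale_factors57)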
        mean_ecloud = l5elicloud.mean().clip(roi)
        SPM = mean_ecloud.expression(
            '0.0430 + 45 * red', {
                'red': mean_ecloud.select('SR_B3'),
            }).float()
        mean_ecloud = mean_ecloud.addBands(SPM.rename("SPM"))
        SPM = ee.Image(water).multiply(SPM).toFloat()

    elif int(m) == 2012:
        l7 = (ee.ImageCollection("LANDSAT/LE07/C02/T1_L2")
              .filterDate(f"{m}-01-01", f"{m}-12-31")
              .filterBounds(roi)
              .filter(ee.Filter.lte('CLOUD_COVER', 30))
              .map(rmCloud)
              .map(apply_scale_factors57)
              .mean()
              .select(['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4']))

        ndwi = addNDWI57(l7)
        water = ndwi.gt(-0.005).updateMask(ndwi.lt(10))
        water = water.updateMask(water.gt(0.5))

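        # Note: no filterDate/filterBounds/CLOUD_COVER filter here, so mean() spans every image in the archive over each pixel
        l7elicloud = ee.ImageCollection("LANDSAT/LE07/C02/T1_L2").map(rmCloud).map(apply_scale_factors57)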
        mean_ecloud = l7elicloud.mean().clip(roi)
        SPM = mean_ecloud.expression(
            '0.0430 + 45 * red', {
                'red': mean_ecloud.select('SR_B3'),
            }).float()
        mean_ecloud = mean_ecloud.addBands(SPM.rename("SPM"))
        SPM = ee.Image(water).multiply(SPM).toFloat()

    elif int(m) > 2012:
        l8 = (ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
              .filterDate(f"{m}-01-01", f"{m}-12-31")
              .filterBounds(roi)
              .filter(ee.Filter.lte('CLOUD_COVER', 30))
              .map(rmCloud)
              .map(apply_scale_factors89)
              .mean()
              .select(['SR_B2', 'SR_B3', 'SR_B4', 'SR_B5']))

        ndwi = addNDWI89(l8)
        water = ndwi.gt(-0.005).updateMask(ndwi.lt(10))
        water = water.updateMask(water.gt(0.5))

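        # Note: no filterDate/filterBounds/CLOUD_COVER filter here, so mean() spans every image in the archive over each pixel
        l8elicloud = ee.ImageCollection("LANDSAT/LC08/C02/T1_L2").map(rmCloud).map(apply_scale_factors89)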
        mean_ecloud = l8elicloud.mean().clip(roi)
        SPM = mean_ecloud.expression(
            '0.0430 + 45 * red', {
                'red': mean_ecloud.select('SR_B4'),
            }).float()
        mean_ecloud = mean_ecloud.addBands(SPM.rename("SPM"))
        SPM = ee.Image(water).multiply(SPM).toFloat()

    imagename = "SD_" + m + "_SPM" + ".tif"
    filename = os.path.join(output_dir, imagename)
    geemap.download_ee_image(SPM, filename, scale=30, crs='EPSG:4326', max_tile_size=4, region=roi.geometry())
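The per-pixel arithmetic in the function above can be checked client-side with plain Python, independent of any download problem. The sketch below mirrors `rmCloud` (QA_PIXEL bit 3 = cloud, bit 4 = cloud shadow), the Collection 2 surface-reflectance scale factor, the NDWI water test and the SPM expression `0.0430 + 45 * red`. The helper names are hypothetical, and the sample values are illustrative inputs, not data from this thread.

```python
"""Client-side mirror of the per-pixel maths in calculate_spm_by_gee (illustrative only)."""

CLOUD_BIT = 1 << 3         # QA_PIXEL bit 3: cloud
CLOUD_SHADOW_BIT = 1 << 4  # QA_PIXEL bit 4: cloud shadow


def is_clear(qa_pixel: int) -> bool:
    """True when neither the cloud nor the cloud-shadow bit is set, as in rmCloud."""
    return (qa_pixel & CLOUD_BIT) == 0 and (qa_pixel & CLOUD_SHADOW_BIT) == 0


def sr_reflectance(dn: int) -> float:
    """Collection 2 Level-2 surface reflectance from a raw DN: DN * 0.0000275 - 0.2."""
    return dn * 0.0000275 - 0.2


def ndwi(green: float, nir: float) -> float:
    """(green - nir) / (green + nir), the same ordering as normalizedDifference([green, nir])."""
    return (green - nir) / (green + nir)


def is_water(ndwi_value: float) -> bool:
    """Water test from the thread: NDWI > -0.005 and NDWI < 10."""
    return -0.005 < ndwi_value < 10


def spm(red: float) -> float:
    """SPM expression from the thread: 0.0430 + 45 * red reflectance."""
    return 0.0430 + 45 * red


if __name__ == "__main__":
    red = sr_reflectance(10000)  # 10000 * 0.0000275 - 0.2 = 0.075
    print(round(spm(red), 3))   # 0.043 + 45 * 0.075 = 3.418
```

A pixel only contributes to the output when it is clear, passes the water test, and then carries the SPM value; everything else stays masked.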

dugalh commented 2 weeks ago

Hello. I was able to download an "SPM" image using your code without any retries or other problems, using a 1 degree by 2 degree region. Other download tests I've tried work too. I suspect the problem is related to your network: problems like these are more likely caused by network reliability than by bandwidth.

mht2953658596 commented 2 weeks ago

@dugalh Unfortunately, I've tried several network nodes, but the error still occurs.

mht2953658596 commented 2 weeks ago

Tile downloaded failed, retry 1 of 5. URL: https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/3acc5e37e192cb8acaa70a205468c1b7-2e4be4e7305ebf413dddfc4d1d9a4ff2:getPixels. HTTPSConnectionPool(host='earthengine.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/earthengine-legacy/thumbnails/3acc5e37e192cb8acaa70a205468c1b7-2e4be4e7305ebf413dddfc4d1d9a4ff2:getPixels (Caused by ResponseError('too many 429 error responses')).
Tile downloaded failed, retry 1 of 5. URL: https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/996d9f21e7e83782a547e556c0f6026a-6706e3f01f23578841807c2f603f0678:getPixels. HTTPSConnectionPool(host='earthengine.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/earthengine-legacy/thumbnails/996d9f21e7e83782a547e556c0f6026a-6706e3f01f23578841807c2f603f0678:getPixels (Caused by ResponseError('too many 429 error responses')).
Tile downloaded failed, retry 1 of 5. URL: https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/65238b7d44503cdd8ef1c16d9b7993b9-3e961b848e6ee9d0d3e30bd483dd2b25:getPixels. HTTPSConnectionPool(host='earthengine.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/earthengine-legacy/thumbnails/65238b7d44503cdd8ef1c16d9b7993b9-3e961b848e6ee9d0d3e30bd483dd2b25:getPixels (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response'))).
Tile downloaded failed, retry 1 of 5. URL: https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/c4f685bde1711f1d1e22658123e4065c-589989ef03e10d8c803fc6fd1383ebd1:getPixels. HTTPSConnectionPool(host='earthengine.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/earthengine-legacy/thumbnails/c4f685bde1711f1d1e22658123e4065c-589989ef03e10d8c803fc6fd1383ebd1:getPixels (Caused by ResponseError('too many 429 error responses')).
Tile downloaded failed, retry 1 of 5. URL: https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails/89f44a33dbf0f71d328ac74def00843b-52aef704ab22de428c042b6c1ee3c81a:getPixels. HTTPSConnectionPool(host='earthengine.googleapis.com', port=443): Max retries exceeded with url: /v1/projects/earthengine-legacy/thumbnails/89f44a33dbf0f71d328ac74def00843b-52aef704ab22de428c042b6c1ee3c81a:getPixels (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response'))).
SD_2022_SPM.tif: |█████▊ | 47.1M/81.8M (raw) [ 57.5%] in 07:10 (eta: 05:17)
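The log above mixes two distinct failure causes: HTTP 429 rate-limit responses from Earth Engine and `ProxyError`s from the local proxy. For a longer log, tallying the `Caused by` reasons shows which one dominates. This is a minimal standard-library sketch; `tally_causes` is a hypothetical helper, and it assumes each message keeps the `Caused by Type('reason'` form shown above.

```python
import re
from collections import Counter

# Matches the "(Caused by SomeError('reason'" fragment of each retry message.
CAUSE_RE = re.compile(r"Caused by (\w+)\('([^']*)'")


def tally_causes(log_text: str) -> Counter:
    """Count (exception type, reason) pairs found in a block of retry messages."""
    return Counter(CAUSE_RE.findall(log_text))


if __name__ == "__main__":
    sample = (
        "Max retries exceeded ... (Caused by ResponseError('too many 429 error responses')). "
        "Max retries exceeded ... (Caused by ProxyError('Unable to connect to proxy', "
        "RemoteDisconnected('Remote end closed connection without response'))). "
        "Max retries exceeded ... (Caused by ResponseError('too many 429 error responses')). "
    )
    for (kind, reason), count in tally_causes(sample).most_common():
        print(f"{count} x {kind}: {reason}")
```

Mostly 429s point at request-rate or concurrency limits; mostly `ProxyError`s point at the proxy or network path.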

Traceback (most recent call last):
  File "D:\micromamba\envs\geemamba\lib\site-packages\ee\data.py", line 406, in _execute_cloud_call
    return call.execute(num_retries=num_retries)
  File "D:\micromamba\envs\geemamba\lib\site-packages\googleapiclient\_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "D:\micromamba\envs\geemamba\lib\site-packages\googleapiclient\http.py", line 938, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 429 when requesting https://earthengine.googleapis.com/v1/projects/earthengine-legacy/thumbnails?fields=name&alt=json returned "Too Many Requests: Request was rejected because the request rate or concurrency limit was exceeded.". Details: "Too Many Requests: Request was rejected because the request rate or concurrency limit was exceeded.">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\code_space\shendong-gee\celery_worker\gee_util\calculate_spm.py", line 141, in <module>
    calculate_spm_by_gee("2022", shp_path, output_dir)
  File "D:\code_space\shendong-gee\celery_worker\gee_util\calculate_spm.py", line 128, in calculate_spm_by_gee
    geemap.download_ee_image(SPM, filename, scale=30, crs='EPSG:4326', max_tile_size=2, region=roi.geometry())
  File "D:\micromamba\envs\geemamba\lib\site-packages\geemap\common.py", line 12974, in download_ee_image
    img.download(filename, overwrite=overwrite, num_threads=num_threads, **kwargs)
  File "D:\micromamba\envs\geemamba\lib\site-packages\geedim\download.py", line 1004, in download
    raise ex
  File "D:\micromamba\envs\geemamba\lib\site-packages\geedim\download.py", line 1000, in download
    future.result()
  File "D:\micromamba\envs\geemamba\lib\concurrent\futures\_base.py", line 451, in result
    return self.__get_result()
  File "D:\micromamba\envs\geemamba\lib\concurrent\futures\_base.py", line 403, in __get_result
    raise self._exception
  File "D:\micromamba\envs\geemamba\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "D:\micromamba\envs\geemamba\lib\site-packages\geedim\download.py", line 990, in download_tile
    tile_array = tile.download(session=session, bar=bar)
  File "D:\micromamba\envs\geemamba\lib\site-packages\geedim\tile.py", line 165, in download
    url = self._exp_image.ee_image.getDownloadURL(
  File "D:\micromamba\envs\geemamba\lib\site-packages\ee\image.py", line 558, in getDownloadURL
    return data.makeDownloadUrl(data.getDownloadId(request))
  File "D:\micromamba\envs\geemamba\lib\site-packages\ee\data.py", line 1378, in getDownloadId
    result = _execute_cloud_call(
  File "D:\micromamba\envs\geemamba\lib\site-packages\ee\data.py", line 408, in _execute_cloud_call
    raise _translate_cloud_exception(e)  # pylint: disable=raise-missing-from
ee.ee_exception.EEException: Too Many Requests: Request was rejected because the request rate or concurrency limit was exceeded.

dugalh commented 2 weeks ago

You can try setting the num_threads argument to a small value (e.g. 4) to reduce the number of concurrent requests. The default value is based on the number of CPU cores, with a maximum of 32.
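To see why capping `num_threads` bounds the number of simultaneous requests, the sketch below assumes geedim's default follows the same rule as Python's `ThreadPoolExecutor`, `min(32, os.cpu_count() + 4)`. That matches the description above but is an assumption about geedim's internals. It then runs dummy tasks on a pool of `num_threads` workers and records the peak number in flight; `fake_request` is a hypothetical stand-in for a tile request.

```python
import os
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def default_num_threads(cpu_count=None) -> int:
    """Assumed default thread count: min(32, cpu_count + 4), as in ThreadPoolExecutor."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return min(32, cpus + 4)


def peak_concurrency(num_threads: int, num_tiles: int = 40) -> int:
    """Run num_tiles dummy 'requests' on num_threads workers; return the peak in flight."""
    lock = threading.Lock()
    in_flight = 0
    peak = 0

    def fake_request(_):
        nonlocal in_flight, peak
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # stand-in for the time a tile download takes
        with lock:
            in_flight -= 1

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fake_request, range(num_tiles)))
    return peak


if __name__ == "__main__":
    print("assumed default on 28 cores:", default_num_threads(28))  # 32, above a limit of 20
    print("peak in flight with num_threads=4:", peak_concurrency(4))
```

In the thread's code this corresponds to passing a small value such as `num_threads=4` to `geemap.download_ee_image(...)`, which forwards it to geedim's `download` (as the traceback above shows).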

Also see the notes here if you are not using a Cloud project: https://developers.google.com/earth-engine/guides/usage#legacy_quota_decrease. The current Earth Engine concurrent request limit for legacy projects (20) is less than the geedim maximum (32), so this could be the cause of your problem.
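Moving off the legacy project means initializing Earth Engine with a Cloud project. A minimal sketch: `ee.Initialize(project=...)` is the earthengine-api call, while `initialize_with_project` and the project ID below are placeholders. It assumes earthengine-api is installed and you have already authenticated (for example with `earthengine authenticate`).

```python
def initialize_with_project(project_id: str) -> None:
    """Initialize Earth Engine against a Google Cloud project instead of the legacy project."""
    if not project_id:
        raise ValueError("a Google Cloud project ID is required")
    import ee  # imported lazily so this sketch loads without earthengine-api installed

    ee.Initialize(project=project_id)
```

Usage, with a hypothetical project ID: `initialize_with_project("my-cloud-project")`, called once before building any images or starting downloads.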

mht2953658596 commented 2 weeks ago

@dugalh Thank you for your reply. First, I removed toFloat() from the line SPM = ee.Image(water).multiply(SPM).toFloat(), after which the download started; I'm not sure why, but previously it was stuck at 0% the entire time. I also followed your suggestion to reduce the number of concurrent threads, and specified the project when initializing Earth Engine. The issue has improved: the download is slower, but at least it now works.