rasterio / rasterio

Rasterio reads and writes geospatial raster datasets
https://rasterio.readthedocs.io/

memory doesn't free using features.geometry_mask function with fastapi #3082

Open aloi214 opened 2 months ago

aloi214 commented 2 months ago

Expected behavior and actual behavior.

I'm calling the geometry_mask function inside a FastAPI endpoint. I expect the service's memory to be released after each request, with no growth across requests.

Actual behavior

Memory is not released; consumption increases with every request.

Steps to reproduce the problem.

from rasterio import features, transform
import numpy as np
import shapely.geometry as sg
from fastapi import FastAPI
app = FastAPI()

@app.post("/test")
async def read_mask():
    a1 = [[-35.0, 11.0], [31.0, 36.0], [63.0, -13.0], [18.0, -36.0], [-49.0, -33.0], [-52.0, 4.0], [-35.0, 11.0]]
    a2 = [[-37.0, 33.0], [13.0, 11.0], [60.0, 36.0], [68.0, -31.0], [-64.0, -3.0], [-48.0, 27.0], [-37.0, 33.0]]
    a3 = [[-37.0, 33.0], [-14.0, 38.0], [-3.0, 30.0], [-7.0, -9.0], [-64.0, -3.0], [-48.0, 27.0], [-37.0, 33.0]]
    a4 = [[66.0, -26.0], [112.0, -11.0], [124.0, -41.0], [89.0, -63.0], [35.0, -74.0], [21.0, -20.0], [66.0, -26.0]]
    a5 = [[73.0, 51.0], [119.0, 39.0], [133.0, -44.0], [103.0, -102.0], [27.0, -82.0], [16.0, 52.0], [73.0, 51.0]]
    ss = [a1, a2, a3, a4, a5]
    k = []
    for i in range(5):
        polyline = ss[i%5]
        k.append(create_mask(polyline))

def create_mask(polyline):
    polyline = np.array(polyline)
    origin = polyline.min(axis=0)
    site_shape = (polyline.max(axis=0) - polyline.min(axis=0))[::-1].astype(int)
    shapely_geo = sg.Polygon(polyline-origin)
    control_mask = features.geometry_mask([shapely_geo], out_shape=site_shape, transform=transform.IDENTITY, invert=False)
    return 0

Watching the process with the top command shows its memory growing with each request. If I replace the create_mask call with a large string allocation, memory usage stays constant:

@app.post("/testcache")
async def read_mask():
    a1 = [[-35.0, 11.0], [31.0, 36.0], [63.0, -13.0], [18.0, -36.0], [-49.0, -33.0], [-52.0, 4.0], [-35.0, 11.0]]
    a2 = [[-37.0, 33.0], [13.0, 11.0], [60.0, 36.0], [68.0, -31.0], [-64.0, -3.0], [-48.0, 27.0], [-37.0, 33.0]]
    a3 = [[-37.0, 33.0], [-14.0, 38.0], [-3.0, 30.0], [-7.0, -9.0], [-64.0, -3.0], [-48.0, 27.0], [-37.0, 33.0]]
    a4 = [[66.0, -26.0], [112.0, -11.0], [124.0, -41.0], [89.0, -63.0], [35.0, -74.0], [21.0, -20.0], [66.0, -26.0]]
    a5 = [[73.0, 51.0], [119.0, 39.0], [133.0, -44.0], [103.0, -102.0], [27.0, -82.0], [16.0, 52.0], [73.0, 51.0]]
    ss = [a1, a2, a3, a4, a5]
    k = []
    for i in range(10):
        k.append('gab' * 9999999)
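
Watching top by hand makes it hard to compare runs. The following is a minimal sketch, not part of the original report, of sampling the process's resident set size from Python. It assumes Linux (it reads /proc/self/status), and the helper names rss_kib and sample_rss are hypothetical.

```python
import gc
from typing import Callable, List


def rss_kib() -> int:
    """Return this process's resident set size in KiB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:    123456 kB".
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def sample_rss(work: Callable[[], object], iterations: int) -> List[int]:
    """Call work() repeatedly and record RSS after each call."""
    samples = []
    for _ in range(iterations):
        work()
        # Collect first so garbage awaiting collection is not counted as growth.
        gc.collect()
        samples.append(rss_kib())
    return samples
```

Calling sample_rss with a function that runs create_mask on one of the polygons above, outside FastAPI, would indicate whether the growth depends on the web framework at all. A steadily rising series suggests the growth comes from the call itself.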

Environment Information

rasterio info:
  rasterio: 1.3.10
      GDAL: 3.8.4
      PROJ: 9.3.1
      GEOS: 3.11.1
 PROJ DATA: /opt/conda/lib/python3.9/site-packages/rasterio/proj_data
 GDAL DATA: /opt/conda/lib/python3.9/site-packages/rasterio/gdal_data

System:
    python: 3.9.12 (main, Jun  1 2022, 11:38:51)  [GCC 7.5.0]
executable: /opt/conda/bin/python
   machine: Linux-4.15.0-151-generic-x86_64-with-glibc2.31

Python deps:
    affine: 2.4.0
     attrs: 23.2.0
   certifi: 2023.11.17
     click: 8.1.7
     cligj: 0.7.2
    cython: 3.0.8
     numpy: 1.24.4
    snuggs: 1.4.7
click-plugins: None
setuptools: 68.2.2

sgillies commented 2 months ago

@aloi214 Thanks for the report! I have a few questions:

aloi214 commented 2 months ago

@sgillies Thanks for replying. As far as I know, memory usage should stay roughly constant once the first request has completed. I ran more tests:

  1. Loop over create_mask() without appending the result to a list.
  2. Limit GDAL_CACHEMAX with rasterio.Env:
    import rasterio  # needed for rasterio.Env; the original snippet imports only features and transform

    def create_mask(polyline):
        polyline = np.array(polyline)
        origin = polyline.min(axis=0)
        site_shape = (polyline.max(axis=0) - polyline.min(axis=0))[::-1].astype(int)
        shapely_geo = sg.Polygon(polyline-origin)
        # Disable GDAL's block cache and the VSI caches for this call
        with rasterio.Env(GDAL_CACHEMAX=0, CPL_VSIL_CURL_CACHE_SIZE=0, VSI_CACHE=False, VSI_CACHE_SIZE=0, aws_unsigned=True) as env:
            control_mask = features.geometry_mask([shapely_geo], out_shape=site_shape, transform=transform.IDENTITY, invert=False)
        return control_mask
  3. Execute export GDAL_CACHEMAX=0 on the command line to limit GDAL_CACHEMAX.

But memory keeps increasing in all of these cases. I'm not sure whether I limited GDAL_CACHEMAX successfully.
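
For the export case in step 3, one partial check is whether the variable actually reached the server process. Below is a minimal sketch using only the standard library. The helper name gdal_env_options and the prefix list are assumptions made for illustration. It confirms only that the variables are visible in the process environment, not how GDAL interpreted them.

```python
import os
from typing import Dict, Mapping

# Assumed prefixes covering the options mentioned in this thread.
PREFIXES = ("GDAL_", "CPL_", "VSI_")


def gdal_env_options(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the GDAL-related variables present in the given environment."""
    return {
        name: value
        for name, value in environ.items()
        if name.startswith(PREFIXES)
    }
```

Logging the result of gdal_env_options() from inside the request handler would show whether GDAL_CACHEMAX=0 was inherited by the worker process or lost, for example because the server was started from a different shell.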