ydoherty / CoastSat.PlanetScope

Batch shoreline extraction toolkit for PlanetScope Dove satellite imagery
GNU General Public License v3.0
48 stars 13 forks source link

0 scenes passed cloud and area criteria / general masking issue #14

Open collinjroland opened 6 months ago

collinjroland commented 6 months ago

Description

When running cell 1.3) Pre-Processing - image coregistration and scene merging, I consistently receive 0 out of N scenes passed cloud and area criteria due to a failure to exceed the AOI area threshold, regardless of how low I set the AOI area threshold.

Details

I attempted to debug the error and ended up at the function get_file_extent. pre_process -> image_merge -> map_merge -> filter_aoi_cloud -> get_file_extent

def get_file_extent(nan_path):
    nan_mask = load_udm(nan_path)
    aoi_area = nan_mask.shape[1]*nan_mask.shape[0]
    nan_area = np.sum(nan_mask)
    image_extent = 100-(nan_area/aoi_area)*100

    return image_extent

The function calculates the aoi_area as the total number of pixels in the nan_mask array, and the nan_area as the sum of the cell values in the nan_mask array. I think the nan values are supposed to be represented by a numeric 1 in the nan_array, while valid pixels are represented by a 0.

example_nan_mask

I think this formulation works for rectangular images, but in my case I have clipped my imagery to an irregular polygon prior to download to minimize the areal extent of the download due to the total area ceiling on my Planet account. Thus, most or all of the nan pixels actually fall outside of my AOI and should not be included in the valid area calculation.

example_image

I modified the function to calculate the valid image extent by dividing the number of valid pixels in the nan_mask (those with a value of 0) by a hardcoded AOI area, converted to number of pixels by dividing by 9. This resolved the image filtering issue, but is not a robust / generalizable solution. More generally, I think that the udm2 -> udm conversion and general masking routine might be causing problems in the coregistration step as well.


def get_file_extent(nan_path):
    nan_mask = load_udm(nan_path)
    aoi_area = np.where(nan_mask < 1)[0].size
    # aoi_area = nan_mask.shape[1]*nan_mask.shape[0]
    nan_area = np.sum(nan_mask)
    # image_extent = 100-(nan_area/aoi_area)*100
    image_extent = ((aoi_area / (2538809 / 9)) * 100)

    return image_extent

Computing environment

OS: Windows 10 Enterprise Processor: 12th Gen Intel(R) Core(TM) i7-1265U Conda environment

name: coastsat_ps
channels:
  - conda-forge
  - defaults
dependencies:
  - affine=2.4.0=pyhd8ed1ab_0
  - alabaster=0.7.12=pyhd3eb1b0_0
  - anyio=3.7.1=pyhd8ed1ab_0
  - argon2-cffi=23.1.0=pyhd8ed1ab_0
  - argon2-cffi-bindings=21.2.0=py37hcc03f2d_2
  - arosics=1.2.6=py37h03978a9_0
  - arrow=1.2.3=py37haa95532_1
  - astroid=2.14.2=py37haa95532_0
  - astropy=4.2.1=py37hcc03f2d_0
  - atomicwrites=1.4.0=py_0
  - attrs=23.2.0=pyh71513ae_0
  - autopep8=1.6.0=pyhd3eb1b0_1
  - babel=2.11.0=py37haa95532_0
  - backcall=0.2.0=pyh9f0ad1d_0
  - backports=1.0=pyhd8ed1ab_3
  - backports.functools_lru_cache=2.0.0=pyhd8ed1ab_0
  - bcrypt=3.2.0=py37h2bbff1b_1
  - beautifulsoup4=4.12.3=pyha770c72_0
  - binaryornot=0.4.4=pyhd3eb1b0_1
  - black=22.6.0=py37haa95532_0
  - blas=1.0=mkl
  - bleach=6.1.0=pyhd8ed1ab_0
  - blosc=1.21.1=hcbbf2c4_0
  - bokeh=2.4.3=pyhd8ed1ab_3
  - branca=0.7.1=pyhd8ed1ab_0
  - brotli=1.0.9=h2bbff1b_7
  - brotli-bin=1.0.9=h2bbff1b_7
  - brotli-python=1.0.9=py37hf2a7229_7
  - bzip2=1.0.8=hcfcfb64_5
  - ca-certificates=2024.3.11=haa95532_0
  - cartopy=0.18.0=py37h80a4efb_1
  - certifi=2024.2.2=pyhd8ed1ab_0
  - cffi=1.15.1=py37ha95fbe2_1
  - cfitsio=3.470=h0af3d06_7
  - chardet=4.0.0=py37haa95532_1003
  - charls=2.2.0=h6c2663c_0
  - charset-normalizer=3.3.2=pyhd8ed1ab_0
  - click=8.0.4=py37haa95532_0
  - click-plugins=1.1.1=py_0
  - cligj=0.7.2=pyhd8ed1ab_1
  - cloudpickle=2.2.1=pyhd8ed1ab_0
  - cmocean=2.0=py_3
  - colorama=0.4.6=pyhd8ed1ab_0
  - colorcet=3.1.0=pyhd8ed1ab_0
  - colorspacious=1.1.2=pyh24bf2e0_0
  - cookiecutter=1.7.3=pyhd3eb1b0_0
  - cryptography=39.0.1=py37h21b164f_0
  - curl=8.1.2=h68f0423_0
  - cycler=0.11.0=pyhd8ed1ab_0
  - cytoolz=0.12.0=py37hcc03f2d_0
  - dask-core=2022.2.0=pyhd8ed1ab_0
  - dataclasses=0.8=pyhc8e2a94_3
  - debugpy=1.6.3=py37hf2a7229_0
  - decorator=5.1.1=pyhd8ed1ab_0
  - defusedxml=0.7.1=pyhd8ed1ab_0
  - diff-match-patch=20200713=pyhd3eb1b0_0
  - dill=0.3.8=pyhd8ed1ab_0
  - docutils=0.17.1=py37haa95532_1
  - entrypoints=0.4=pyhd8ed1ab_0
  - exceptiongroup=1.2.0=pyhd8ed1ab_2
  - expat=2.6.2=h63175ca_0
  - fftw=3.3.10=nompi_h38027f0_108
  - fiona=1.8.22=py37hac22706_0
  - flake8=4.0.1=pyhd3eb1b0_1
  - folium=0.16.0=pyhd8ed1ab_0
  - freetype=2.12.1=hdaf720e_2
  - freexl=1.0.6=h67ca5e6_1
  - fsspec=2023.1.0=pyhd8ed1ab_0
  - geoarray=0.15.9=py37h03978a9_0
  - geojson=3.1.0=pyhd8ed1ab_0
  - geopandas=0.9.0=pyhd8ed1ab_1
  - geopandas-base=0.9.0=pyhd8ed1ab_1
  - geos=3.8.0=h33f27b4_0
  - geotiff=1.7.0=h4545760_1
  - giflib=5.2.1=h64bf75a_3
  - glib=2.80.0=h39d0aa6_1
  - glib-tools=2.80.0=h0a98069_1
  - hdf4=4.2.13=h712560f_2
  - hdf5=1.10.6=nompi_h5268f04_1114
  - holoviews=1.17.1=pyhd8ed1ab_0
  - icc_rt=2022.1.0=h6049295_2
  - icu=58.2=ha925a31_3
  - idna=3.6=pyhd8ed1ab_0
  - imagecodecs=2021.8.26=py37hc0a7faf_1
  - imageio=2.13.1=pyhd8ed1ab_0
  - imagesize=1.4.1=py37haa95532_0
  - importlib-metadata=3.10.0=py37haa95532_0
  - importlib_metadata=3.10.0=hd3eb1b0_0
  - importlib_resources=6.0.0=pyhd8ed1ab_0
  - inflection=0.5.1=py37haa95532_0
  - intel-openmp=2024.0.0=h57928b3_49841
  - intervaltree=3.1.0=pyhd3eb1b0_0
  - ipykernel=6.16.2=pyh025b116_0
  - ipython=7.33.0=py37h03978a9_0
  - ipython_genutils=0.2.0=py_1
  - isort=5.9.3=pyhd3eb1b0_0
  - jedi=0.17.2=py37h03978a9_2
  - jellyfish=0.9.0=py37h2bbff1b_0
  - jinja2=3.1.3=pyhd8ed1ab_0
  - jinja2-time=0.2.0=pyhd3eb1b0_3
  - jpeg=9e=hcfcfb64_3
  - jsonschema=4.17.3=pyhd8ed1ab_0
  - jupyter_client=7.4.9=pyhd8ed1ab_0
  - jupyter_core=4.11.1=py37h03978a9_0
  - jupyter_server=1.23.4=pyhd8ed1ab_0
  - jupyterlab_pygments=0.3.0=pyhd8ed1ab_1
  - jxrlib=1.1=hcfcfb64_3
  - kealib=1.4.14=h96bfa42_2
  - keyring=23.4.0=py37haa95532_0
  - kiwisolver=1.4.4=py37h8c56517_0
  - krb5=1.20.1=h6609f42_0
  - lazy-object-proxy=1.6.0=py37h2bbff1b_0
  - lcms2=2.12=h2a16943_0
  - lerc=3.0=hd77b12b_0
  - libaec=1.1.3=h63175ca_0
  - libblas=3.9.0=8_mkl
  - libboost-headers=1.84.0=h57928b3_2
  - libbrotlicommon=1.0.9=h2bbff1b_7
  - libbrotlidec=1.0.9=h2bbff1b_7
  - libbrotlienc=1.0.9=h2bbff1b_7
  - libcblas=3.9.0=8_mkl
  - libclang=14.0.6=default_hb5a9fac_1
  - libclang13=14.0.6=default_h8e68704_1
  - libcurl=8.1.2=h68f0423_0
  - libdeflate=1.8=h2bbff1b_5
  - libexpat=2.6.2=h63175ca_0
  - libffi=3.4.2=h8ffe710_5
  - libgdal=3.0.2=hc12e7b7_6
  - libglib=2.80.0=h39d0aa6_1
  - libiconv=1.17=hcfcfb64_2
  - libkml=1.3.0=haf3e7a6_1018
  - liblapack=3.9.0=8_mkl
  - libnetcdf=4.8.1=h6685c40_2
  - libpng=1.6.43=h19919ed_0
  - libpq=12.15=hb652d5d_1
  - libsodium=1.0.18=h8d14728_1
  - libspatialindex=1.9.3=h39d44d4_4
  - libspatialite=4.3.0a=h6ec8781_23
  - libsqlite=3.45.2=hcfcfb64_0
  - libssh2=1.10.0=h680486a_3
  - libtiff=4.4.0=h8a3f274_2
  - libwebp=1.3.2=hbc33d0d_0
  - libwebp-base=1.3.2=hcfcfb64_0
  - libxml2=2.10.4=h0ad7f3c_1
  - libxslt=1.1.37=h2bbff1b_1
  - libzip=1.8.0=h49b8836_0
  - libzlib=1.2.13=hcfcfb64_5
  - libzopfli=1.0.3=h0e60522_0
  - locket=1.0.0=pyhd8ed1ab_0
  - lxml=4.9.1=py37hcc03f2d_0
  - lz4-c=1.9.4=h2bbff1b_0
  - m2w64-expat=2.1.1=2
  - m2w64-gcc-libgfortran=5.3.0=6
  - m2w64-gcc-libs=5.3.0=7
  - m2w64-gcc-libs-core=5.3.0=7
  - m2w64-gettext=0.19.7=2
  - m2w64-gmp=6.1.0=2
  - m2w64-libiconv=1.14=6
  - m2w64-libwinpthread-git=5.0.0.4634.697f757=2
  - m2w64-xz=5.2.2=2
  - mapclassify=2.5.0=pyhd8ed1ab_1
  - markdown=3.3.4=py37haa95532_0
  - markupsafe=2.1.1=py37hcc03f2d_1
  - matplotlib=3.3.4=py37h03978a9_0
  - matplotlib-base=3.3.4=py37h3379fd5_0
  - matplotlib-inline=0.1.6=pyhd8ed1ab_0
  - mccabe=0.7.0=pyhd3eb1b0_0
  - mistune=3.0.2=pyhd8ed1ab_0
  - mkl=2020.4=hb70f87d_311
  - msys2-conda-epoch=20160418=1
  - munch=2.5.0=py_0
  - mypy_extensions=0.4.3=py37haa95532_1
  - nbclassic=1.0.0=pyhb4ecaf3_1
  - nbclient=0.7.0=pyhd8ed1ab_0
  - nbconvert=7.6.0=pyhd8ed1ab_0
  - nbconvert-core=7.6.0=pyhd8ed1ab_0
  - nbconvert-pandoc=7.6.0=pyhd8ed1ab_0
  - nbformat=5.8.0=pyhd8ed1ab_0
  - nest-asyncio=1.6.0=pyhd8ed1ab_0
  - networkx=2.5=py_0
  - notebook=6.5.6=pyha770c72_0
  - notebook-shim=0.2.4=pyhd8ed1ab_0
  - numpy=1.20.1=py37hd20adf4_0
  - numpydoc=1.5.0=py37haa95532_0
  - olefile=0.47=pyhd8ed1ab_0
  - openjpeg=2.3.1=h48faf41_3
  - openssl=1.1.1w=hcfcfb64_0
  - owslib=0.29.2=pyhd8ed1ab_0
  - packaging=23.2=pyhd8ed1ab_0
  - pandas=1.2.2=py37h08fd248_0
  - pandoc=3.1.12.3=h57928b3_0
  - pandocfilters=1.5.0=pyhd8ed1ab_0
  - panel=0.14.4=pyhd8ed1ab_0
  - param=1.13.0=pyh1a96a4e_0
  - paramiko=2.8.1=pyhd3eb1b0_0
  - parso=0.7.0=pyh9f0ad1d_0
  - partd=1.4.1=pyhd8ed1ab_0
  - pathspec=0.10.3=py37haa95532_0
  - pcre=8.45=h0e60522_0
  - pcre2=10.43=h17e33f8_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=py_1003
  - pillow=9.3.0=py37hdc2b20a_1
  - pip=24.0=pyhd8ed1ab_0
  - pkgutil-resolve-name=1.3.10=pyhd8ed1ab_1
  - platformdirs=4.0.0=pyhd8ed1ab_0
  - plotly=5.19.0=pyhd8ed1ab_0
  - pluggy=1.0.0=py37haa95532_1
  - ply=3.11=py37_0
  - pooch=1.8.1=pyhd8ed1ab_0
  - poppler=0.87.0=h0cd1227_1
  - poppler-data=0.4.12=hd8ed1ab_0
  - postgresql=12.15=hb652d5d_1
  - poyo=0.5.0=pyhd3eb1b0_0
  - proj=6.2.1=h3758d61_0
  - prometheus_client=0.17.1=pyhd8ed1ab_0
  - prompt-toolkit=3.0.42=pyha770c72_0
  - psutil=5.9.3=py37h51bd9d9_0
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - py-tools-ds=0.20.2=py37h03978a9_0
  - pycodestyle=2.8.0=pyhd3eb1b0_0
  - pycparser=2.21=pyhd8ed1ab_0
  - pyct=0.4.6=py_0
  - pyct-core=0.4.6=py_0
  - pydocstyle=6.3.0=py37haa95532_0
  - pyepsg=0.4.0=py_0
  - pyerfa=2.0.0.1=py37h3a130e4_2
  - pyfftw=0.12.0=py37h37aab17_3
  - pyflakes=2.4.0=pyhd3eb1b0_0
  - pygments=2.17.2=pyhd8ed1ab_0
  - pykdtree=1.3.5=py37h3a130e4_0
  - pykrige=1.4.0=py37h452e1ab_1001
  - pylint=2.16.2=py37haa95532_0
  - pyls-spyder=0.4.0=pyhd3eb1b0_0
  - pynacl=1.5.0=py37h8cc25b3_0
  - pyparsing=3.1.2=pyhd8ed1ab_0
  - pyproj=2.6.1.post1=py37h593ac45_1
  - pyqt=5.15.7=py37hd77b12b_0
  - pyqt5-sip=12.11.0=py37hd77b12b_0
  - pyqtwebengine=5.15.7=py37hd77b12b_0
  - pyrsistent=0.18.1=py37hcc03f2d_1
  - pyshp=2.3.1=pyhd8ed1ab_0
  - pysocks=1.7.1=py37h03978a9_5
  - python=3.7.12=h7840368_100_cpython
  - python-dateutil=2.9.0=pyhd8ed1ab_0
  - python-fastjsonschema=2.19.1=pyhd8ed1ab_0
  - python-lsp-black=1.2.1=py37haa95532_0
  - python-lsp-jsonrpc=1.0.0=pyhd3eb1b0_0
  - python-lsp-server=1.5.0=py37haa95532_0
  - python-slugify=5.0.2=pyhd3eb1b0_0
  - python_abi=3.7=4_cp37m
  - pytoolconfig=1.2.5=py37haa95532_1
  - pytz=2020.1=pyh9f0ad1d_0
  - pyviz_comms=3.0.1=pyhd8ed1ab_0
  - pywavelets=1.3.0=py37h3a130e4_1
  - pywin32=303=py37hcc03f2d_0
  - pywin32-ctypes=0.2.0=py37_1001
  - pywinpty=2.0.8=py37h7f67f24_0
  - pyyaml=6.0=py37hcc03f2d_4
  - pyzmq=24.0.1=py37h7347f05_0
  - qdarkstyle=3.0.2=pyhd3eb1b0_0
  - qstylizer=0.2.2=py37haa95532_0
  - qt=5.15.7=haa95532_0
  - qt-main=5.15.2=h6072711_9
  - qt-webengine=5.15.9=h5bd16bc_7
  - qtawesome=1.2.2=py37haa95532_0
  - qtconsole=5.3.2=py37haa95532_0
  - qtpy=2.2.0=py37haa95532_0
  - qtwebkit=5.212=h2bbfb41_5
  - rasterio=1.2.10=py37h17c1fa0_0
  - requests=2.31.0=pyhd8ed1ab_0
  - rope=1.7.0=py37haa95532_0
  - rtree=1.0.1=py37h13cc57e_0
  - scikit-image=0.18.3=py37h9386db6_1
  - scikit-learn=0.20.3=py37h3d241f0_1
  - scipy=1.2.1=py37h29ff71c_0
  - send2trash=1.8.2=pyh08f2357_0
  - setuptools=59.8.0=py37h03978a9_1
  - shapely=1.8.4=py37h9064783_0
  - sip=6.6.2=py37hd77b12b_0
  - six=1.16.0=pyh6c4a22f_0
  - snappy=1.1.10=hfb803bf_0
  - sniffio=1.3.1=pyhd8ed1ab_0
  - snowballstemmer=2.2.0=pyhd3eb1b0_0
  - snuggs=1.4.7=py_0
  - sortedcontainers=2.4.0=pyhd3eb1b0_0
  - soupsieve=2.3.2.post1=pyhd8ed1ab_0
  - spectral=0.23.1=pyh1a96a4e_0
  - sphinx=4.2.0=pyhd3eb1b0_1
  - sphinxcontrib-applehelp=1.0.2=pyhd3eb1b0_0
  - sphinxcontrib-devhelp=1.0.2=pyhd3eb1b0_0
  - sphinxcontrib-htmlhelp=2.0.0=pyhd3eb1b0_0
  - sphinxcontrib-jsmath=1.0.1=pyhd3eb1b0_0
  - sphinxcontrib-qthelp=1.0.3=pyhd3eb1b0_0
  - sphinxcontrib-serializinghtml=1.1.5=pyhd3eb1b0_0
  - spyder=5.3.3=py37haa95532_0
  - spyder-kernels=2.3.3=py37haa95532_0
  - sqlite=3.45.2=hcfcfb64_0
  - tbb=2020.2=h2d74725_4
  - tenacity=8.2.3=pyhd8ed1ab_0
  - terminado=0.17.0=pyh08f2357_0
  - text-unidecode=1.3=pyhd3eb1b0_0
  - textdistance=4.2.1=pyhd3eb1b0_0
  - three-merge=0.1.1=pyhd3eb1b0_0
  - tifffile=2020.12.8=pyhd8ed1ab_0
  - tiledb=2.3.3=h3649cd2_2
  - tinycss2=1.2.1=pyhd8ed1ab_0
  - tk=8.6.13=h5226925_1
  - toml=0.10.2=pyhd3eb1b0_0
  - tomli=2.0.1=py37haa95532_0
  - tomlkit=0.11.1=py37haa95532_0
  - toolz=0.12.1=pyhd8ed1ab_0
  - tornado=6.2=py37hcc03f2d_0
  - tqdm=4.66.2=pyhd8ed1ab_0
  - traitlets=5.9.0=pyhd8ed1ab_0
  - typed-ast=1.4.3=py37h2bbff1b_1
  - typing-extensions=4.7.1=hd8ed1ab_0
  - typing_extensions=4.7.1=pyha770c72_0
  - ucrt=10.0.22621.0=h57928b3_0
  - ujson=5.4.0=py37hd77b12b_0
  - unidecode=1.2.0=pyhd3eb1b0_0
  - uriparser=0.9.7=h1537add_1
  - urllib3=2.2.1=pyhd8ed1ab_0
  - vc=14.3=hcf57466_18
  - vc14_runtime=14.38.33130=h82b7239_18
  - vs2015_runtime=14.38.33130=hcb4865c_18
  - watchdog=2.1.6=py37haa95532_0
  - wcwidth=0.2.10=pyhd8ed1ab_0
  - webencodings=0.5.1=pyhd8ed1ab_2
  - websocket-client=1.6.1=pyhd8ed1ab_0
  - whatthepatch=1.0.2=py37haa95532_0
  - wheel=0.43.0=pyhd8ed1ab_0
  - win_inet_pton=1.1.0=pyhd8ed1ab_6
  - winpty=0.4.3=4
  - wrapt=1.14.1=py37h2bbff1b_0
  - xerces-c=3.2.4=hd77b12b_1
  - xyzservices=2023.5.0=pyhd8ed1ab_0
  - xz=5.4.6=h8cc25b3_0
  - yaml=0.2.5=h8ffe710_2
  - yapf=0.31.0=pyhd3eb1b0_0
  - zeromq=4.3.4=h0e60522_1
  - zfp=0.5.5=h0e60522_8
  - zipp=3.15.0=pyhd8ed1ab_0
  - zlib=1.2.13=hcfcfb64_5
  - zstd=1.5.5=hd43e919_0
  - pip:
      - gdal==3.0.2
prefix: C:\Users\croland\AppData\Local\miniconda3\envs\coastsat_ps
collinjroland commented 6 months ago

Update

I updated this to a less hardcoded solution, but I am still not sure if it matches the intent of the function. I think the intent of the function is quantify the fraction of valid pixels within the image, so the denominator in the image_extent calculation is supposed to the be the number of pixels in the image that intersect the AOI?

  1. Added these lines to the bottom of section 0) User Input Settings import geopandas as gpd settings['aoi_area'] = gpd.read_file(settings['aoi_geojson']).to_crs(settings['output_epsg']).area[0]
  2. Modify get_file_extent

    def get_file_extent(nan_path, settings):
    nan_mask = load_udm(nan_path)
    valid_area = np.where(nan_mask < 1)[0].size
    # aoi_area = nan_mask.shape[1]*nan_mask.shape[0]
    nan_area = np.sum(nan_mask)
    # image_extent = 100-(nan_area/aoi_area)*100
    image_extent = ((valid_area / (settings['aoi_area'] / 9)) * 100)
    
    return image_extent
ydoherty commented 5 months ago

Hi Colin, it sounds like you're all over it.

In the settings dictionary, the 'extent_thresh' is set as 80% by default. While developing the toolbox, I used this to filter out images that only partially overlapped my AOI as some images only captured a small area (say only a small corner of the image). Since the overlap with the co-registration reference image is very small in these cases, AROSICS was prone to crashing.

You are correct with the intent of the get_file_extent function and the assumption that the input image is a rectangle. The idea is to find out how much of the image covers the defined AOI. When I wrote the code, I don't think I had the option to do a polygon crop prior to downloading images so all images were rectangular. Consequently, the percentage of valid pixels was the count of non-nan pixels, divided by the number of pixels (width x height).

Your equivalent function seems correct with the idea being to find the percentage of valid pixels within your polygon AOI.

Just a side thought, the co-registration in the code works by matching land pixels between images. To do this, all sand, whitewater and water pixels found using the classifier are masked away. If your AOI polygon crop is too tight to the beach, AROSICS may only have a tiny strip of land to align images which may cause issues. I'd run a few tests at your site with a handful of rectangular images including more land based features to see if that helps with the co-registration.

Kind regards, Yarran