opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

ODC feature request: env var for `skip_broken_datasets`? #1518

Open robbibt opened 7 months ago

robbibt commented 7 months ago

Over the past few months we've been encountering intermittent GDAL access issues semi-regularly, e.g.:

CPLE_OpenFailedError: '/vsis3/dea-public-data/baseline/ga_s2bm_ard_3/52/LBK/2020/08/15/20200815T032931/ga_s2bm_nbart_3-2-1_52LBK_2020-08-15_final_band03.tif' not recognized as a supported file format.

This is a real pain, particularly in automated testing, where a single random failure can force us to re-run our entire slow test suite.

datacube.load has a handy skip_broken_datasets param that can be used to work around this issue. However, we don't really want to set this in every notebook/script, as it adds complexity and potentially makes workflows non-reproducible.

Thoughts on adding support for a global environment variable (e.g. ODC_SKIP_BROKEN_DATASETS) that could be set to force datacube to skip broken datasets, even if this wasn't set in the Python code itself? This would let us set it in our test environment, making the tests more robust to these issues without impacting user code.
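
To make the idea concrete, here is a rough sketch of how such a variable might be read. Nothing here exists in datacube today: `env_flag` and `resolve_skip_broken` are hypothetical helpers, ODC_SKIP_BROKEN_DATASETS is only the name proposed above, and the set of accepted "truthy" strings is an assumption.

```python
import os

# Assumed set of strings treated as "on"; anything else counts as off.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean flag from the environment (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in _TRUTHY


def resolve_skip_broken(explicit=None, environ=None):
    """Pick the effective skip_broken_datasets value (hypothetical helper).

    An explicit True/False passed in code wins; None means "not set in code",
    in which case the proposed ODC_SKIP_BROKEN_DATASETS variable is consulted.
    """
    if explicit is not None:
        return explicit
    return env_flag("ODC_SKIP_BROKEN_DATASETS", environ=environ)
```

This sketch lets an explicit argument take precedence over the environment, which assumes "not set in code" can be distinguished from an explicit False (for instance with a None default). Whether the variable should instead override user code is an open design question.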

robbibt commented 7 months ago

@SpacemanPaul

SpacemanPaul commented 7 months ago

Probably best handled in the 1.9 branch after #1505 is merged.

Add skip_broken_datasets as a config option, defaulting to False. It will then automatically be overridable per environment with e.g. ODC_PROD_SKIP_BROKEN_DATASETS and/or ODC_DEV_SKIP_BROKEN_DATASETS.
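
As an illustration only, the lookup order described here could be sketched as below. This is not the 1.9 config code: `skip_broken_for_env` is a hypothetical function, the nested-dict `config` stands in for whatever the real config layer provides, and the accepted truthy strings are an assumption. Only the ODC_<ENV>_SKIP_BROKEN_DATASETS naming and the False default come from the comment above.

```python
import os

# Assumed set of strings treated as "on".
_TRUTHY = {"1", "true", "yes", "on"}


def skip_broken_for_env(env_name, config, environ=None):
    """Resolve skip_broken_datasets for one named environment (hypothetical).

    Lookup order: ODC_<ENV>_SKIP_BROKEN_DATASETS environment variable,
    then the per-environment config section, then the default of False.
    """
    environ = os.environ if environ is None else environ
    var = f"ODC_{env_name.upper()}_SKIP_BROKEN_DATASETS"
    raw = environ.get(var)
    if raw is None:
        # Fall back to the config section for this environment, if any.
        raw = config.get(env_name, {}).get("skip_broken_datasets", False)
    if isinstance(raw, bool):
        return raw
    return str(raw).strip().lower() in _TRUTHY


# Example: the dev environment is overridden by an env var, prod is not.
example_config = {"prod": {}, "dev": {"skip_broken_datasets": "false"}}
example_environ = {"ODC_DEV_SKIP_BROKEN_DATASETS": "true"}
dev_value = skip_broken_for_env("dev", example_config, example_environ)
prod_value = skip_broken_for_env("prod", example_config, example_environ)
```

With this ordering, a test runner could export ODC_DEV_SKIP_BROKEN_DATASETS without touching config files or user code, while environments with nothing set keep the False default.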