opendatacube / datacube-core

Open Data Cube analyses continental scale Earth Observation data through time
http://www.opendatacube.org
Apache License 2.0

Dataset CLI tool to find duplicates #1517

Closed Ariana-B closed 7 months ago

Ariana-B commented 7 months ago

Reason for this pull request

Easily scan the database for duplicate indexed datasets, identifying duplicates by a set of specified field values, and produce a report on any datasets found.
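The idea of identifying duplicates by a set of field values can be illustrated with a minimal in-memory sketch. This is not the implementation in this pull request and does not touch the database: the function name `find_duplicates`, the dict-based records, and the field names (`product`, `region_code`, `time`) are all hypothetical, chosen only to show the grouping logic.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Mapping, Sequence, Tuple


def find_duplicates(
    datasets: Iterable[Mapping[str, Any]],
    fields: Sequence[str],
) -> Dict[Tuple[Any, ...], List[Any]]:
    """Group dataset records by the chosen field values (hypothetical helper).

    Returns a mapping from each tuple of field values to the list of
    dataset ids sharing it, keeping only tuples seen more than once.
    """
    groups: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    for ds in datasets:
        # The key is the tuple of values for the fields being compared.
        key = tuple(ds[name] for name in fields)
        groups[key].append(ds["id"])
    # Only groups with two or more members count as duplicates.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Illustrative records only; real datasets carry far more metadata.
records = [
    {"id": "a1", "product": "ls8", "region_code": "090084", "time": "2020-01-01"},
    {"id": "b2", "product": "ls8", "region_code": "090084", "time": "2020-01-01"},
    {"id": "c3", "product": "ls8", "region_code": "090085", "time": "2020-01-01"},
]

report = find_duplicates(records, ["product", "region_code", "time"])
for values, ids in report.items():
    print(values, ids)
```

Comparing on fewer fields widens each group, so the choice of fields decides what counts as a duplicate.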

Proposed changes


:books: Documentation preview :books:: https://datacube-core--1517.org.readthedocs.build/en/1517/

SpacemanPaul commented 7 months ago

As discussed on Teams:

You have in SQL: `agdc.common_timestamp( str-value-from-json )::timestamp`

`common_timestamp` returns a timestamp with time zone; casting that with `::timestamp` then returns a timestamp WITHOUT time zone by IGNORING the time zone.

I think we need to explicitly convert to UTC, probably with the `AT TIME ZONE` operator. (Or to the system time zone? But that would complicate testing.)

(Classic gotcha - virtually never what any real world user would actually want, and contrary to the SQL standard.)
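A rough Python analogy of the concern, not the SQL in this pull request: two timezone-aware values for the same instant stop comparing equal once their zone information is dropped without first normalising to UTC. The helper names below are hypothetical, and `datetime` is used only to illustrate the behaviour, on the assumption that duplicate detection compares the resulting zone-less values.

```python
from datetime import datetime, timedelta, timezone


def naive_by_dropping_offset(ts: datetime) -> datetime:
    """Discard tzinfo without converting: the wall-clock digits are kept as-is."""
    return ts.replace(tzinfo=None)


def naive_in_utc(ts: datetime) -> datetime:
    """Convert to UTC first, then discard tzinfo: the instant is preserved."""
    return ts.astimezone(timezone.utc).replace(tzinfo=None)


# The same instant, written with two different offsets.
utc_plus_10 = timezone(timedelta(hours=10))
a = datetime(2023, 1, 1, 10, 0, tzinfo=utc_plus_10)
b = datetime(2023, 1, 1, 0, 0, tzinfo=timezone.utc)

same_instant = a == b
match_when_dropped = naive_by_dropping_offset(a) == naive_by_dropping_offset(b)
match_when_utc = naive_in_utc(a) == naive_in_utc(b)

print(same_instant, match_when_dropped, match_when_utc)
```

Only the UTC-normalised comparison treats the two values as the same time, which is the behaviour an explicit `AT TIME ZONE 'UTC'` conversion would aim for on the SQL side.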

codecov[bot] commented 7 months ago

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (9fb8c8c) 91.75% compared to head (94d6a5b) 91.79%. Report is 2 commits behind head on develop.

| Files | Patch % | Lines |
|---|---|---|
| datacube/index/memory/_datasets.py | 80.00% | 1 Missing :warning: |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           develop    #1517      +/-   ##
===========================================
+ Coverage    91.75%   91.79%   +0.03%
===========================================
  Files          132      132
  Lines        14552    14617      +65
===========================================
+ Hits         13352    13417      +65
  Misses        1200     1200
```
