nansencenter / metanorm

Metadata normalizing tool
GNU General Public License v3.0

Datasets with identical metadata may link with multiple URIs #22

akorosov closed this issue 3 years ago

akorosov commented 3 years ago

When two files have the same time, the same coordinates and the same other parameters (e.g. GW1AM2__01D_EQOA and GW1AM2__01D_EQOA), only one Dataset is created, and it is linked to both URIs.

Instead, two Datasets should be created, each pointing to its own file (the two files have different filenames).
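To make the failure mode concrete, here is a minimal sketch (not metanorm or GeoSPaaS code) contrasting two ways of keying datasets. The records, URLs and function names are hypothetical; the point is only that a key built from time and coverage collapses both files into one dataset, while a key derived from the filename keeps them apart.

```python
import os
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical raw records: two different files with identical time and coverage.
RECORDS = [
    {"url": "https://example.com/data/file_a.h5", "time": "2020-01-01", "bbox": (-180.0, -90.0, 180.0, 90.0)},
    {"url": "https://example.com/data/file_b.h5", "time": "2020-01-01", "bbox": (-180.0, -90.0, 180.0, 90.0)},
]


def group_by_metadata(records):
    """Behaviour described in the issue: datasets keyed on metadata only."""
    datasets = defaultdict(list)
    for record in records:
        datasets[(record["time"], record["bbox"])].append(record["url"])
    return dict(datasets)


def group_by_entry_id(records):
    """Desired behaviour: one dataset per file, keyed on the filename without extension."""
    datasets = defaultdict(list)
    for record in records:
        filename = os.path.basename(urlparse(record["url"]).path)
        entry_id = os.path.splitext(filename)[0]
        datasets[entry_id].append(record["url"])
    return dict(datasets)


print(len(group_by_metadata(RECORDS)))  # 1 (one dataset linked to two URIs)
print(len(group_by_entry_id(RECORDS)))  # 2 (one dataset per file)
```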

opsdep commented 3 years ago

'title':'Daily Sea Ice Concentration Analysis from OSI SAF EUMETSAT' 'product_id':'OSI-401' 'product_name':'osi_saf_ice_conc' 'product_status':'operational' 'abstract':'The daily analysis of sea ice concentration is obtained from\noperation satellite images of the polar regions. It is based\non atmospherically corrected signal and a carefully selected\nsea ice concentration algorithm. This product is freely\navailable from the EUMETSAT Ocean and Sea Ice Satellite\nApplication Facility (OSI SAF).' 'topiccategory':'Oceans ClimatologyMeteorologyAtmosphere' 'keywords':'Sea Ice Concentration,Sea Ice,Oceanography,Meteorology,Climate,Remote Sensing' 'gcmd_keywords':'Cryosphere > Sea Ice > Sea Ice Concentration\nOceans > Sea Ice > Sea Ice Concentration\nGeographic Region > Northern Hemisphere\nVertical Location > Sea Surface\nEUMETSAT/OSISAF > Satellite Application Facility on Ocean and Sea Ice, European Organisation for the Exploitation of Meteorological Satellites' 'northernmost_latitude':'90.0' 'southernmost_latitude':'30.980564' 'easternmost_longitude':'180.0' 'westernmost_longitude':'-180.0' 'activity_type':'Space borne instrument' 'area':'Northern Hemisphere' 'instrument_type':'Multi-sensor analysis' 'platform_name':'Multi-sensor analysis' 'start_date':'2015-08-01 00:00:00' 'stop_date':'2015-08-02 00:00:00' 'project_name':'EUMETSAT OSI SAF' 'institution':'EUMETSAT OSI SAF' 'PI_name':'Rasmus Tonboe' 'contact':'osisaf-manager@met.no' 'distribution_statement':'Free' 'copyright_statement':'Copyright 2015 EUMETSAT' 'references':'OSI SAF Sea Ice Product Manual, Eastwood S. 
(editor), v3.7, April 2011\nhttp://osisaf.met.no\nhttp://www.osi-saf.org' 'history':'2015-08-02 creation\n2016-10-20: changed lat and lon fields to partly correct small bug20170621: Added empty ice_conc_unfiltered and masks field, to be consistent with latest operational product format' 'product_version':'2.2' 'software_version':'4.1' 'netcdf_version':'3.6.3' 'Conventions':'CF-1.4' 'raw_dataset_parameters':['sea_ice_area_fraction', 'sea_ice_area_fractio...tatus_flag', 'sea_ice_area_fraction']

'Conventions':'CF-1.6' 'title':'VIIRS L2P SST' 'summary':'Sea Surface temperature retrievals produced at the Naval Oceanographic office from the VIIRS sensor onboard NPP' 'references':'NAVOCEANO MCSST' 'institution':'NAVO' 'history':'Created with VIIRSseatemp on 2014/01/06 at 01:05:13 UT' 'comment':'none' 'license':'GHRSST protocol describes data use as free and open' 'id':'VIIRS_NPP-NAVO-L2P-v1.0' 'naming_authority':'org.ghrsst' 'product_version':'01.0' 'uuid':'42f7b87c-767c-4f2a-8081-d54fd4ad47ac' 'gds_version_id':'02.0' 'netcdf_version_id':'4.1.2 of Jan 15 2013 19:31:31 $' 'date_created':'20140106T010513' 'file_quality_level':'3' 'spatial_resolution':'1500 m at nadir' 'start_time':'20140105T235740Z' 'time_coverage_start':'20140105T235740Z' 'stop_time':'20140105T235905Z' 'time_coverage_end':'20140105T235905Z' 'northernmost_latitude':'19.2333603' 'southernmost_latitude':'9.97817993' 'easternmost_longitude':'-145.476471' 'westernmost_longitude':'-174.742493' 'source':'GMODO_npp,SVM05_npp,SVM07_npp,SVM012_npp,SVM15_npp.SVM16_npp,k100_NAVO_L4,K10_NAVO_L4' 'platform':'NPP' 'sensor':'VIIRS' 'Metadata_Conventions':'Unidata Dataset Discovery v1.0' 'metadata_link':'http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=VIIRS_NPP-NAVO-L2P-v1.0' 'keywords':'Oceans > Ocean Temperature > Sea Surface Temperature' 'keywords_vocabulary':'NASA Global Change Master Directory (GCMD) Science Keywords' 'standard_name_vocabulary':'NetCDF Climate and Forecast (CF) Metadata Convention' 'geospatial_lat_units':'degrees_north' 'geospatial_lat_resolution':'0.00749999983' 'geospatial_lon_units':'degrees_east' 'geospatial_lon_resolution':'0.00749999983' 'acknowledgment':'The data from the Naval Oceanographic Office are made available under Multi-sensor Improved Sea Surface Temperature (MISST) project sponsorship by the Office of Naval Research (ONR).' 
'creator_name':'Keith Willis' 'creator_email':'Keith.D.Willis@navy.mil' 'creator_url':'http://www.usno.navy.mil/NAVO' 'project':'Group for High Resolution Sea Surface Temperature' 'publisher_name':'The GHRSST Project Office' 'publisher_url':'http://www.ghrsst.org' 'publisher_email':'ghrsst-po@nceo.ac.uk' 'processing_level':'L2P' 'cdm_data_type':'swath' 'raw_dataset_parameters':[]

'geospatial_bounds':'POLYGON((-111.854 -9.214, -104.891 25.146, -75.797 20.674, -84.380 -13.465, -111.854 -9.214))' 'geospatial_first_scanline_first_fov_lat':'25.1461678' 'geospatial_first_scanline_first_fov_lon':'-104.890808' 'geospatial_first_scanline_last_fov_lat':'20.6738358' 'geospatial_first_scanline_last_fov_lon':'-75.7971039' 'geospatial_last_scanline_first_fov_lat':'-9.21394825' 'geospatial_last_scanline_first_fov_lon':'-111.854401' 'geospatial_last_scanline_last_fov_lat':'-13.4649153' 'geospatial_last_scanline_last_fov_lon':'-84.3800583' 'Conventions':'CF-1.6, ACDD-1.3' 'Metadata_Conventions':'Unidata Dataset Discovery v1.0' 'acknowledgement':'Please acknowledge the use of these data with the following statement: These data were provided by Group for High Resolution Sea Surface Temperature (GHRSST) and the National Oceanic and Atmospheric Administration (NOAA).' 'cdm_data_type':'swath' 'comment':'none' 'creator_email':'Alex.Ignatov@noaa.gov' 'creator_name':'Alex Ignatov' 'creator_url':'http://www.star.nesdis.noaa.gov' 'date_created':'20200910T082707Z' 'destripe':'yes (M5:1.0:f M7:1.0:f M10:1.0:f M12:1.0:b M14:1.0:b M15:1.0:b M16:1.0:b)' 'easternmost_longitude':'-75.7971039' 'file_quality_level':'3' 'gds_version_id':'02.0' 'geospatial_lat_resolution':'0.00669999979' 'geospatial_lat_units':'degrees_north' 'geospatial_lon_resolution':'0.00669999979' 'geospatial_lon_units':'degrees_east' 'history':'Created by Advanced Clear-Sky Processor for Oceans (ACSPO)-VIIRS at NOAA/NESDIS/OSPO.' 
'id':'VIIRS_N20-OSPO-L2P-v2.61' 'institution':'NOAA/NESDIS/OSPO' 'keywords':'Oceans > Ocean Temperature > Sea Surface Temperature' 'keywords_vocabulary':'NASA Global Change Master Directory (GCMD) Science Keywords' 'license':'GHRSST protocol describes data use as free and open' 'metadata_link':'http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=VIIRS_N20-OSPO-L2P-v2.61' 'naming_authority':'org.ghrsst' 'northernmost_latitude':'25.1461678' 'platform':'N20' 'processing_level':'L2P' 'product_version':'2.61' 'project':'Group for High Resolution Sea Surface Temperature' 'publisher_email':'ghrsst-po@nceo.ac.uk' 'publisher_name':'The GHRSST Project Office' 'publisher_url':'http://www.ghrsst.org' 'references':'Data convention: GHRSST Data Specification (GDS) v2.0. Algorithms: ACSPO-VIIRS ATBD (NOAA/NESDIS/STAR)' 'sensor':'VIIRS' 'aggregator_version':'V1.00' 'preprocessor_version':'1.14' 'sst_luts':'LUT_VIIRS_N20_L2P_DEPTH_DAY_V01.04_20181217.txt,LUT_VIIRS_N20_L2P_SKIN_DAYNIGHT_V01.00_20180408.txt,LUT_VIIRS_N20_L2P_DEPTH_NIGHT_V01.04_20181217.txt,LUT_VIIRS_N20_L2P_SKIN_NIGHT_V01.00_20180408.txt' 'source':'VIIRS-MOD-GEO-TC,VIIRS-M5-SDR,VIIRS-M7-SDR,VIIRS-M10-SDR,VIIRS-M12-SDR,VIIRS-M14-SDR,VIIRS-M15-SDR,VIIRS-M16-SDR,CMC0.1deg-CMC-L4-GLOB-v2.0,NOAA-NCEP-GFS' 'southernmost_latitude':'-13.4649153' 'spatial_resolution':'742 m at nadir' 'standard_name_vocabulary':'CF Standard Name Table (v26, 08 November 2013)' 'start_time':'20200910T074001Z' 'stop_time':'20200910T075000Z' 'summary':'Sea surface temperature retrievals produced by NOAA/NESDIS/OSPO office from VIIRS sensor' 'time_coverage_end':'20200910T075000Z' 'time_coverage_start':'20200910T074001Z' 'title':'VIIRS L2P SST' 'uuid':'6a1b1da2-f33f-11ea-9cce-61cb6ea9f263' 'westernmost_longitude':'-111.854401' 'netcdf_version_id':'4.5.0 of Jul 20 2018 12:34:15 $' 'raw_dataset_parameters':[]

'Conventions':'CF-1.7, ACDD-1.3' 'title':'MODIS Aqua L2P SST' 'summary':'Sea surface temperature retrievals produced at the NASA OBPG for the MODIS Aqua sensor. These have been reformatted to GHRSST GDS specifications by the JPL PO.DAAC' 'references':'GHRSST Data Processing Specification v2r5' 'institution':'NASA/JPL/OBPG/RSMAS' 'history':'MODIS L2P created at JPL PO.DAAC' 'comment':'L2P Core without DT analysis or other ancillary fields; Day, Start Node:Ascending, End Node:Ascending; WARNING Some applications are unable to properly handle signed byte values. If values are encountered > 127, please subtract 256 from this reported value; Quicklook' 'license':'GHRSST and PO.DAAC protocol allow data use as free and open.' 'id':'MODIS_A-JPL-L2P-v2019.0' 'naming_authority':'org.ghrsst' 'product_version':'2019.0' 'uuid':'f6e1f61d-c4a4-4c17-8354-0c15e12d688b' 'gds_version_id':'2.0' 'netcdf_version_id':'4.1' 'date_created':'20200910T071546Z' 'file_quality_level':'3' 'spatial_resolution':'1km' 'start_time':'20200910T045500Z' 'time_coverage_start':'20200910T045500Z' 'stop_time':'20200910T045957Z' 'time_coverage_end':'20200910T045957Z' 'northernmost_latitude':'29.6277008' 'southernmost_latitude':'8.70578003' 'easternmost_longitude':'139.212997' 'westernmost_longitude':'112.998001' 'source':'MODIS sea surface temperature observations for the OBPG' 'platform':'Aqua' 'sensor':'MODIS' 'metadata_link':'http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=MODIS_A-JPL-L2P-v2019.0' 'keywords':'Oceans > Ocean Temperature > Sea Surface Temperature' 'keywords_vocabulary':'NASA Global Change Master Directory (GCMD) Science Keywords' 'standard_name_vocabulary':'NetCDF Climate and Forecast (CF) Metadata Convention' 'geospatial_lat_units':'degrees_north' 'geospatial_lat_resolution':'0.00999999978' 'geospatial_lon_units':'degrees_east' 'geospatial_lon_resolution':'0.00999999978' 'acknowledgment':'The MODIS L2P sea surface temperature data are sponsored by NASA' 
'creator_name':'Ed Armstrong, JPL PO.DAAC' 'creator_email':'edward.m.armstrong@jpl.nasa.gov' 'creator_url':'http://podaac.jpl.nasa.gov' 'project':'Group for High Resolution Sea Surface Temperature' 'publisher_name':'The GHRSST Project Office' 'publisher_url':'http://www.ghrsst.org' 'publisher_email':'ghrsst-po@nceo.ac.uk' 'processing_level':'L2P' 'cdm_data_type':'swath' 'startDirection':'Ascending' 'endDirection':'Ascending' 'day_night_flag':'Day' 'raw_dataset_parameters':[]

'Acquisition Type':'NOMINAL' 'Carrier rocket':'Soyuz' 'Cycle number':'16' 'Date':'2014-10-04T04:45:45.875Z' 'Filename':'S1A_IW_SLC__1SDV_20141004T044545_20141004T044614_002675_002FBB_1E07.SAFE' 'Footprint':' 44.038876,21.895403 44.433186,18.745359 46.130161,19.119640 45.735531,22.367476 44.038876,21.895403</gml:coordinates> </gml:LinearRing> </gml:outerBoundaryIs></gml:Polygon>' 'Format':'SAFE' 'Identifier':'S1A_IW_SLC__1SDV_20141004T044545_20141004T044614_002675_002FBB_1E07' 'Ingestion Date':'2014-12-06T04:36:38.583Z' 'Instrument':'SAR-C' 'Instrument abbreviation':'SAR-C SAR' 'Instrument description':'https://sentinel.esa.int/web/sentinel/missions/sentinel-1' 'Instrument description text':'The SAR Antenna Subsystem (SAS) is developed and build by AstriumGmbH. It is a large foldable planar phased array antenna, which isformed by a centre panel and two antenna side wings. In deployedconfiguration the antenna has an overall aperture of 12.3 x 0.84 m.The antenna provides a fast electronic scanning capability inazimuth and elevation and is based on low loss and highly stablewaveguide radiators build in carbon fibre technology, which arealready successfully used by the TerraSAR-X radar imaging mission.The SAR Electronic Subsystem (SES) is developed and build byAstrium Ltd. It provides all radar control, IF/ RF signalgeneration and receive data handling functions for the SARInstrument. The fully redundant SES is based on a channelisedarchitecture with one transmit and two receive chains, providing amodular approach to the generation and reception of wide-bandsignals and the handling of multi-polarisation modes. One keyfeature is the implementation of the Flexible Dynamic BlockAdaptive Quantisation (F... 
'Instrument mode':'IW' 'Instrument name':'Synthetic Aperture Radar (C-band)' 'Instrument swath':'IW1 IW2 IW3' 'JTS footprint':'POLYGON ((21.895403 44.038876,18.745359 44.433186,19.119640 46.130161,22.367476 45.735531,21.895403 44.038876))' 'Launch date':'April 3rd, 2014' 'Mission datatake id':'12219' 'Mission type':'Earth observation' 'Mode':'IW' 'NSSDC identifier':'0000-000A' 'Operator':'European Space Agency' 'Orbit number (start)':'2675' 'Orbit number (stop)':'2675' 'Pass direction':'DESCENDING' 'Phase identifier':'1' 'Polarisation':'VV VH' 'Product class':'S' 'Product class description':'SAR Standard L1 Product' 'Product composition':'Slice' 'Product level':'L1' 'Product type':'SLC' 'Relative orbit (start)':'155' 'Relative orbit (stop)':'155' 'Satellite':'Sentinel-1' 'Satellite description':'https://sentinel.esa.int/web/sentinel/missions/sentinel-1' 'Satellite name':'Sentinel-1' 'Satellite number':'A' 'Sensing start':'2014-10-04T04:45:45.875Z' 'Sensing stop':'2014-10-04T04:46:14.180Z' 'Size':'8 GB' 'Slice number':'8' 'Start relative orbit number':'155' 'Status':'ARCHIVED' 'Stop relative orbit number':'155' 'Timeliness Category':'Fast-24h' 'url':"https://scihub.copernicus.eu/apihub/odata/v1/Products('1a4ff15b-1504-4d94-8675-e12c06b02858')/$value"

akorosov commented 3 years ago

OSISAF and PODAAC datasets are fetched by DDXIngester. DDXIngester should also return the url in raw_attributes. The URL normalizer should then take care of extracting the filename from the URL.
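A minimal sketch of this split, assuming raw_attributes is a plain dictionary. The function names `add_url_to_raw_attributes` and `entry_id_from_url` are hypothetical and are not part of the DDXIngester or metanorm APIs; the URL is made up for illustration.

```python
import os
from urllib.parse import urlparse


def add_url_to_raw_attributes(raw_attributes, url):
    """Hypothetical ingester-side step: store the dataset URL alongside the DDX attributes."""
    # Return a new dict so the parsed DDX attributes are not mutated in place.
    return {**raw_attributes, "url": url}


def entry_id_from_url(raw_attributes):
    """Hypothetical normalizer-side step: derive entry_id from the filename in the URL.

    Returns None if there is no URL, so another normalizer can provide the entry_id.
    """
    url = raw_attributes.get("url")
    if not url:
        return None
    # urlparse() separates any query string or fragment from the path.
    filename = os.path.basename(urlparse(url).path)
    entry_id = os.path.splitext(filename)[0]
    return entry_id or None


raw = add_url_to_raw_attributes(
    {"title": "Daily Sea Ice Concentration Analysis from OSI SAF EUMETSAT"},
    "https://opendap.example.com/thredds/dodsC/osisaf/ice_conc_nh_polstere-100_multi_201508011200.nc",
)
print(entry_id_from_url(raw))  # ice_conc_nh_polstere-100_multi_201508011200
```

Because the entry_id comes from the filename, two files that share all other metadata still get distinct entry_ids. Note that `os.path.splitext` removes only the last suffix, so a name like `file.nc.gz` would become `file.nc`.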

FTPIngester also returns the URL, so it should use the URL normalizer too.

The SentinelSafe ingester returns 'Identifier', so SentinelSAFEMetadataNormalizer should use it as the entry_id.
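The Sentinel-1 dump above shows why the URL is not a good source here: the SciHub URL ends in `/$value`, so its basename is not a filename, whereas 'Identifier' matches 'Filename' without its `.SAFE` suffix. A minimal sketch, assuming raw_attributes is a plain dictionary with the keys shown in the dump; `entry_id_from_safe_attributes` is a hypothetical name, and the fallback to 'Filename' is an added assumption, not something proposed in this issue.

```python
import os
from urllib.parse import urlparse


def entry_id_from_safe_attributes(raw_attributes):
    """Hypothetical SentinelSAFE step: use 'Identifier' as the entry_id."""
    identifier = (raw_attributes.get("Identifier") or "").strip()
    if identifier:
        return identifier
    # Assumed fallback (not part of the issue): strip '.SAFE' from 'Filename'.
    filename = (raw_attributes.get("Filename") or "").strip()
    if filename.endswith(".SAFE"):
        return filename[: -len(".SAFE")]
    return None


raw = {
    "Identifier": "S1A_IW_SLC__1SDV_20141004T044545_20141004T044614_002675_002FBB_1E07",
    "Filename": "S1A_IW_SLC__1SDV_20141004T044545_20141004T044614_002675_002FBB_1E07.SAFE",
    "url": "https://scihub.copernicus.eu/apihub/odata/v1/Products('1a4ff15b-1504-4d94-8675-e12c06b02858')/$value",
}

# The basename of the SciHub OData URL is just '$value', which cannot serve as an entry_id.
print(os.path.basename(urlparse(raw["url"]).path))  # $value
print(entry_id_from_safe_attributes(raw))  # the 'Identifier' value
```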

In the tests for the URL normalizer, add cases that normalize entry_id from all the different URLs returned by the DDX and FTP ingesters (e.g. using os.path.basename and os.path.splitext).
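A minimal sketch of such tests using the standard `unittest` module. The helper `entry_id_from_url` and every URL below are hypothetical stand-ins for the DDX and FTP cases, not values taken from the real ingesters; the helper is defined inline so the module is self-contained and can be run with `python -m unittest`.

```python
import os
import unittest
from urllib.parse import urlparse


def entry_id_from_url(url):
    """Hypothetical helper: filename from the URL path, without its last extension."""
    return os.path.splitext(os.path.basename(urlparse(url).path))[0]


class EntryIdFromUrlTests(unittest.TestCase):
    # (url, expected entry_id) pairs; all hosts and paths are made up for illustration.
    CASES = [
        # OPeNDAP-style URL, as a DDX ingester might provide.
        ("https://opendap.example.com/thredds/dodsC/osisaf/ice_conc_nh_20150801.nc",
         "ice_conc_nh_20150801"),
        # PO.DAAC-style OPeNDAP URL.
        ("https://podaac.example.com/opendap/L2P/VIIRS_N20/20200910074001-OSPO-L2P.nc",
         "20200910074001-OSPO-L2P"),
        # FTP URL, as an FTP ingester might provide.
        ("ftp://ftp.example.com/pub/daily/sample_file_01.h5", "sample_file_01"),
        # Query strings must not leak into the entry_id.
        ("https://example.com/files/dataset.nc?download=1", "dataset"),
    ]

    def test_entry_id_from_url(self):
        for url, expected in self.CASES:
            with self.subTest(url=url):
                self.assertEqual(entry_id_from_url(url), expected)

    def test_files_with_identical_metadata_get_distinct_ids(self):
        # Two files that differ only by name must not share an entry_id.
        first = entry_id_from_url("ftp://ftp.example.com/pub/file_a.h5")
        second = entry_id_from_url("ftp://ftp.example.com/pub/file_b.h5")
        self.assertNotEqual(first, second)
```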

Implementation plan:

  1. Update metanorm so that entry_id is correctly normalized (both from URL and from Identifier).
  2. Update django-geo-spaas-harvesting so that the url is added to raw_attributes in the DDX ingester, entry_id is added to the list of normalized attributes, and entry_id is written to the database.

opsdep commented 3 years ago

@akorosov @aperrin66 There is no need to reopen this issue. Everything is fine with it.