nansencenter / nansat

Scientist friendly Python toolbox for processing 2D satellite Earth observation data.
http://nansat.readthedocs.io
GNU General Public License v3.0

GCMD keyword metadata items for datasets based on multiple platforms and instruments #389

Open korvinos opened 5 years ago

korvinos commented 5 years ago

Description: https://podaac.jpl.nasa.gov/dataset/UKMO-L4HRfnd-GLOB-OSTIA
Data: https://podaac-opendap.jpl.nasa.gov/opendap/allData/ghrsst/data/L4/GLOB/UKMO/OSTIA/

korvinos commented 5 years ago

@mortenwh & @akorosov Could you please tell me how I should describe the platform/instrument metadata for a combined product? Should I list all instruments, or is there a specification for this type of product?

metadata from the file: source_data: AMSRE,ATS_NR_2P,AVHRR18_G,AVHRR17_NAR,AVHRR18_NAR,OSISAF_ICE,SEVIRI,TMI

mortenwh commented 5 years ago

The best solution is perhaps to add it as a list, as you suggest. This should work fine for Nansat, but I am not sure how we should solve it for Geo-SPaaS. Currently, I think there is only one source per dataset in the Geo-SPaaS catalog. There is a related issue in Django-Geo-SPaaS which should be solved as well: https://github.com/nansencenter/django-geo-spaas/issues/1

korvinos commented 5 years ago

@mortenwh As far as I can see, we usually specify the platform and the instrument separately, not as a source. So with several sources there would be a set of platforms and a set of instruments that are not connected with each other. Since that is not going to work with GS anyway, I think I will have to skip this metadata.

korvinos commented 5 years ago

I have just looked through mapper_opendap_globcurrent_thredds.py and found that I can define the platform and instrument through generic names for now:

        mm = pti.get_gcmd_instrument('Passive Remote Sensing')
        ee = pti.get_gcmd_platform('Earth Observation Satellites')

UPD: I cannot do that since the dataset is not only from remote sensing observations

mortenwh commented 5 years ago

source is the table name in geospaas... Anyway, platform and instrument come in pairs and it may be complicated to have more than one pair in the metadata without major changes. So more generic names are probably good.

mortenwh commented 5 years ago

I will introduce a hack for OSTIA data in the Nansat mapper mapper_opendap_ostia: it sets the platform to Aqua and the instrument to AMSR-E.

The issue with multiple sources will then be followed up in https://github.com/nansencenter/django-geo-spaas/issues/1

mortenwh commented 5 years ago

I suggest solving this by setting the platforms and instruments as a list of lists:

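import json
import pythesint as pti

# Each inner list is one [platform, instrument] pair
pi = [
    [pti.get_gcmd_platform('noaa-18'), pti.get_gcmd_instrument('avhrr-3')],
    [pti.get_gcmd_platform('noaa-19'), pti.get_gcmd_instrument('avhrr')]
]
# Inside a mapper: store all pairs as one JSON string
self.dataset.SetMetadataItem('platform/instrument', json.dumps(pi))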

Django-Geo-SPaaS could be modified to accept both separate metadata fields (i.e., platform and instrument) and combined fields (i.e., platform/instrument). As such, we don't need to change mappers for datasets from single instruments.
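
To illustrate the idea of accepting both forms, here is a minimal sketch of how a consumer might read the metadata. It is not the Django-Geo-SPaaS implementation: the function name `read_platform_instrument_pairs` is hypothetical, the metadata is assumed to be a plain dict of JSON strings, and the small dicts with a `Short_Name` key are only stand-ins for GCMD keyword entries.

```python
import json


def read_platform_instrument_pairs(metadata):
    """Return a list of (platform, instrument) pairs from a metadata dict.

    Hypothetical helper: assumes every metadata value is a JSON string.
    """
    combined = metadata.get('platform/instrument')
    if combined is not None:
        # Combined field: JSON-encoded list of [platform, instrument] pairs
        return [tuple(pair) for pair in json.loads(combined)]
    # Separate fields: exactly one platform and one instrument
    platform = json.loads(metadata['platform'])
    instrument = json.loads(metadata['instrument'])
    return [(platform, instrument)]


# Single-instrument dataset: the mapper does not need to change
single = {
    'platform': json.dumps({'Short_Name': 'AQUA'}),
    'instrument': json.dumps({'Short_Name': 'AMSR-E'}),
}

# Multi-source dataset using the combined field
multi = {
    'platform/instrument': json.dumps([
        [{'Short_Name': 'NOAA-18'}, {'Short_Name': 'AVHRR-3'}],
        [{'Short_Name': 'NOAA-19'}, {'Short_Name': 'AVHRR'}],
    ]),
}

single_pairs = read_platform_instrument_pairs(single)
multi_pairs = read_platform_instrument_pairs(multi)
```

Returning a list of pairs in both cases means downstream code only has to handle one shape, whether a dataset has one source or several.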

mortenwh commented 5 years ago

Note: Not all GlobCurrent products provide the source platforms and instruments. In that case, a warning is issued, advising the user to create a GitHub issue.
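
A rough sketch of what such a check could look like, using only the standard `warnings` module. This is not the merged mapper code: the function name `check_source_metadata` and the attribute names `source_platform` and `source_instrument` are assumptions made for illustration.

```python
import warnings


def check_source_metadata(attributes, filename):
    """Return True if both source fields are present, otherwise warn.

    'source_platform' and 'source_instrument' are hypothetical attribute
    names; a real product may use different ones.
    """
    missing = [key for key in ('source_platform', 'source_instrument')
               if not attributes.get(key)]
    if missing:
        warnings.warn(
            '%s does not provide %s; please create a GitHub issue so the '
            'mapper can be updated.' % (filename, ', '.join(missing)))
        return False
    return True
```

Warning instead of raising lets the file still be opened when the source information is absent.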

mortenwh commented 5 years ago

@akorosov - this is merged and can be closed, right?