wmo-im / wcmp

WMO Core Metadata Profile
https://github.com/wmo-im/wcmp

assess the WIS 1.0 catalogue for availability of NWP data #103

Open tomkralidis opened 3 years ago

tomkralidis commented 3 years ago

cc @efucile

As a result of the forthcoming WMO data policy, assess the WIS 1.0 catalogue to determine the availability of NWP data on WIS.

tomkralidis commented 3 years ago

Update 2021-05-21: @efucile will provide list of centers providing NWP.

@josusky and @tomkralidis to develop MVP.

efucile commented 3 years ago

@tomkralidis and @josusky, this is a list of GDPFS centres. One column indicates which of them are supposed to provide data in WIS. It can be used to check which of the listed centres also provide data in the catalogue. GDPFS Centers

josusky commented 3 years ago

Just for the record, the whole WIS metadata catalogue can be downloaded from https://gisc.dwd.de/oaidownload/wis-catalogue.tar.gz. I guess that can be considered an authoritative source :-)

tomkralidis commented 3 years ago

Thanks @josusky. To clarify, is this the result of DWD GISC harvest of all GISCs? How often is the .tar.gz updated? cc @jsieland

jsieland commented 3 years ago

> To clarify, is this the result of DWD GISC harvest of all GISCs? How often is the .tar.gz updated? cc @jsieland

Yes, it's everything. At https://gisc.dwd.de/wisportal/# (see "Metadata") you can choose between all or only GISC Toulouse or only GISC Offenbach. Should be updated every night.

josusky commented 3 years ago

I am pasting here part of the discussion with Tom over Slack:

When I go to https://gisc.dwd.de/wisportal/# and search for "NWP model GRIB" I get (allegedly) 110978 results. A more machine-friendly approach is to use SRU:

https://sru.dwd.de/SRU2JDBC/sru?operation=searchRetrieve&version=1.1&startRecord=1&maximumRecords=5&query=title%20=%20model%20and%20abstract%20=%20NWP&stylesheet=xsl/dwd-sru.xsl&x-dwd-stylesheetDetailLevel=1

This surprisingly finds only 2985 matches (the request definitely needs tuning). Such a request could be issued from a script. The number of records returned per request is usually limited, so we would need to repeat the request to page through all matches.

The alternative is to run a script locally on the catalogue retrieved as the *.tar.gz. That might be faster, but it requires our own implementation of a search algorithm.
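The repeated SRU request could be scripted roughly as follows. This is only a sketch of the paging arithmetic: it builds the URLs and does not send them. The page size of 500 is an assumption (the server may enforce a lower cap), the stylesheet parameters from the URL above are left out, and in real use the total would come from the `numberOfRecords` element of the first response rather than being hard-coded.

```python
from urllib.parse import urlencode, quote

# Endpoint quoted earlier in this thread.
SRU_ENDPOINT = "https://sru.dwd.de/SRU2JDBC/sru"


def sru_page_url(query: str, start_record: int, page_size: int) -> str:
    """Build one searchRetrieve URL; SRU record positions are 1-based."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "startRecord": start_record,
        "maximumRecords": page_size,
        "query": query,
    }
    # quote_via=quote encodes spaces as %20 (not "+"), as in the URL above.
    return SRU_ENDPOINT + "?" + urlencode(params, quote_via=quote)


def sru_page_urls(query: str, total_matches: int, page_size: int):
    """Yield the URLs needed to walk through total_matches records."""
    for start in range(1, total_matches + 1, page_size):
        yield sru_page_url(query, start, page_size)


# 2985 is the match count reported above; 500 is an assumed page size.
urls = list(sru_page_urls("title = model and abstract = NWP", 2985, 500))
print(len(urls))
print(urls[-1])
```

Note that `urlencode` also percent-encodes the `=` signs inside the query value (as `%3D`), which is equivalent to the literal `=` in the hand-written URL.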

tomkralidis commented 3 years ago

Aside: I've captured the WIS catalogue location information in the wiki for future use: https://github.com/wmo-im/wcmp/wiki/WISMetadataCatalogue#wis-10-metadata-catalogue

Please feel free to update.

tomkralidis commented 3 years ago

An initial implementation can be found in https://github.com/wmo-im/pywiscat; @josusky to review/update accordingly, at which point we'll be able to provide an analysis of NWP data.

antje-s commented 3 years ago

The different result counts are due to differences in the search queries.

https://sru.dwd.de/SRU2JDBC/sru?operation=searchRetrieve&version=1.1&startRecord=1&maximumRecords=5&query=title%20=%20model%20and%20abstract%20=%20NWP&stylesheet=xsl/dwd-sru.xsl&x-dwd-stylesheetDetailLevel=1 --> title=model AND abstract=NWP

A search on https://gisc.dwd.de/ for "NWP model GRIB" --> 111 023 matches. This searches all indexed fields for "NWP" OR "model" OR "GRIB".

A search for "NWP AND model AND GRIB" (still over all indexed fields) --> 24 957 matches

[SolR-supported operators: AND – alternative symbol: &&, NOT – alternative symbol: !, OR – alternative symbol: || [DEFAULT]]
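To illustrate why the default OR returns many more matches than AND, here is a toy sketch. It is not how Solr is implemented; the four record strings are made up for the example, and matching is a naive whole-word test.

```python
def matches(text: str, terms: list[str], operator: str) -> bool:
    """Naive whole-word match of terms against text, combined with OR or AND."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    return any(hits) if operator == "OR" else all(hits)


def count_matches(records: list[str], terms: list[str], operator: str) -> int:
    """Count the records that satisfy the combined condition."""
    return sum(matches(record, terms, operator) for record in records)


# Hypothetical record texts, for illustration only.
records = [
    "NWP model output in GRIB",
    "GRIB encoded satellite product",
    "climate model run",
    "surface observations in BUFR",
]
terms = ["NWP", "model", "GRIB"]

or_count = count_matches(records, terms, "OR")    # any term is enough
and_count = count_matches(records, terms, "AND")  # all terms required
print(or_count, and_count)  # prints "3 1"
```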

Our new REST API in the WIS Portal may also be of interest. A first version of the documentation is available at https://gisc-test.dwd.de/restapi.html. If you have any questions, we will be happy to help.

antje-s commented 3 years ago

Correction: /search/startSearch --> GET

josusky commented 3 years ago

Hi, a first very crude test (looking only for the word "GRIB") yielded the following result (in the JSON output non-ASCII characters are encoded):

{
    "NMC FRANCE - M\u00e9t\u00e9o-France": 1531,
    "NMC UNITED KINGDOM - Met Office": 15069,
    "ECMWF": 2755,
    "GISC Tokyo - Japan Meteorological Agency": 21372,
    "Max-Planck-Institut fuer Meteorologie": 46,
    "Deutscher Wetterdienst": 1246,
    "National Meteorological Information Center, CMA": 126,
    "WMO Lead Centre for Long-Range Forecast Multi-Model Ensemble": 120,
    "FSBE \u00abAviamettelecom of Roshydromet\u00bb": 2182,
    "Deutsches Klimarechenzentrum": 8,
    "Japan Meteorological Agency": 193,
    "OSI SAF": 28,
    "European Centre for Medium-Range Weather Forecasts": 7,
    "Forschungszentrum Karlsruhe": 3,
    "Commonwealth Scientific & Industrial Research Organisation": 4,
    "University of Hohenheim": 2,
    "Canadian Centre for Climate Modelling and Analysis": 2,
    "NOAA": 1,
    "EUMETSAT": 25,
    "H SAF": 14,
    "WMO/WIS/GISC Tokyo": 10,
    "Deutscher Wetterdienst (RD)": 2,
    "Institute for Meteorology, Freie Universit\u00e4t Berlin": 1,
    "DCPC-Adriatic Marine Meteorological Centre": 8,
    "ZAMG - Central Institute for Meteorology and Geodynamics": 2,
    "Agenzia Regionale Preventione e Ambiente dell'Emilia-Romagna": 2,
    "Deutscher Wetterdienst (ZAK)": 2,
    "CNMCA (Pratica di Mare)": 2,
    "NMC BULGARIA - National Institute of Meteorology and Hydrology": 3,
    "University of Toulouse": 2,
    "Max-Planck-Institut fuer Meteorologie (MD)": 5,
    "National Institute for Environmental Studies": 6,
    "Met Office Hadley Centre": 4,
    "Institut f\u00fcr Meteorologie der Freien Universit\u00e4t Berlin": 3,
    "Instituto Nacional de Meteorologia": 1,
    "Centre National de Recherches M\u00e9t\u00e9orologiques": 2,
    "Istituto Superiore per la Protezione e la Ricerca Ambientale (ex APAT)": 2,
    "Met Office": 2,
    "Institute of Atmospheric Sciences and Climate": 2,
    "Meteo-France": 1,
    "Geophysical Fluid Dynamics Laboratory/NOAA": 2,
    "Environment Canada": 2,
    "WMO/WIS/DCPC Tokyo (Global Producing Centre for long-range forecast)": 1,
    "Federal Office of Meteorology and Climatology MeteoSwiss": 2,
    "South East European Virtual Climate Change Center (SEEVCCC)": 3,
    "ARPA-Servizio IdroMeteorologico": 2,
    "Agenzia Regionale per la Protezione dell'Ambiente Ligure": 1,
    "Helmholtz-Zentrum Geesthacht, Zentrum f\u00fcr Material- und K\u00fcstenforschung GmbH": 1,
    "WMO": 1,
    "DCPC Rome (RTH)": 1,
    "Czech hydrometeorological institude": 1,
    "NMC KENYA - Kenya Meteorological Department": 1
}

This shows several potential issues. Several centers are listed more than once; for example, "ECMWF" is obviously the same as "European Centre for Medium-Range Weather Forecasts". There is also the granularity issue again: JMA published 21372 records while DWD published "only" 1246, but that does not mean that JMA is doing its job an order of magnitude better :-)
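One possible way to handle the duplicate names is to fold known aliases into a single canonical name before comparing counts. A minimal sketch follows; the alias table is hypothetical and contains only the ECMWF pair mentioned above, and any further entries would need to be confirmed by hand.

```python
from collections import Counter

# Hypothetical alias table: variant name -> canonical name.
ALIASES = {
    "European Centre for Medium-Range Weather Forecasts": "ECMWF",
}


def merge_counts(counts: dict[str, int], aliases: dict[str, str]) -> dict[str, int]:
    """Sum record counts of names that refer to the same organisation."""
    merged = Counter()
    for name, n in counts.items():
        merged[aliases.get(name, name)] += n
    return dict(merged)


# Three entries taken from the result above.
counts = {
    "ECMWF": 2755,
    "European Centre for Medium-Range Weather Forecasts": 7,
    "Deutscher Wetterdienst": 1246,
}
merged = merge_counts(counts, ALIASES)
print(merged["ECMWF"])  # prints 2762
```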

josusky commented 3 years ago

@efucile, sorry for the delay. After our last teleconference I ran an updated version of pywiscat (which groups the output by the citation authority extracted from the file identifier URI) but forgot to publish the result. Here it is (this time UTF-8 encoded, so the non-ASCII characters are more readable):

{
   "" : {
      "ARPA-Servizio IdroMeteorologico" : 2,
      "Agenzia Regionale Preventione e Ambiente dell'Emilia-Romagna" : 2,
      "Agenzia Regionale per la Protezione dell'Ambiente Ligure" : 1,
      "CNMCA (Pratica di Mare)" : 2,
      "Canadian Centre for Climate Modelling and Analysis" : 2,
      "Centre National de Recherches Météorologiques" : 2,
      "Commonwealth Scientific & Industrial Research Organisation" : 4,
      "Deutscher Wetterdienst" : 1,
      "Deutscher Wetterdienst (RD)" : 2,
      "Deutscher Wetterdienst (ZAK)" : 2,
      "Deutsches Klimarechenzentrum" : 8,
      "Environment Canada" : 2,
      "Federal Office of Meteorology and Climatology MeteoSwiss" : 2,
      "Forschungszentrum Karlsruhe" : 3,
      "Geophysical Fluid Dynamics Laboratory/NOAA" : 2,
      "Helmholtz-Zentrum Geesthacht, Zentrum für Material- und Küstenforschung GmbH" : 1,
      "Institut für Meteorologie der Freien Universität Berlin" : 3,
      "Institute for Meteorology, Freie Universität Berlin" : 1,
      "Institute of Atmospheric Sciences and Climate" : 2,
      "Instituto Nacional de Meteorologia" : 1,
      "Istituto Superiore per la Protezione e la Ricerca Ambientale (ex APAT)" : 2,
      "Max-Planck-Institut fuer Meteorologie" : 46,
      "Max-Planck-Institut fuer Meteorologie (MD)" : 5,
      "Met Office Hadley Centre" : 4,
      "National Institute for Environmental Studies" : 6,
      "University of Hohenheim" : 2,
      "University of Toulouse" : 2,
      "ZAMG - Central Institute for Meteorology and Geodynamics" : 2
   },
   "cn.cma.wmc-bj" : {
      "National Meteorological Information Center, CMA" : 126
   },
   "cz.chmi.dcpc" : {
      "Czech hydrometeorological institude" : 1
   },
   "de.dwd.gpc" : {
      "Deutscher Wetterdienst" : 16
   },
   "fr.meteo" : {
      "NMC FRANCE - Météo-France" : 369
   },
   "fr.meteo.dcpc-copernicus" : {
      "ECMWF" : 905
   },
   "fr.meteo.dcpc-eer" : {
      "NMC FRANCE - Météo-France" : 7
   },
   "fr.meteo.dcpc-lrf" : {
      "NMC FRANCE - Météo-France" : 2
   },
   "fr.meteo.dcpc-nwp" : {
      "NMC FRANCE - Météo-France" : 64
   },
   "hr.ammc.dcpc" : {
      "DCPC-Adriatic Marine Meteorological Centre" : 8
   },
   "int.ecmwf" : {
      "ECMWF" : 116,
      "European Centre for Medium-Range Weather Forecasts" : 7
   },
   "int.eumetsat" : {
      "ECMWF" : 2,
      "EUMETSAT" : 25,
      "H SAF" : 14,
      "Met Office" : 2,
      "Meteo-France" : 1,
      "NOAA" : 1,
      "OSI SAF" : 28,
      "WMO" : 1
   },
   "int.wmo.wis" : {
      "Deutscher Wetterdienst" : 1229,
      "ECMWF" : 1732,
      "FSBE «Aviamettelecom of Roshydromet»" : 2182,
      "GISC Tokyo - Japan Meteorological Agency" : 21356,
      "Japan Meteorological Agency" : 193,
      "NMC BULGARIA - National Institute of Meteorology and Hydrology" : 3,
      "NMC FRANCE - Météo-France" : 1089,
      "NMC KENYA - Kenya Meteorological Department" : 1,
      "NMC UNITED KINGDOM - Met Office" : 15069
   },
   "it.meteoam.dcpc" : {
      "DCPC Rome (RTH)" : 1
   },
   "jp.go.jma.wis.dcpc-geogr" : {
      "GISC Tokyo - Japan Meteorological Agency" : 16
   },
   "jp.go.jma.wis.dcpc-gpc" : {
      "WMO/WIS/DCPC Tokyo (Global Producing Centre for long-range forecast)" : 1,
      "WMO/WIS/GISC Tokyo" : 5
   },
   "jp.go.jma.wis.dcpc-sat" : {
      "WMO/WIS/GISC Tokyo" : 1
   },
   "jp.go.jma.wis.dcpc-tcc" : {
      "WMO/WIS/GISC Tokyo" : 4
   },
   "org.wmolc" : {
      "WMO Lead Centre for Long-Range Forecast Multi-Model Ensemble" : 120
   },
   "rs.gov.hidmet" : {
      "South East European Virtual Climate Change Center (SEEVCCC)" : 3
   }
}
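For readers who want to reproduce this kind of grouping, here is a rough sketch. It is not the pywiscat code. It assumes file identifiers of the form `urn:x-wmo:md:<citation authority>::<unique id>` and places anything that does not match under the empty key, which is one plausible explanation for the `""` group above. The identifiers in the example are made up.

```python
from collections import defaultdict

URN_PREFIX = "urn:x-wmo:md:"  # assumed prefix of WIS 1.0 file identifiers


def citation_authority(file_identifier: str) -> str:
    """Return the citation authority, or "" if the identifier is not such a URN."""
    if not file_identifier.startswith(URN_PREFIX):
        return ""
    return file_identifier[len(URN_PREFIX):].split(":", 1)[0]


def group_by_authority(records):
    """Count (file identifier, organisation) pairs per authority and organisation."""
    grouped = defaultdict(lambda: defaultdict(int))
    for file_identifier, organisation in records:
        grouped[citation_authority(file_identifier)][organisation] += 1
    return {authority: dict(orgs) for authority, orgs in grouped.items()}


# Hypothetical identifiers, for illustration only.
records = [
    ("urn:x-wmo:md:int.wmo.wis::EXAMPLE01", "Deutscher Wetterdienst"),
    ("urn:x-wmo:md:int.wmo.wis::EXAMPLE02", "Deutscher Wetterdienst"),
    ("urn:x-wmo:md:int.ecmwf::EXAMPLE03", "ECMWF"),
    ("a-plain-identifier", "University of Hohenheim"),
]
grouped = group_by_authority(records)
print(grouped)
```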