ideas-lab-nus / epwshiftr

Create future EnergyPlus Weather files using CMIP6 data
https://ideas-lab-nus.github.io/epwshiftr/

Different `dataset_id` values can link to the same dataset #59

Open hongyuanjia opened 1 year ago

hongyuanjia commented 1 year ago

`dataset_id` cannot be used as the unique identifier of a dataset because it is specific to the data node: it has the form `<dataset>|<data_node>`, so every replica of the same dataset gets a different `dataset_id`. This does not cause any problems for `esgf_query()`, but it does produce duplicated entries in the results of `init_cmip6_index()` when `replica` is set to `TRUE`. `dataset_pid` should be used as the unique dataset identifier when building the index.
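A minimal sketch of the proposed deduplication, assuming a `data.table` shaped like the `esgf_query()` result below (only the `dataset_id` and `dataset_pid` columns are reproduced, with values copied from that output). This is not the fix that landed in epwshiftr; it only shows why keying on `dataset_pid` collapses replicas while `dataset_id` does not. The object name `idx` is hypothetical.

```r
library(data.table)

# Toy table mirroring the four replica rows returned by esgf_query() below
q <- data.table(
  dataset_id = paste0(
    "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|",
    c("esgf-data1.llnl.gov", "esgf-data3.diasjp.net",
      "esgf.ceda.ac.uk", "esgf.nci.org.au")
  ),
  dataset_pid = "hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8"
)

# dataset_id differs per data node, so replicas stay distinct: 4
uniqueN(q$dataset_id)
# dataset_pid is shared by all replicas: 1
uniqueN(q$dataset_pid)

# Hypothetical index step: keep one row per dataset_pid
# (unique() with `by` keeps the first row of each group)
idx <- unique(q, by = "dataset_pid")
```

Keeping the first row is an arbitrary choice here; an index builder could instead prefer a particular data node before deduplicating.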

q <- epwshiftr::esgf_query(
    activity = "ScenarioMIP",
    variable = "tas",
    frequency = "day",
    experiment = "ssp585",
    source = "AWI-CM-1-1-MR",
    variant = "r1i1p1f1",
    replica = TRUE,
    latest = TRUE,
    resolution = "100 km",
    limit = 10000L,
    data_node = NULL
)

q[, .(dataset_id, dataset_pid)]
#>                                                                                        dataset_id
#> 1:   CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov
#> 2: CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data3.diasjp.net
#> 3:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk
#> 4:       CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.nci.org.au
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 2: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 3: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
#> 4: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8
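The output above shows that only the part after the `|` changes between rows. As an illustrative sketch (not part of epwshiftr), splitting `dataset_id` at the literal pipe with `data.table::tstrsplit()` separates the node-independent part from the data node; the column names `instance` and `node` are hypothetical.

```r
library(data.table)

# Two of the dataset_id values from the output above
parts <- data.table(dataset_id = c(
  "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf-data1.llnl.gov",
  "CMIP6.ScenarioMIP.AWI.AWI-CM-1-1-MR.ssp585.r1i1p1f1.day.tas.gn.v20190529|esgf.ceda.ac.uk"
))

# Split "<dataset>|<data node>" at the literal pipe (fixed = TRUE, not a regex)
parts[, c("instance", "node") := tstrsplit(dataset_id, "|", fixed = TRUE)]

# The dataset part is identical across nodes; only the node differs
parts[, .(instance, node)]
```

This confirms the node suffix is what makes `dataset_id` non-unique, although `dataset_pid` remains the more robust key since it does not rely on parsing the string.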

unique(q[, -c("dataset_id", "data_node")])
#>    mip_era activity_drs institution_id     source_id experiment_id member_id
#> 1:   CMIP6  ScenarioMIP            AWI AWI-CM-1-1-MR        ssp585  r1i1p1f1
#>    table_id frequency grid_label  version nominal_resolution variable_id
#> 1:      day       day         gn 20190529             100 km         tas
#>              variable_long_name variable_units
#> 1: Near-Surface Air Temperature              K
#>                                          dataset_pid
#> 1: hdl:21.14100/a336f13f-a4d3-3b57-a45a-8f27f0ba01b8

Created on 2022-09-19 with reprex v2.0.2

hongyuanjia commented 1 year ago

Ref: "Identifiers" (Returned Metadata Fields)