Overview
This issue proposes a consolidated structure for DCAT query results, so that each dataset returned by DCAT gives a full picture of the dataset, including all available information about each variable and each variable's associated standard variable information.
Current State
In order to gain a full picture of a dataset, you currently must:
1. Query for the dataset by ID to obtain its variables
2. Query for the standard variables related to the dataset
3. For each variable, query for the associated standard variable
Example of Current State Process
For example, we can first query for a dataset by ID. This returns the dataset and, for each variable within it, the variable name, metadata, and a variable ID, but no standard variable information.
So next, we must find standard variables associated with the dataset:
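    # `url`, `request_headers`, and the query `q` carry over from the previous step
    resp = requests.post(f"{url}/datasets/dataset_standard_variables",
                         headers=request_headers,
                         json=q).json()
    std_vars = resp['dataset']['standard_variables']

    # Build a lookup of standard variables keyed by their ID
    std_vars_dict = {}
    for var in std_vars:
        std_vars_dict[var['standard_variable_id']] = var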
This query returns the standard variables associated with a dataset, but not the variables each of those standard variables relates to. Each standard variable's information (name and URI) is stored in a lookup dictionary, so it can be retrieved using the standard variable's ID as the key.
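As a small illustration of how such a lookup dictionary behaves, the sketch below builds one from sample records. The IDs, names, and URIs are made up, and the `name` and `uri` keys are assumed field names rather than confirmed DCAT response fields.

```python
# Illustrative records only: the IDs, names, and URIs below are made up,
# and the `name` / `uri` keys are assumed field names.
std_vars = [
    {"standard_variable_id": "sv-001",
     "name": "air_temperature",
     "uri": "http://example.org/sv/air_temperature"},
    {"standard_variable_id": "sv-002",
     "name": "precipitation_rate",
     "uri": "http://example.org/sv/precipitation_rate"},
]

# Same lookup as the loop above, written as a dict comprehension
std_vars_dict = {var["standard_variable_id"]: var for var in std_vars}

# Retrieve a standard variable record directly by its ID
record = std_vars_dict["sv-001"]
print(record["name"])

# .get() returns None instead of raising KeyError for an unknown ID
missing = std_vars_dict.get("sv-999")
print(missing)
```

Using `.get()` may be worth considering if a variable can reference a standard variable that is absent from the dataset-level response; whether that can happen in DCAT is not established here.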
Now, for each variable within the dataset, we have to find its associated standard variables:
    for v in dataset['variables']:
        # Obtain standard name information for the variable
        q = {
            "variable_ids__in": [v['variable_id']]
        }
        resp = requests.post(f"{url}/variables/variables_standard_variables",
                             headers=request_headers,
                             json=q).json()
        v_ = resp['variables'][0]

        v['standard_names'] = []
        for std in v_['standard_variables']:
            std_var = std_vars_dict[std['standard_variable_id']]
            v['standard_names'].append(std_var)
Finally, this produces a consolidated dataset that includes variable information and each variable's associated standard variable information.
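The last two steps above can be folded into a single client-side helper, which is roughly the work a consolidated DCAT response would save each caller. This is only a sketch: `consolidate_dataset` and the `post` callable are hypothetical names, `post(path, payload)` is assumed to wrap the HTTP call and return decoded JSON, and the response shapes are taken from the snippets above.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes an endpoint path and a JSON payload,
# and returns the decoded JSON response.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def consolidate_dataset(dataset: Dict[str, Any],
                        dataset_query: Dict[str, Any],
                        post: PostFn) -> Dict[str, Any]:
    """Attach standard variable records to each variable of `dataset` in place."""
    # Fetch the dataset's standard variables and key them by ID
    resp = post("/datasets/dataset_standard_variables", dataset_query)
    std_vars_by_id = {
        sv["standard_variable_id"]: sv
        for sv in resp["dataset"]["standard_variables"]
    }

    # For each variable, resolve its standard variable IDs to full records
    for v in dataset["variables"]:
        resp = post("/variables/variables_standard_variables",
                    {"variable_ids__in": [v["variable_id"]]})
        links = resp["variables"][0]["standard_variables"]
        v["standard_names"] = [
            std_vars_by_id[link["standard_variable_id"]] for link in links
        ]
    return dataset
```

With `requests`, `post` could be defined as `lambda path, payload: requests.post(f"{url}{path}", headers=request_headers, json=payload).json()`. Passing a stub instead lets the merge logic be exercised without a live DCAT instance. Note that the helper still issues one request per variable, which is the overhead this issue aims to remove.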