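Import the `wdi` data from 4.8 Exercise-5 and obtain `iso2c_nonCountry` from Exercise 4.19. Use it to create a function `remove_nonCountries` so that any data frame with an `iso2c` column, say `df_example`, can be piped into it, as in the call used below, to remove the non-country entries.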
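One possible sketch of such a function is shown here. It assumes that `iso2c_nonCountry` is a character vector of `iso2c` codes for aggregates and other non-country entries; the three codes and the toy `df_example` below are hypothetical stand-ins, not the actual result of Exercise 4.19.

```r
# Hypothetical stand-in for the vector obtained in Exercise 4.19
iso2c_nonCountry <- c("1W", "EU", "XD")

remove_nonCountries <- function(data_set) {
  # Keep only the rows whose iso2c code is not in the non-country list
  keep <- !(data_set$iso2c %in% iso2c_nonCountry)
  data_set[keep, , drop = FALSE]
}

# Toy data frame for illustration only
df_example <- data.frame(
  iso2c = c("US", "1W", "JP", "EU"),
  value = c(1, 2, 3, 4)
)
df_example |> remove_nonCountries()
```

Because `%in%` never returns `NA`, rows with a missing `iso2c` value are kept rather than silently dropped.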
The following code removes non-countries from the data set and narrows it down to the year 2020 data. It then summarises the indicator's mean, median, and range:
```{r}
wdi$data |> remove_nonCountries() -> data_set
data_set |> subset(year == 2020) -> data_set2020 # it is the same as
code = "SG.GEN.PARL.ZS"
{
  data_set2020[[code]] |> range(na.rm = TRUE) -> output_range
  data_set2020[[code]] |> mean(na.rm = TRUE) -> output_mean
  data_set2020[[code]] |> median(na.rm = TRUE) -> output_median
  list(
    mean = output_mean,
    median = output_median,
    range = list(output_range)
  ) |> list2DF()
}
```
Construct a function `summarise_numerical` that produces a summary data frame of the mean, median, and range for any given data set (input argument `data_set`) and numerical feature column name (input argument `feature`). In other words, with the help of the `summarise_numerical` function, the above code chunk can be replaced with
```{r}
wdi$data |> remove_nonCountries() -> data_set
data_set |> subset(year == 2020) -> data_set2020 # it is the same as
code = "SG.GEN.PARL.ZS"
summarise_numerical(data_set = data_set2020, feature = code)
```
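A minimal sketch of `summarise_numerical` might simply wrap the earlier code chunk in a function body. The toy data frame `toy_2020` below is a hypothetical stand-in for `data_set2020`, used only to make the example self-contained.

```r
summarise_numerical <- function(data_set, feature) {
  # Pull the feature column out once, then summarise it, ignoring NAs
  values <- data_set[[feature]]
  list(
    mean = mean(values, na.rm = TRUE),
    median = median(values, na.rm = TRUE),
    # Wrap the length-2 range in a list so it fits in a single row
    range = list(range(values, na.rm = TRUE))
  ) |> list2DF()
}

# Toy data for illustration only
toy_2020 <- data.frame(
  year = 2020,
  SG.GEN.PARL.ZS = c(10, 20, NA, 40)
)
summary_df <- summarise_numerical(data_set = toy_2020, feature = "SG.GEN.PARL.ZS")
summary_df
```

The result has one row, and its `range` column is a list column holding the minimum and maximum together.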
Gender inequality is an important issue in social science. One possible indicator to compare this inequality across countries is:
Proportion of seats held by women in national parliaments (%) (code name is "SG.GEN.PARL.ZS").
What is the year range in the data set? For each year, compute the mean of this indicator across countries. Is the mean increasing over time?
Create a function `get_meanTrendOverYears` that, when called, returns a vector of the mean across all countries of the given `code` feature for each year, with the years as element names. (That is, if the means are 2, 3, 8 for the years 2010, 2011, 2012, then the returned vector should be the named numeric vector `c("2010"=2, "2011"=3, "2012"=8)`.)
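As a rough sketch of one way to approach this, the function below splits the feature values by year and averages each group. The argument names `data_set` and `code` are assumptions, since the intended function call is not shown here, and `toy_data` is hypothetical data chosen to reproduce the 2, 3, 8 example.

```r
# Assumed signature: a data frame with a `year` column, plus a column name
get_meanTrendOverYears <- function(data_set, code) {
  # Split the feature values by year, then average each group, ignoring NAs
  values_by_year <- split(data_set[[code]], data_set$year)
  vapply(values_by_year, mean, numeric(1), na.rm = TRUE)
}

# Toy data for illustration only
toy_data <- data.frame(
  year = c(2010, 2010, 2011, 2011, 2012),
  SG.GEN.PARL.ZS = c(1, 3, 3, NA, 8)
)

range(toy_data$year)  # year range of the toy data
trend <- get_meanTrendOverYears(toy_data, "SG.GEN.PARL.ZS")
trend
all(diff(trend) > 0)  # TRUE only if the mean rises every single year
```

`split` names each group by its year, and `vapply` carries those names through, so the result is already a named numeric vector. Checking `diff(trend) > 0` is a strict test; a plot of `trend` against the years may be a gentler way to judge the overall direction.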