yng-me / mpindex

Multidimensional Poverty Index (MPI): R Package
https://yng-me.github.io/mpindex/

How to define the cutoff threshold? #17

Closed seb-garcia closed 8 months ago

seb-garcia commented 10 months ago

Hello from Peru! I am very interested in the work you have done with the MPI. It has been a great help with a report we are writing to assess the living conditions of Venezuelan refugees and migrants in Peru.

I am trying to apply your package to an NSO survey of the Venezuelan population in Peru (ENPOVE 2022)*. Our repository is at this GitHub repo.

I am having trouble understanding how the cutoff threshold works. First, I tried a condition written in dplyr grammar:

deprivation_profile$year_schooling <- df_household_roster |>
  define_deprivation(
    .indicator = year_schooling,
    .cutoff = (P501 < 6 | P501B < 6) & P205_A > 17,
    .collapse = TRUE
  )

And I get this warning:

Warning: There were 1140 warnings in `dplyr::summarise()`.
The first warning was:
ℹ In argument: `Years of schooling = max(`Years of schooling`, na.rm = T)`.
ℹ In group 17: `uuid = "00038002381"`.
Caused by warning in `max()`:
! no non-missing arguments to max; returning -Inf
ℹ Run `dplyr::last_dplyr_warnings()` to see the 1139 remaining warnings.

However, when I do:

deprivation_profile$year_schooling <- df_household_roster |>
  define_deprivation(
    .indicator = year_schooling,
    # .cutoff = (P501 < 6 | P501B < 6) & P205_A > 17,
    .cutoff = year_schooling == 0,
    .collapse = TRUE
  )

The code runs without warnings.

Do you know what the issue is, or could you help us figure it out?

Also, I assume that once I get the deprivation_matrix as output, I can apply a weighting vector to extrapolate the results to the population. Would that work?

Thank you so much for your help!

*Microdata can be downloaded from INEI's website: for households and for household members.

yng-me commented 10 months ago

Hi, @seb-garcia. On your first question, the condition (P501 < 6 | P501B < 6) & P205_A > 17 seems to return NA for every member of some households, so max(..., na.rm = TRUE) has nothing left to evaluate when the roster is collapsed to the household level. Here's my workaround, but you should decide how to treat these NAs. In my case, I coerced NA to 0.

deprivation_profile$year_schooling <- df_household_roster |>
  # if_else()'s `missing` argument sets the value used when the condition evaluates to NA
  mutate(deprived_year_schooling = if_else((P501 < 6 | P501B < 6) & P205_A > 17, 1, 0, missing = 0)) |>
  define_deprivation(
    .indicator = year_schooling,
    .cutoff = deprived_year_schooling == 1,
    .collapse = TRUE
  )

I will add an argument to define_deprivation that controls how NAs resulting from the deprivation cutoff are treated, so you won't need the extra data transformation step. Watch out for the next release.
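
To make the cause of the warning concrete, here is a minimal base R sketch that reproduces it outside the package. The toy roster below is hypothetical (only the column names are borrowed from this thread), and tapply() stands in for the grouped max(..., na.rm = TRUE) shown in the warning message; it is not the package's actual code path.

```r
# Hypothetical roster: household "B" has no schooling data for any member.
roster <- data.frame(
  uuid   = c("A", "A", "B", "B"),
  P501   = c(3, NA, NA, NA),
  P501B  = c(NA, NA, NA, NA),
  P205_A = c(30, 25, 40, 19)
)

# NA | NA is NA, and NA & TRUE is NA, so rows 2-4 evaluate to NA.
cond <- with(roster, (P501 < 6 | P501B < 6) & P205_A > 17)

# Collapsing to household level with max(na.rm = TRUE): household "B" has
# only NAs, so max() is left with nothing, warns, and returns -Inf.
collapsed <- suppressWarnings(
  tapply(as.numeric(cond), roster$uuid, max, na.rm = TRUE)
)

# One possible treatment (an analyst's choice, as noted above): NA becomes 0.
deprived <- ifelse(is.na(cond), 0, as.numeric(cond))
fixed <- tapply(deprived, roster$uuid, max)

print(collapsed)
print(fixed)
```

Coercing NA to 0 treats households with no usable information as not deprived. Dropping them or flagging them separately are other options, depending on how the survey's missing values should be read.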

yng-me commented 10 months ago

On your other query: yes, you can apply a weighting vector to the deprivation_matrix object returned by compute_mpi. Please note, though, that compute_mpi already applies, under the hood, the weights you define in your specification file.
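
As a rough illustration of applying a weighting vector afterwards, the sketch below computes a survey-weighted headcount ratio (H), intensity (A), and MPI = H × A in base R. Everything in it is an assumption: the data frame, the column names deprivation_score and factor_exp (a hypothetical expansion factor), and the poverty cutoff k = 1/3 are made up for illustration and are not the actual structure of deprivation_matrix. It also assumes the weights in the specification file are indicator weights rather than sampling weights, which this thread does not confirm.

```r
# Hypothetical household-level table; NOT mpindex's actual output schema.
households <- data.frame(
  uuid              = c("A", "B", "C", "D"),
  deprivation_score = c(0.50, 0.10, 0.40, 0.00),  # weighted sum of deprivations
  factor_exp        = c(100, 300, 200, 400)       # hypothetical sampling weight
)

k <- 1 / 3  # assumed poverty cutoff

# A household is multidimensionally poor if its score reaches the cutoff.
poor <- households$deprivation_score >= k

# Headcount ratio: weighted share of poor households.
H <- weighted.mean(as.numeric(poor), households$factor_exp)

# Intensity: weighted mean deprivation score among the poor only.
A <- weighted.mean(
  households$deprivation_score[poor],
  households$factor_exp[poor]
)

MPI <- H * A
print(c(H = H, A = A, MPI = MPI))
```

If the survey has a complex design (strata, clusters), point estimates like these are only a starting point, since standard errors would need the design information as well.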

yng-me commented 8 months ago

See #18