wengxyu1030 / UHC-Time-Trend

This repository is to build time series for UHC indicators by country, year, survey.

Next Steps on Quality Control: Missing Data and Flagged Data Points. (DHS) #3

Open wengxyu1030 opened 3 years ago

wengxyu1030 commented 3 years ago

Dear Jianing,

Thanks for the meeting today. As discussed, I am sharing the materials for the next round of quality control. All the relevant dofiles and data have been updated in the repository: https://github.com/wengxyu1030/UHC-Time-Trend.

The files that would be helpful to you are:

• The collapsed country-year level indicators have been generated and saved in the reference data folder as DHS_Time_Series_QC.dta, using the dofiles named 01_ and 02_.

• The flagged datapoints have been identified and saved in the reference data folder as quality_control.dta, using the dofile named 03_.

• An Excel file named Surveys_DW_0724.xlsx that specifies the surveys the Data Whale team has worked on.

It would be great if you could help with the following tasks:

  1. Use the data file quality_control.dta to identify the datapoints that exist in the DHS or HEFPI public data (variables value_dhs, value_hefpi) but are missing from the indicator generated by the DW team (variable value_my). The deliverable could be a dofile and a dta file listing the missing datapoints together with the survey characteristics (country, year, survey type); kindly note that this information can be found in DHS_Time_Series_QC.dta. Once done, please upload them to the repository as a pull request.

  2. Use the data file quality_control.dta to identify the cases where the flags for both DHS and HEFPI are raised (flag_dhs == 1, flag_hefpi == 1), check the corresponding repository issues (DHS, AIS, MIS), and summarize the team's feedback. The deliverable could be an Excel file with the link to each repo issue and a brief summary of the team members' feedback or reaction.

Please let me know if you have further questions.
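The two selections described in the tasks above could be sketched in Stata roughly as follows. This is only an illustration on a made-up toy dataset: the variable names value_my, value_dhs, value_hefpi, flag_dhs and flag_hefpi come from the description above, while the `survey` variable, the survey IDs and all values are assumptions. Attaching the survey characteristics from DHS_Time_Series_QC.dta is left out because the merge keys are not stated here.

```stata
* Toy stand-in for quality_control.dta; survey IDs and values are invented.
clear
input str10 survey value_my value_dhs value_hefpi flag_dhs flag_hefpi
"AAA2010DHS" 0.52 0.50 0.51 0 0
"BBB2014MIS" .    0.33 .    . .
"CCC2016AIS" 0.10 0.40 0.45 1 1
"DDD2018DHS" .    .    .    . .
end

* Task 1: a public value exists (DHS or HEFPI) but the DW value is missing.
generate byte missing_my = missing(value_my) & (!missing(value_dhs) | !missing(value_hefpi))
list survey value_dhs value_hefpi if missing_my

* Task 2: both flags are raised. A missing flag never equals 1,
* so rows with a missing flag are excluded here.
generate byte both_flagged = flag_dhs == 1 & flag_hefpi == 1
list survey if both_flagged
```

Storing each selection as an indicator variable rather than dropping rows keeps the full dataset available for the later checks.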

Regards, Aline

wengxyu1030 commented 3 years ago

The pull request linked to missing data checking is #2 created by Jianing.

wengxyu1030 commented 3 years ago

The missing data points have been identified as below: image

Of these, only survey LB2016MIS is within the scope of DW. Dear @robin-wang, kindly confirm whether this is right.

The issue in survey LB2016MIS had already been identified by a DW team member, and the decision was made to consult the WB team leader later. The conversation is documented in this pull request: https://github.com/wengxyu1030/MIS/pull/26

wengxyu1030 commented 3 years ago

@wengxyu1030 To do: adjust the dofile directories once this issue is closed, so that anyone with access to the OneDrive shared folder can replicate the work.
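One common way to make dofile directories portable is to define the root folder once in a global macro and build every path from it. The sketch below assumes a hypothetical folder location and a flat layout; the actual OneDrive folder structure is not described in this thread.

```stata
* Hypothetical root: each user points this at their own copy of the shared folder.
* Forward slashes work on all platforms in Stata.
global root "C:/Users/yourname/OneDrive/UHC-Time-Trend"

* Check that the file is reachable before loading it (the location is an assumption).
capture confirm file "$root/quality_control.dta"
if _rc {
    display as error "quality_control.dta not found under $root"
}
else {
    use "$root/quality_control.dta", clear
}
```

With this pattern only the `global root` line needs to change from one machine to another.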

jianingwwww commented 3 years ago

Dear @wengxyu1030 ,

I am moving forward with step 2 and have two questions on which I would like your guidance.

  1. Do the flagged datapoints have to satisfy the condition that both flag_dhs and flag_hefpi == 1? For those where only one flag is raised (flag_dhs == 1 or flag_hefpi == 1), do I need to take any follow-up action?

  2. I can collect information from the AIS and MIS repositories, but I seem to have no access to most of the DHS surveys in the DHS repository you shared with me (as far as I can see, it contains only Senegal 2018 and Senegal 2019). Would you kindly guide me on how to find the information for these surveys?

Thanks a lot in advance!

Best, Jianing

wengxyu1030 commented 3 years ago

Dear @jianingwwww ,

Thanks for the feedback. Please see my responses below:

  1. Let's prioritize the data points with both flags raised. We can later move to those with a single flag raised if necessary.
  2. Thanks for pointing this out. The scope of DW's work on DHS was later reduced to two surveys only, but I failed to reflect this in the Excel file Surveys_DW_0724.xlsx. I will adjust it accordingly.

Regards, Aline

jianingwwww commented 3 years ago

Dear @wengxyu1030 ,

image

According to Quality_control_result, the conditions of these surveys need to be recorded. But so far I still can't find the corresponding issues or pull requests in the DHS repository you shared with me earlier (I suppose my job is to find the team's feedback in the issues and record it in Excel?). Could you please give further guidance? Thanks a lot in advance.

Best, Jianing

wengxyu1030 commented 3 years ago

Hi @jianingwwww ,

You cannot see those DHS files in the repo because the DW team did not code them. Please park them for now and focus on those coded by DW.

Thanks for pointing out that the cases where both flags are missing are all DHS microdata that cannot be found in the repo.

Let's move to the next step: the cases where one flag is missing and the other is coded as 1, together with the cases where both are 1. I suggest using the code: br if ((flag_hefpi == 1 & flag_dhs == .) | (flag_hefpi == . & flag_dhs == 1) | (flag_hefpi == 1 & flag_dhs == 1)), and focusing on the surveys coded by DW (AIS, MIS, and the two DHS surveys listed in Surveys_DW_0724.xlsx).
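To see which rows the suggested condition selects, it can be tried on a small made-up dataset first. In the sketch below the `survey` variable and all rows are invented for illustration; the condition itself is the one quoted above, stored in an indicator and shown with `list` instead of `br` so that it also runs in batch mode.

```stata
* Toy data covering the flag combinations; survey IDs are invented.
clear
input str6 survey flag_dhs flag_hefpi
"S1" 1 1
"S2" 1 .
"S3" . 1
"S4" 0 1
"S5" . .
"S6" 0 0
end

* Same condition as in the comment above (/// continues a line inside a dofile).
generate byte to_review = (flag_hefpi == 1 & flag_dhs == .) ///
    | (flag_hefpi == . & flag_dhs == 1) ///
    | (flag_hefpi == 1 & flag_dhs == 1)

list survey flag_dhs flag_hefpi if to_review
count if to_review
```

Note that a row where one flag is 1 and the other is 0 (S4 here) is not selected: the condition keeps only rows where every available flag is raised.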

Regards, Aline

wengxyu1030 commented 3 years ago

Thanks to @jianingwwww for her detailed review. The Excel file has now been updated in pull request #4; it identifies the cases where both the HEFPI and DHS flags were raised, or where the only available quality-checking reference is flagged.

The detailed reaction to each issue is documented, and the reference links to the issues/pull requests are listed. In most cases the DW team responded and reached a conclusion; however, there are cases with no reaction. I would invite the RAs who worked on them to look into them further.

Below is the list of cases where there was no reaction. Dear @robin-wang, please follow up on those issues.

image

robin-wang commented 2 years ago

Circling back on this issue with updates from the touchbase with the WB team. We are to create alternative flagging criteria and feed them into the dashboard with marks. The required changes will therefore be determined in the time series instead. Update time -0305