Closed billwambua closed 3 weeks ago
Found 1568/6396 matches for the time period 2021-03-01
to 2021-03-03
for the get_hmis_data
function using hash matches.
For 2021-01-01
to 2021-01-30
, 1220 out of 6396 records match.
For Q1 2021(2021-01-01 to 2021-01-31), the matching records are 4076 out of 19188.
@wambua, it's likely this was cloned from brac.dev. Theres a dump of the couchdb table in brac-rdbms.dev under /srv/backup/dump/
that you can use to restore into the cht-pipeline database.
After importing the new brac_ug
DB 9194 out of 19194 records match, these are the results:
SELECT t.chp_uuid, t.date, md5((ROW(t.*)::TEXT)) as hash
FROM get_hmis_data('2021-01-01', '2021-03-31') t ;
9194 out of 19194 records from Brac-ug DB match cht_pipeline_test dbt DB
Time taken: 5272.167739152908
PostgreSQL connection is closed
It would be helpful if you also checked whether they returned the same number of rows
19188
records for cht_pipeline_test vs 19194
records for Brac. Difference of 6 rows.
The problem is these columns
num_of_hh_bed_nets
num_of_hh_latrines
num_of_hh_safe_water
Which are empty in the Brac DB pull but have data when pulled from CHT-DataPipeline
All the other columns in the get_hmis_data
pull match except those three.
LEFT JOIN
(
SELECT
chw,
area_uuid,
date_trunc('month',reported) :: date AS reported_month,
SUM((mosquito_nets ='true')::int) AS num_of_hh_bed_nets,
SUM((hygeinic_toilet ='true')::int) AS num_of_hh_latrines,
SUM((COALESCE(source_of_drinking_water,'') != 'spring')::int) AS num_of_hh_safe_water
FROM dbt.useview_household_survey
GROUP BY
chw,
area_uuid,
reported_month
) AS family_survey ON(period_chp.date = family_survey.reported_month AND period_chp.area_uuid =family_survey.area_uuid)
Trying to see why it is behaving like that
Nice find @billwambua! (i hope the column comparisons are scripted to make it easier to do the same across the other datasets) That would be weird since the same data is being used across.
Could you check the definition of what is actually running on postgres and compare that with what's on github?
@derickl checking. Since we did some modifications on the function, I copied the same exact function which is on Brac to CHT-Pipeline test and did a pull - the data from those indicated columns again did not match.
I checked the underlying useview for 1 chv and the data matches, but the function pull does not match.
I'm still trying to understand why it does not match when pulled via the function.
Got it: The definition of useview_household_survey
in brac-ug is different from what is on Github.
On Github and consequently on the cht_pipeline_test
config, the definition of the fields chw
and area_uuid
is:
doc #>> '{parent,contact,_id}'::text[] AS chw,
doc #>> '{parent,_id}'::text[] AS area_uuid,
While on the brac-ug DB, it is defined as
doc #>> '{parent,contact,_id}'::text[] AS chw,
doc #>> '{parent,contact,parent,_id}'::text[] AS area_uuid,
Notice the difference in area_uuid
.
This is causing the Brac-DB to show empty fields for the area_uuid
field which is used in joins in get_hmis_data
and so the num_of_hh_bed_nets
and num_of_hh_latrines
and num_of_hh_safe_water
fields are all empty when pulled from brac-ug database.
@jonathanbataire you need to update the useview_household_survey
materialized view to use the latest config on Github.
As for get_assessment_data
, for Q1 2021, 7213 out of 9737 records match.
SELECT t.chw_uuid, t.month, md5((ROW(t.*)::TEXT)) as hash
FROM get_assessment_data('2021-01-01', '2021-03-03') t ;
7213 out of 9737 records from Brac-ug DB match cht_pipeline_test dbt DB
Time taken: 2695.9290194511414
Issue seems to be the column percent_mrdt
which is rendered as a float. cht_pipeline_test is giving 16 decimal places while brac-ug is giving 15 decimal places. e.g for chw_uuid
- ff7a5739-53f4-41c5-99d3-f8eeb2f6da92
and month
- 202101
, the return is:
0.307692307692308 (brac-ug)
0.3076923076923077 (cht_pipeline_test)
If I do the below query on both brac-ug and cht_pipeline_test servers, I get 0.3076923076923077
for both. So I'm a bit stumped as to why this is happening.
select ((4+0)::float/(13+0+0)::float)::float
The code excerpt that does the division is this:
CASE
WHEN (COALESCE(ASSESS.mrdt_positive,0) + COALESCE(ASSESS.mrdt_negative,0) + COALESCE(ASSESS.mrdt_none,0)) = 0
THEN 0::float
ELSE
(COALESCE(ASSESS.mrdt_positive,0) + COALESCE(ASSESS.mrdt_negative,0))::float / (COALESCE(ASSESS.mrdt_positive,0) + COALESCE(ASSESS.mrdt_negative,0) + COALESCE(ASSESS.mrdt_none,0))::float
END AS percent_mrdt,
For get_dashboard_data
, 0 rows match exactly.
Picking one row and doing investigations, i.e Kibuli Q1 2021, I used the Pregnancies Registered
data point. It uses the MAX(EDD)
to determine the latest pregnancy. The issue with this is if say a mother has a pregnancy with LMP early Year 1 with delivery in late Year 1 then has another pregnancy in mid Year 2, then only the second pregnancy will be counted.
That is the issue with using MAX
only without a JOIN to the corresponding Delivery and making that a group.
If you do the below query for patient Atuhaire Diana
with patient_id 6bf02d77-c47e-4c30-ae6f-811fb94cdf47
in brac-ug
.
SELECT
chw AS CHW_UUID,
patient_id AS patient_id,
date(edd) AS EDD
FROM
useview_pregnancy
WHERE patient_id = '6bf02d77-c47e-4c30-ae6f-811fb94cdf47'
You get:
chw_uuid patient_id edd
0b18c388-4eb8-432f-92f6-c7bf35d13ef8 6bf02d77-c47e-4c30-ae6f-811fb94cdf47 2023-02-15
0b18c388-4eb8-432f-92f6-c7bf35d13ef8 6bf02d77-c47e-4c30-ae6f-811fb94cdf47 2021-02-17
But the below query to check for Q1 2021 gives a blank.
SELECT
chw AS CHW_UUID,
patient_id AS patient_id,
max(date(edd)) AS EDD
FROM
useview_pregnancy
WHERE patient_id = '6bf02d77-c47e-4c30-ae6f-811fb94cdf47'
GROUP BY
CHW_UUID,
PATIENT_ID
HAVING
max(date(edd)) >= (date_trunc('day','2021-01-01'::TIMESTAMP)) AND
CASE
WHEN date_trunc('day','2021-03-31'::TIMESTAMP) = date_trunc('day',now())
THEN max(date(edd)) <= ('2021-03-31'::TIMESTAMP - '3 days'::interval)
ELSE max(date(edd)) <= ('2021-03-31'::TIMESTAMP + '1 day'::interval)
END
@njuguna-n is this still relevant? Can I close it?
@Phil-Mwago, when you re-open issues, please change the status to TODO
in the board, as that is not done automatically.
Compare results from the brac config in rdbms vs the dbt config in
http://cht-pipeline-test.caseqzeqqeog.eu-west-2.rds.amazonaws.com/
.Compare results for:
They should be the same.