Closed rbroth closed 2 years ago
@fannymawbey , questions about the dataset:
Le jeu. 29 avr. 2021 à 10:19, Roman @.***> a écrit :
@fannymawbey https://github.com/fannymawbey , questions about the dataset:
- what is the name? "Malawi DHS survey"?
- Do you have information about the household that these people live in? We assume that every person surveyed lives in ahousehold and that sometimes multiple people live in the same household.
- Has the data been adjusted for altitude and smoking?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/micronutrientsupport/database-architecture/issues/142#issuecomment-829075437, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO7ZVATHRWOLGNMNGUF5QFDTLEQAHANCNFSM43VDYCGA .
It's the result of malaria testing with a Rapid Test Kit. It measures antigens in the blood at the time of the test, but doesn't tell you when the person got infected.
There is another question that is self reported - 'Did you have malaria in the last 2 weeks?', but this is different and we try to avoid using this one if we actually have a test result.
Le jeu. 29 avr. 2021 à 10:35, Roman @.***> a écrit :
- MALARIA_TEST_RESULT - does this mean that the person has had malaria in the last two weeks?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/micronutrientsupport/database-architecture/issues/142#issuecomment-829086037, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO7ZVAXCLY442XMAJD4GJBLTLER6LANCNFSM43VDYCGA .
No, we have to be specific. Sometimes they will collect serum, sometimes plasma, and we need to use the name of the one used by the survey. I think the idea that we had, as shown in the 'threshold' file, is that you will have a micronutrient 'folate' and then a matrix 'plasma, serum, red blood cell, whole blood, urine, breastmilk'.
Le jeu. 29 avr. 2021 à 10:46, Roman @.***> a écrit :
- Can we treat "serum" and "plasma" as synonyms? For example, we store "serum folate" in our system, but the dataset lists "plasma folate"
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/micronutrientsupport/database-architecture/issues/142#issuecomment-829092931, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO7ZVAWLHTFVMRSE7L5GIETTLETHJANCNFSM43VDYCGA .
- yes sometimes they live in the same household. We have variables that allow us to know that. Do you need that?
Yes, our data model assumes that we'll know what household a person belongs to. @spenny-liam is working on providing you with a template csv for importing data; that will have all the columns we want
Hi @fannymawbey I'm looking at defining the import formats for the biomarker survey results data. We already have the survey mentioned above (Malawi MNS 2016). Do you have another example survey we can use?
Ethiopia_National_MNS_report.pdf you can look at this one, as Ethiopia is also a priority country
Do you have the data for the Ethiopia Survey?
I was just going to ask the same thing. The PDF is obviously based on the raw data, but includes lots of charts and tables of aggregated data...
Yes we do. I can send via email if that's ok?
Le mar. 8 juin 2021 à 11:59, andy-bevan @.***> a écrit :
I was just going to ask the same thing. The PDF is obviously based on the raw data, but includes lots of charts and tables of aggregated data...
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/micronutrientsupport/database-architecture/issues/142#issuecomment-856664308, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO7ZVARL5QV4UNU75ENI7GTTRXZXTANCNFSM43VDYCGA .
Yep that's fine.
This is the format we would need the data in as a CSV file - see comment below for latest |
(Just editing the comment to indicate the allowed values for education)
We would also need the metadata about the survey - see comment below
And details about the household see comment below
Thanks @fannymawbey I got the data csv. I may have a few questions...
@fannymawbey for the Malawi data we have a dictionary to explain and translate some of the values. Do we have something similar for the Ethiopia data? For example RESIDENCE values are either 1 or 2 in the data, but what does that translate as?
right, I'll send that over.
Le mer. 9 juin 2021 à 09:47, andy-bevan @.***> a écrit :
@fannymawbey https://github.com/fannymawbey for the Malawi data we have a dictionary to explain and translate some of the values. Do we have something similar for the Ethiopia data? For example RESIDENCE values are either 1 or 2 in the data, but what does that translate as?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/micronutrientsupport/database-architecture/issues/142#issuecomment-857510531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AO7ZVAVN7PDWQOUTE3NKQBTTR4TCNANCNFSM43VDYCGA .
@rbroth there are a bunch of boolean columns on the household_member table about the person (is_pregnant, is_smoking, is_ill etc). Then in the biomarker_measurement table there is 'was_fasting'. Just wondering if that should be moved to the hh_member table as it is an attribute of the person, rather than the measurement...?
The original idea from those tables was that one person could have multiple measurements taken and they'd be reported seperately, so you could have your blood taken after fasting the day before, but you could then eat something and have a urine sample taken after lunch. But you're right, it might a good idea. @fannymawbey just to confirm, biomarker measurements data is always given in this "pre-aggregaed" state, where information about a person is collected at the same time that biomarker measurements are taken?
The CLUSTER field in the data is arriving as a alphanumeric code but the 'survey_cluster' column in the household_member table is numeric. I guess we need to change the data_type in the column, unless I am missing something? @rroth - what do you think - does this deserve an issue?
Following today's meeting, we have established that:
Great, thanks Andy for noting these.
Attached Malawi and Ethiopia files. I added the metadata for Malawi in the excel, just let me know if that looks OK. I checked in Notepad and it looks like everything that should be double quoted is but it's not super easy to see in Notepad so please just let me know if something doesn't look right. thank you! I'll send by email as it looks like I can attach csv here.
Latest format spec: (Note: all column headers in lowercase)
Column Header | Data Type | Values/Format | Description |
---|---|---|---|
person_id | text | Unique identifier for the person | |
sex | text | Whether the household member is male or female | |
education_level | text | The highest level of education achieved by this person. Can be primary | |
group_id | text | PREG, WRA,PSC,SAC,MEN | The group id of the person, usually depends on a combination of sex, age, pregnancy status |
age_in_months | integer | The age of the person at the time of the survey | |
is_breastfed | boolean | 0,1,null | Whether the child is being breastfed; "true" indicates that they are |
is_lactating | boolean | 0,1,null | Whether this woman is lactating; "true" indicates that she is |
is_pregnant | boolean | 0,1,null | Whether this woman is pregnant; "true" indicates that she is |
is_smoker | boolean | 0,1,null | Whether this person smokes tobacco |
had_illness | boolean | 0,1,null | Whether this person has been ill in the last two weeks |
had_malaria | boolean | 0,1,null | Whether this person has had malaria in the last two weeks |
had_diarrhoea | boolean | 0,1,null | Whether this person has had diarrhoea in the last two weeks |
survey_cluster | text | The surveying cluster used for selecting this individual | |
survey_strata | integer | The surveying strata used for selecting this individual | |
survey_weight | integer | The surveying weight used for selecting this individual | |
literacy | text | Examples: 'fully literate', 'Able to read whole sentence', 'Cannot read at all' | |
weight_in_kg | numeric | Weight of person in kilograms | |
height_in_cm | numeric | Height of person in centimetres | |
bmi | numeric | body mass index - can be submitted or calculated from weight and height , if present | |
interview_date | date | dd/mm/yyyy | Date when person was interviewed/sampled |
date_sampled | date | dd/mm/yyyy | The date when the sample was taken from the body |
sample_type | text | Whether this is a urine, blood, or breastmilk sample | |
haemoglobin | numeric | The amount of haemoglbin (also known as hemoglobin) in the blood, in grams per litre. | |
ferritin | numeric | The amount of (serum or plasma) Ferritn found in the blood, in micrograms per litre | |
stfr | numeric | The amount of (serum or plasma) Soluble Transferrin Receptors (sTfR) in the blood, in milligrams per litre | |
rbp | numeric | The amount of (serum or plasma) Retinol Binding Protein (RBP) in the blood, in micromoles per litre | |
retinol | numeric | The amount of (serum or plasma) Retinol in the blood, in micromoles per litre | |
rbc_folate | numeric | The amount of red blood cell Folate in the blood, in nanomoles per litre | |
ps_folate | numeric | The amount of serum or plasma Folate in the blood, in nanomoles per litre | |
vitamin_b12 | numeric | The amount of (serum or plasma) Vitamin B12 (also known as cobalamin) in the blood, in picomoles per litre | |
zinc | numeric | The amount of (serum or plasma) zinc, in micrograms per decilitre | |
crp | numeric | The amount of (serum or plasma) C-reactive protein (CRP) in the blood, in milligrams per litre. A value >=5 defines inflammation | |
agp | numeric | The amount of (serum or plasma) Alpha 1 acid GlycoProtein (AGP) in the blood, in grams per litre. A value >=1 defines inflammation | |
iodine | numeric | The amount of Urinary Iodine, in micrograms per litre. | |
time_of_day_sampled | text | am or pm | Whether the measurement was taken in the morning or afternoon (pre or post noon local time) |
was_fasting | boolean | 0,1,null | Whether the sample was taken while the patient was fasting |
Household Details | Note that household values must be identical per household id | ||
household_id | text | Value to uniquely identify the household | |
urbanity | text | urban,rural,null | Whether this household is located in an urban or rural area |
household_expenditure | numeric | How much money this household spends per month in local currency | |
wealth_quintile | integer | 1,2,3,4,5 | In what quintile of househol wealth this household is (5=richest, 1=poorest). If values are not identical for a given id then use the lowest value in all rows for that household id |
altitude_in_metres | numeric | The altitude of the household above sealevel in metres. Used to adjust hemoglobin levels when assessing deficiency. If values are not identical for a given id then use the average value per household id | |
latitude | numeric | The latitude of the location of the household, or the centroid of the cluster of households this household belongs to. If values are not identical for a given id then use the average value per household id | |
longitude | numeric | The longitude of the location of the household, or the centroid of the cluster of households this household belongs to. If values are not identical for a given id then use the average value per household id | |
region | text | Region in which the household is situated | |
Survey Details (only top row needs to be populated) | |||
survey_name | text | The name of the survey | |
publication_date | text | month year | The month and year or date when the survey was published |
surveying_date_start | text | month year | The month and year or date when surveying and interviewing started |
surveying_date_end | text | month year | The month and year or date when no further interviews took place |
geometry | text | The area covered by the survey; all households interviewed by the survey should fall into this area - geometry will be used to represent this area. | |
geonetwork_uuid | text | a UUID for Geonetwork. See https://geonetwork-opensource.org/ for further info | |
survey_weight_factor | integer | The factor survey weightings have been scaled by | |
notes | text | A generic text field for notes about a survey. Can include such as: Whether biomarker measurements were taken from blood plasma or blood serum | |
survey_type | text | biomarker' or 'food consumption' | Type of survey |
Source File Details (only top row needs to be populated) | |||
file_notes | text | Notes about file preparation, considerations, assumptions etc. might include notes about which data were excluded from the file, and why | |
created_by | text | Name of the person or organisation responsible for creating the file | |
created_date | text | Date when file created |
Make sure to also update the files in the repo
Hi @fannymawbey One of the new files has a column called 'had_fever'. Is that the same as 'had_illness' or is that a new parameter?
The format was updated in a recent MR on the gitlab project so closing this issue.
https://kwvmxgit.ad.nerc.ac.uk/bmgf-maps/data/db-test-data/-/merge_requests/45
We want to agree on a format for importing biomarker data into the system. It should be csv