micronutrientsupport / database-architecture

The Postgres database code for the MAPS tool

Moisture content adjustment when imputing values from one FCT to another FCT #209

Open LuciaSegovia opened 3 years ago

LuciaSegovia commented 3 years ago

The general practice when imputing a nutrient value from one FCT to another is to adjust the nutrient content to the moisture value in the "original" food composition data.

For example, we are using maize, grain from MAFOODS, but MAFOODS has no value for selenium. The tool will pick the selenium value from the next closest FCT, in this case KENFCT. The moisture content of maize, grain differs between the two FCTs: it is 10.9 g in MAFOODS and 13.6 g in KENFCT. Therefore, we need to adjust the selenium value to the original water content. We use a very simple formula to rescale it:

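# Rescale an imputed nutrient value y to the water content of the target food.
water_adjustment <- function(y, WATER.x, WATER.y) {

  # Ratio of dry matter: target food (WATER.x) over source food (WATER.y)
  x <- y * (100 - WATER.x) / (100 - WATER.y)

  x

}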

Where x is the nutrient value that we would like to know (in our example, the rescaled selenium content of maize, grain), y is the nutrient value that is imputed (the selenium value in KENFCT), WATER.x is the water/moisture content of the food that we want to use (the water content of maize, grain in MAFOODS), and WATER.y is the water content of the food used to impute the missing value (the water content of maize, grain in KENFCT).
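To make the maize example concrete, here is a minimal sketch in R. The water contents (10.9 and 13.6) are the ones quoted above. The selenium value of 2.0 is a made-up placeholder, not the actual KENFCT value, and the function takes the two water contents as arguments only so that the snippet runs on its own.

```r
# Rescale an imputed nutrient value to the water content of the target food.
# y: nutrient value in the source FCT
# WATER.x: water content of the food in the target FCT (here MAFOODS)
# WATER.y: water content of the food in the source FCT (here KENFCT)
water_adjustment <- function(y, WATER.x, WATER.y) {
  y * (100 - WATER.x) / (100 - WATER.y)
}

se_kenfct <- 2.0  # hypothetical placeholder, NOT the real KENFCT value

se_adjusted <- water_adjustment(se_kenfct, WATER.x = 10.9, WATER.y = 13.6)

# The adjustment factor depends only on the two water contents
adjustment_factor <- se_adjusted / se_kenfct
print(adjustment_factor)
```

With these water contents the factor is 89.1 / 86.4, so the adjusted value is roughly 3% higher than the imputed one, because maize, grain in MAFOODS is drier than in KENFCT.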

@bgsandan, @spenny-liam, @Gare94, @LouiseAnder This is one of the issues that we discussed on Friday, after the wonderful demo from @spenny-liam :)

Can we integrate that kind of function in the food compo hierarchy?

LouiseAnder commented 3 years ago

Thanks @LuciaSegovia, this makes perfect sense, and it is good to follow 'best practice'. Is there a reference you can cite for this methodology?

Would there be situations where there is no moisture content published? If so, which is preferred: an 'average' moisture content, or assuming it is the same as in the reported value being used (presumably from the same sample analysis)?

LuciaSegovia commented 3 years ago

These are the references we can use :)

(1) Gibson, R. S., & Ferguson, E. L. (2008). An interactive 24-hour recall for assessing the adequacy of iron and zinc intakes in developing countries.

(2) Stadlmayr, B., Wijesinha-Bettoni, R., Haytowitz, D., Rittenschober, D., Cunningham, J., Sobolewski, R., Eisenwagen, S., Baines, J., Probst, Y., Fitt, E., & Charrondiere, U. R. (2012). FAO/INFOODS Guidelines for Food Matching.

Would there be situations where there is no moisture content published?

That's a really good point, as we are currently facing that issue with the regional FCT. I guess the best solution would be to go back to the original data and try to get a moisture value, but I am not sure how realistic that is. So, being pragmatic, I would go for the average water value. That way, we can "adjust", to a certain extent, to the water content of that particular item.
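A minimal sketch of that pragmatic fallback in R. The thread does not say which values the average is taken over, so the `reference_water` vector (water contents of the same food from other sources) and the helper name `water_or_average` are assumptions made for illustration only.

```r
# Use the published water content when it exists; otherwise fall back to the
# mean of the available reference values. The `imputed` flag records which
# case applied, so the choice stays traceable.
# `reference_water` is a hypothetical input: water contents of the same food
# taken from other sources.
water_or_average <- function(water, reference_water) {
  if (is.na(water)) {
    list(value = mean(reference_water, na.rm = TRUE), imputed = TRUE)
  } else {
    list(value = water, imputed = FALSE)
  }
}

missing_case   <- water_or_average(NA, c(10.9, 13.6))
published_case <- water_or_average(12.0, c(10.9, 13.6))

print(missing_case$value)    # mean of the reference values
print(published_case$value)  # published value is kept as-is
```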

LouiseAnder commented 3 years ago

@LuciaSegovia Pragmatic and very easy to trace sounds very sensible. I suppose the question for @bgsandan and/or @spenny-liam is whether that is going to be calculated on demand or whether it needs storing in a table?

LuciaSegovia commented 3 years ago

Hi @LouiseAnder and @spenny-liam

A summary of the moisture content adjustment discussion:

In order to get the best estimated micronutrient values, we need to adjust the micronutrient content to the water content of the original item.

For example, we are estimating calcium (CA) in Malawi and ideally we are matching food items to the Malawi FCT. Unfortunately, in Malawi, they do not like to measure CA in cabbages. So, we need to use the cabbage CA concentration from the Kenya FCT. Because Malawi and Kenya cabbages are different, they have different water contents, which influences the CA concentration, so we need to adjust the CA value from the Kenya FCT to the Malawi FCT.

Some ideas that we proposed to improve the already amazing food matching workflow so that it accounts for the water content adjustment:

Following the CA in Malawi example:

1) Start by getting the CA concentration of all food items in the Malawi FCT. Let's say we have 100 matches out of 150 food items. Then, we only need to adjust the other 50 food items. For those 50 food items, we will first retrieve the water content and then use that water content to re-calculate CA.

2) Start by getting the water content of the food. We will find the best/closest geographic location for all food items in Malawi (that will be the "reference" item). Then, let's say that we have 120 food items with CA values and water content values coming from the same FCT (perfect!). That will leave us with 30 food items with NA for CA content. For those items, we will find the next best FCT and take its CA concentration and water content to re-calculate the CA content.

I see the option two flow as:

1) Run the FCT prioritization function --> Get a list: 1- Malawi, 2- Kenya, 3- West-Africa

2) Run water content matching

- Maize - Malawi
- Beef - Kenya
- Cabbage - Malawi

3) Run CA matching

- Maize - 30 mg (Malawi FCT)
- Beef - 50 mg (Kenya FCT)
- Cabbage - NA (Malawi FCT)

4) Run matching prioritization:

if CA is NA in list_FCT[i], then CA = CA in list_FCT[i+1] * (100 - WATER in list_FCT[i]) / (100 - WATER in list_FCT[i+1])

That's more or less how I do it when I do food matching; the main difference is that the FCT priority list and the matching are done manually, one FCT at a time. But I think I could work on a loop based on a given priority list. Let me know if anything is not clear or doesn't make sense! Happy to discuss further!
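A rough sketch of what such a loop could look like in R, following the option two flow. Everything here is illustrative: the data frame layout (`fct`, `food`, `WATER`, `CA` columns), the function name `match_nutrient` and all numbers are assumptions, not the MAPS database schema or real FCT values.

```r
# Toy data: all values are made up for illustration only.
fct_data <- data.frame(
  fct   = c("Malawi", "Malawi", "Kenya", "Kenya"),
  food  = c("Maize", "Cabbage", "Maize", "Cabbage"),
  WATER = c(10.9, 92.0, 13.6, 90.0),
  CA    = c(30, NA, 28, 40)
)

# Step 1: FCT priority list (best match first)
priority <- c("Malawi", "Kenya", "West-Africa")

match_nutrient <- function(food_name, data, priority, nutrient = "CA") {
  rows <- data[data$food == food_name, ]
  rows <- rows[order(match(rows$fct, priority)), ]  # sort by FCT priority
  rows <- rows[!is.na(rows$WATER), ]                # water is needed to adjust
  if (nrow(rows) == 0) return(NA_real_)

  # Step 2: reference water content comes from the best available FCT
  water_ref <- rows$WATER[1]

  # Steps 3 and 4: take the first non-missing nutrient value in priority
  # order and rescale it to the reference water content
  for (i in seq_len(nrow(rows))) {
    value <- rows[[nutrient]][i]
    if (!is.na(value)) {
      # When the value comes from the reference FCT itself, the factor is 1
      return(value * (100 - water_ref) / (100 - rows$WATER[i]))
    }
  }
  NA_real_  # no FCT in the list has a value for this nutrient
}

ca_maize   <- match_nutrient("Maize", fct_data, priority)
ca_cabbage <- match_nutrient("Cabbage", fct_data, priority)
ca_beef    <- match_nutrient("Beef", fct_data, priority)
print(c(Maize = ca_maize, Cabbage = ca_cabbage, Beef = ca_beef))
```

In this toy data, maize keeps its Malawi value unchanged, cabbage takes the Kenya value rescaled to the Malawi water content, and beef stays NA because no FCT in the toy table lists it.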