micronutrientsupport / database-architecture

The Postgres database code for the MAPS tool

Food groups are shown twice #295

Closed rbroth closed 2 years ago

rbroth commented 2 years ago

In this instance, 'Vegetables, other...' versus 'vegetables, other....'. Probably to do with capitalisation.

https://preview.micronutrient.support/quick-maps/diet/baseline?country-id=MWI&mnd-id=Ca&measure=diet&age-gender-group-id=WR

[Screenshot: treemap showing the duplicated food groups]
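One quick way to test the capitalisation theory is to group the food group names by a case-insensitive key and report any key that is spelled in more than one way. The following is a minimal Python sketch (Python being the language of the loading process), assuming the names have already been read into a list. The function names are hypothetical, and the example input pairs the 'Vegetables, other and products' name from later in this thread with an assumed lower-case variant.

```python
from collections import defaultdict


def normalise_name(name: str) -> str:
    """Case-insensitive key: collapse runs of whitespace, then casefold."""
    return " ".join(name.split()).casefold()


def find_case_duplicates(names: list[str]) -> dict[str, list[str]]:
    """Map each normalised key to its distinct spellings, keeping only keys
    that are spelled in more than one way."""
    spellings: dict[str, list[str]] = defaultdict(list)
    for name in names:
        key = normalise_name(name)
        if name not in spellings[key]:
            spellings[key].append(name)
    return {key: found for key, found in spellings.items() if len(found) > 1}


if __name__ == "__main__":
    # Illustrative input: the lower-case variant is assumed, not read from the database.
    food_groups = [
        "Vegetables, other and products",
        "vegetables, other and products",
        "Eggs and products",
    ]
    print(find_case_duplicates(food_groups))
```

Collapsing whitespace as well as case is a judgement call. Stray spaces are another common way for near-identical strings to slip past an exact-match check.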

andy-bevan commented 2 years ago

Does anyone know if these would count as duplicates?

  • Cereals
  • Cereals and their products
  • Cereals, other and products

@bgsandan @rbroth @LuciaSegovia

andy-bevan commented 2 years ago

Also:

  • Eggs and products
  • Eggs and their products

  • Fruits and their products
  • Fruits and vegetables
  • Fruits, other and products

  • Vegetables and their products
  • Vegetables, other and products
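Whether these count as duplicates is a domain question, but a script can at least collect the candidates for review. The following Python sketch reduces each name to its content words. It ignores case, punctuation and a small, hypothetical list of filler words ("and", "their", "products", "other"), then groups names that end up with the same key. Treat the output as a list to review, not a list to merge. Ignoring "other" deliberately lumps a qualified group such as "Cereals, other and products" together with "Cereals", and that group may well be a distinct sub-group.

```python
import re
from collections import defaultdict

# Hypothetical filler words ignored when comparing names. Dropping "other"
# is an assumption that may merge genuinely distinct groups.
FILLER_WORDS = frozenset({"and", "their", "products", "other"})


def core_key(name: str) -> frozenset[str]:
    """Content words of a name, ignoring case, punctuation and filler words."""
    words = re.findall(r"[a-z]+", name.casefold())
    return frozenset(word for word in words if word not in FILLER_WORDS)


def candidate_duplicates(names: list[str]) -> list[list[str]]:
    """Groups of two or more names that share the same core key."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for name in names:
        groups[core_key(name)].append(name)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    names = [
        "Cereals", "Cereals and their products", "Cereals, other and products",
        "Eggs and products", "Eggs and their products",
        "Fruits and their products", "Fruits and vegetables", "Fruits, other and products",
        "Vegetables and their products", "Vegetables, other and products",
    ]
    for group in candidate_duplicates(names):
        print(group)
```

With the names listed above, this groups the three cereal names, the two egg names, "Fruits and their products" with "Fruits, other and products", and the two vegetable names. "Fruits and vegetables" stays on its own.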

LuciaSegovia commented 2 years ago

@andy-bevan, are those food_groups or food_names?

andy-bevan commented 2 years ago

These are food groups.

LuciaSegovia commented 2 years ago


I think I know where the duplication is coming from. It is from an old version of an FCT that we generated to match the Household Survey data in Malawi (IHS4). It was used before we agreed on the composition matching hierarchy. It's this one in here. I hope it helps :)

rbroth commented 2 years ago

FCT that we generated to match the Household Survey data in Malawi (IHS4)

Can we get rid of that one?

LuciaSegovia commented 2 years ago

@rbroth I think we should. It may cause some missing values if the FCT cascade is not fully implemented, but I think it would be good to remove it :)

andy-bevan commented 2 years ago

The file mentioned above does not seem to be used directly by any of the loading processes. The food groups are loaded as follows:

  1. Hard-coded values are inserted when the food group table is created.
    • The impact commodities depend on these: if the food groups are missing, an error occurs when the impact commodities are loaded.
  2. Food groups are loaded from the food composition tables (e.g. MAPS_MAFOODS_v1.5.csv) during the Python loading process. The script checks for existing values and reuses them rather than adding duplicates, but then the codes are not correct.
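The reuse step described for the food composition tables behaves like a get-or-create lookup. The following Python sketch models it in memory. A hypothetical FoodGroupRegistry stands in for the food group table, because the real schema and loader code are not shown in this thread. The registry matches on a case-insensitive name so that capitalisation variants are not inserted twice. When it reuses an existing row, it records which CSV code was replaced by which table id, so the code mismatch becomes visible rather than silent.

```python
from dataclasses import dataclass, field


@dataclass
class FoodGroupRegistry:
    """Hypothetical in-memory stand-in for the food group table."""

    ids_by_key: dict[str, int] = field(default_factory=dict)
    names_by_id: dict[int, str] = field(default_factory=dict)
    # CSV code -> id of the existing row that was reused instead of it.
    code_remap: dict[int, int] = field(default_factory=dict)

    @staticmethod
    def key(name: str) -> str:
        """Case- and whitespace-insensitive lookup key."""
        return " ".join(name.split()).casefold()

    def add(self, group_id: int, name: str) -> None:
        """Insert a row, refusing ids that are already taken."""
        if group_id in self.names_by_id:
            raise ValueError(f"id {group_id} already used by {self.names_by_id[group_id]!r}")
        self.ids_by_key[self.key(name)] = group_id
        self.names_by_id[group_id] = name

    def get_or_create(self, csv_code: int, name: str) -> int:
        """Return the id to use for a CSV food group, reusing a matching row."""
        existing = self.ids_by_key.get(self.key(name))
        if existing is None:
            self.add(csv_code, name)
            return csv_code
        if existing != csv_code:
            self.code_remap[csv_code] = existing
        return existing


if __name__ == "__main__":
    registry = FoodGroupRegistry()
    registry.add(1, "Cereals and their products")  # a hard-coded row
    # 9001 is a made-up CSV code for a lower-case spelling of the same group.
    print(registry.get_or_create(9001, "cereals and their products"))
    print(registry.get_or_create(2093, "Wheat and products"))
    print(registry.code_remap)
```

In this sketch, an empty code_remap after a load means that every reused row already had the same code in the CSV as in the table.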

So if we update the ids in the CSVs we would remove the duplicates...?

@LuciaSegovia does this make sense?

andy-bevan commented 2 years ago

Example - Initial hardcoded values:

(0, 0, 'other'),
(1, 1, 'Cereals and their products'),
(101, 1, 'Rice and rice-based products'),
(102, 1, 'Maize and maize-based products'),
(103, 1, 'Wheat and wheat-based products')

andy-bevan commented 2 years ago

Example values from CSV files:

2006  Cereals
2019  Cereals, other and products  2006
2093  Wheat and products  2006
2043  Maize and products (including white maize)  2006
2044  Maize germ oil and products  2006
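Putting the two examples side by side suggests that, if the loader matches on names, none of these CSV groups would be recognised as already existing. No CSV name matches a hard-coded name, even when case is ignored, and the codes come from different ranges. The following Python sketch checks this using only the rows quoted above. It reads the hard-coded tuples as (id, parent_id, name) and the CSV rows as (code, name, parent_code). That column reading is an assumption based on how the values line up.

```python
from typing import Optional

# Rows copied from the two examples above. Column meanings are assumptions:
# hard-coded rows as (id, parent_id, name), CSV rows as (code, name, parent_code).
HARDCODED: list[tuple[int, int, str]] = [
    (0, 0, "other"),
    (1, 1, "Cereals and their products"),
    (101, 1, "Rice and rice-based products"),
    (102, 1, "Maize and maize-based products"),
    (103, 1, "Wheat and wheat-based products"),
]

CSV_ROWS: list[tuple[int, str, Optional[int]]] = [
    (2006, "Cereals", None),  # no parent code given in the example
    (2019, "Cereals, other and products", 2006),
    (2093, "Wheat and products", 2006),
    (2043, "Maize and products (including white maize)", 2006),
    (2044, "Maize germ oil and products", 2006),
]


def unmatched_csv_groups(hardcoded, csv_rows) -> list[tuple[int, str]]:
    """CSV (code, name) pairs whose name has no case-insensitive hard-coded match."""
    known_names = {name.casefold() for _, _, name in hardcoded}
    return [(code, name) for code, name, _ in csv_rows if name.casefold() not in known_names]


def shared_ids(hardcoded, csv_rows) -> set[int]:
    """Ids present in both sources (an empty set means the ranges do not overlap)."""
    return {row[0] for row in hardcoded} & {row[0] for row in csv_rows}


if __name__ == "__main__":
    for code, name in unmatched_csv_groups(HARDCODED, CSV_ROWS):
        print(code, name)
    print(shared_ids(HARDCODED, CSV_ROWS))
```

On this sample, neither the names nor the ids line up. Reconciling the two sets would therefore need an explicit mapping between them or edited CSV ids.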

andy-bevan commented 2 years ago

@bgsandan - is this still even an issue now that the initial duplicates have been removed?

LuciaSegovia commented 2 years ago

I am sorry @andy-bevan, but I am not sure I am following what is going on. Is this an issue of misalignment between the food group names in the IMPACT model and those in the food composition data?

andy-bevan commented 2 years ago

No worries - let's see if Andy can shed any light...?


bgsandan commented 2 years ago

@andy-bevan - Sorry, just catching up on this. So when you say "is this still even an issue now that the initial duplicates have been removed?", does that mean that the initial issue (multiple versions of essentially the same string, differing only in capitalisation, in the treemap as in the screenshot above) no longer occurs?

Re: IMPACT and FCT food groups being different. Not a deal breaker for closing this issue, but we should look to consolidate these (i.e. get the IMPACT groups to match the existing FCT groups). That can be a separate/new issue, though.

andy-bevan commented 2 years ago

Yes to the first question - those obvious duplicates/typos are resolved. I'll close the issue.