micronutrientsupport / fct

Cleaning and standardisation of food composition tables

[Function] - Adding new food entries into MAPS food dictionary #5

Open LuciaSegovia opened 1 year ago

LuciaSegovia commented 1 year ago

Hi @TomCodd

I would like to write a function for adding new food entries into the dictionary. You can see an example of the potential function here.

The idea is that the new food entries would be nested into the sub-categories (ID_0, ID_1, ID_2), and ID_3 should be unique, but generated from ID_2.

As you can see, what I currently do is manually provide the category where the entry should be nested (ID_2) and copy an existing row into a new one, where the ID_3 is then generated as a continuous numeric sequence (01-99). Then I manually assign a name (FoodName_3), the scientific name (scientific_name) and the FoodEx2 code (FE2_3).
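The numbering described above can be sketched roughly as follows. This is only an illustration, assuming ID_3 takes the form `<ID_2>.<NN>` (for example `1802.01`); the helper name `next_id3` and the example IDs are hypothetical and not part of the repository.

```r
# Hypothetical helper: returns the next free ID_3 within an ID_2 category.
# Assumes ID_3 values are character strings of the form "<ID_2>.<NN>",
# with NN running from 01 to 99.
next_id3 <- function(existing_id3, id2) {
  # Keep only the IDs that belong to this ID_2 category
  in_group <- existing_id3[!is.na(existing_id3) &
                             startsWith(existing_id3, paste0(id2, "."))]
  if (length(in_group) == 0) {
    return(paste0(id2, ".01"))  # first entry in the category
  }
  # Take the numeric suffix after the dot and find the largest one in use
  last <- max(as.integer(sub("^.*\\.", "", in_group)))
  if (last >= 99) stop("No free ID_3 left in category ", id2)
  sprintf("%s.%02d", id2, last + 1L)
}

next_id3(c("1802.01", "1802.02", "1803.01"), "1802")  # "1802.03"
next_id3(c("1803.01"), "1802")                        # "1802.01"
```

Taking the maximum suffix rather than counting rows keeps the new ID unique even if an earlier entry in the category has been removed.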

In addition, the function should allow some optional variables to be filled in manually: Description1, Desc1.ref, Description2, Desc1.ref2. See here.

Let me know if that makes any sense and if you would like some more clarification :)

THANKS!

TomCodd commented 1 year ago

Hi @LuciaSegovia

So essentially you'd start with nothing (no dataframe of items to add or anything), assign an ID_2 manually, from which an ID_0, ID_1, ID_3 will be automatically assigned or generated, and then manually put in the FoodName_3, scientific_name, and FE2_3 code?

Would you like to do this one at a time (i.e. I write a normal function which takes some inputs, generates some fields from them, and outputs either something to add to the dictionary or a new dictionary based on the old one plus the newly specified entry)? Or would you like to be able to do these as a batch? That I would have to experiment with, but it would essentially involve an editable RShiny table that you can type entries into (so a column for ID_2, FoodName_3, etc.), which then generates the remaining fields on output.

:)

LuciaSegovia commented 1 year ago

Hi @Tom,

The starting point should be the "MAPS_Dictionary_v2.6.csv" version of the food dictionary, and the function is to add new entries. I think for now I would like a function that allows one-at-a-time addition, as entries need "careful" thought before being added and should be placed in the correct nested category.

Basically, I would like to avoid copying and pasting that chunk of code every time I want to add a new item, and be able to just call a function to add the items. 😅

Thanks! Lucia

TomCodd commented 1 year ago

Hi @LuciaSegovia

OK, no problem! I'll get started on that :)

Many thanks

Tom

TomCodd commented 1 year ago

Hi @LuciaSegovia

Just wondering - when you run this, would it normally be on a downloaded csv that you've already imported into R before the script, or would you like me to add the functionality to read the dictionary from its GitHub location? :)

TomCodd commented 1 year ago

Hi @LuciaSegovia

I've got a function which automatically generates a unique ID_3 now; however, I'd appreciate a bit of guidance on the generation of ID_0 and ID_1.

We chatted about this in December last year, and unless things have changed, ID_0 is the broad group and comes from your supervisor's work, and ID_1 comes from the Food Balance Sheets. Were you looking for these to be automatically found by the function? If so, do you have any lookup tables of how ID_2s might relate to ID_0s and ID_1s? :)

Many thanks!

Tom
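For ID_2 values that already appear in the dictionary, such a lookup could in principle be derived from the dictionary itself. The sketch below is only an illustration: it uses a toy data frame with made-up values standing in for the real dictionary, and assumes the columns are named ID_0, ID_1, ID_2 and ID_3 as in the discussion. It would not help for an ID_2 that has never appeared before.

```r
# Toy stand-in for the food dictionary (all values are illustrative only)
dictionary.df <- data.frame(
  ID_0 = c("A", "A", "B"),
  ID_1 = c("10", "10", "20"),
  ID_2 = c("1802", "1802", "2001"),
  ID_3 = c("1802.01", "1802.02", "2001.01"),
  stringsAsFactors = FALSE
)

# One row per ID_2, together with its parent categories
id_lookup <- unique(dictionary.df[, c("ID_0", "ID_1", "ID_2")])

# Find the parent categories for a given ID_2
id_lookup[id_lookup$ID_2 == "1802", ]
```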

LuciaSegovia commented 1 year ago

Hi @Tom,

So the function should work with the food dictionary already loaded; it doesn't need to load it itself. And, because the categories are already in the food dictionary, what I'm currently doing is just duplicating a row in the food dictionary, and then in that duplicated row, which will already have the correct ID_1:ID_2, I add the new ID_3. So it would be something like:

I hope it makes more sense now 😬

library(stringr)  # str_extract(), str_replace()

AddingNewEntry <- function(df = dictionary.df, id2 = "", desc_new = "", fex2_new = "", scien_new = "") {

  # Manual inputs (passed as arguments):
  #   id2       - the ID_2 that identifies the row to be duplicated
  #   desc_new  - the name of the new food entry (FoodName_3)
  #   fex2_new  - the FoodEx2 code (FE2_3)
  #   scien_new - the scientific name (scientific_name)
  stopifnot(id2 %in% df$ID_2)  # only works for ID_2 already in the dictionary

  # Auto inputs:
  ids <- df$ID_3[df$ID_2 %in% id2]
  id3 <- tail(sort(ids[ids != ""]), n = 1)  # Last ID_3 in that ID_2 category, if any

  if (length(id3) == 0) {
    id3_new <- paste0(id2, ".01")      # First ID_3 in the category
    n2 <- which(df$ID_2 %in% id2)[1]   # Row to be duplicated
  } else {
    # Generate the new ID_3 by incrementing the last one
    last <- as.integer(str_extract(id3, "[[:digit:]]{1,3}$"))
    id3_new <- str_replace(id3, "[[:alnum:]]{1,3}$", formatC(last + 1L, width = 2, flag = "0"))
    n2 <- which(df$ID_3 %in% id3)[1]   # Row to be duplicated
  }

  n1 <- nrow(df) + 1  # Position of the new row in the dataset

  # New entry - generation:
  df[n1, ] <- df[n2, ]  # Generate the new "duplicated" item

  # New entry - population:
  df[n1, "ID_3"] <- id3_new               # Adding new ID_3 (column 7)
  df[n1, "FE2_3"] <- fex2_new             # Adding new FoodEx2 (column 8)
  df[n1, "FoodName_3"] <- desc_new        # Adding new food description (column 9)
  df[n1, "scientific_name"] <- scien_new  # Adding new scientific name (column 13)

  return(df)
}

# Add - 1802 - sugar cane, juice, raw
dictionary.df <- AddingNewEntry(dictionary.df, id2 = "1802", desc_new = "sugar cane, juice, raw", fex2_new = NA, scien_new = NA)
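Since ID_3 is meant to be unique and nested under its ID_2, a quick check after adding an entry might look like the sketch below. It is an illustration only: the function name `check_id3` and the toy data are hypothetical, and it assumes every ID_3 starts with its ID_2 followed by a dot, as in the snippet above.

```r
# Hypothetical check: every non-empty ID_3 must be unique and must start
# with its own ID_2 followed by a dot (e.g. ID_2 "1802" -> ID_3 "1802.01").
check_id3 <- function(df) {
  keep <- !is.na(df$ID_3) & df$ID_3 != ""
  unique_ok <- anyDuplicated(df$ID_3[keep]) == 0
  nested_ok <- all(startsWith(df$ID_3[keep], paste0(df$ID_2[keep], ".")))
  unique_ok && nested_ok
}

# Toy data (illustrative values only)
good <- data.frame(ID_2 = c("1802", "1802"), ID_3 = c("1802.01", "1802.02"),
                   stringsAsFactors = FALSE)
bad  <- data.frame(ID_2 = c("1802", "1802"), ID_3 = c("1802.01", "1802.01"),
                   stringsAsFactors = FALSE)

check_id3(good)  # TRUE
check_id3(bad)   # FALSE (duplicated ID_3)
```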

TomCodd commented 1 year ago

Hi @LuciaSegovia

Ah, OK - that's fair enough. I was wondering whether you wanted the capacity to add new ID_2s which haven't appeared in the food dictionary before, which I was thinking would require the lookups :) If the function only needs to work with ID_2s that are already in the dictionary, that makes things easier!

Many thanks

Tom

LuciaSegovia commented 1 year ago

Yeah, sorry, I wasn't very clear about it 😅 I added a data folder in our SharePoint w/ the data on this repo :)

TomCodd commented 1 year ago

Hi @LuciaSegovia

I believe these two functions are what you're looking for: Add_To_Dictionary.R

Please let me know if they're living up to what you had hoped for! I'm more than happy to make any changes that are needed :)

TomCodd commented 1 year ago

> Yeah, sorry, I wasn't very clear about it 😅 I added a data folder in our SharePoint w/ the data on this repo :)

Just as much my fault. You even mentioned and showed that the current method is to copy an existing row and modify it; I was just a bit set in my thinking that it's best to create things from scratch 😅

LuciaSegovia commented 1 year ago

Glad that we syntonised back 😂

TomCodd commented 1 year ago

> Glad that we syntonised back 😂

Haha, how do you mean? Not seen that word used before 😂 do you mean got back to the same understanding? :)

LuciaSegovia commented 1 year ago

HAHAHAHA, sorry, I tried to "spanglish" too hard. Yes, I meant that we were on the same page now 😄

TomCodd commented 1 year ago

> HAHAHAHA, sorry, I tried to "spanglish" too hard. Yes, I meant that we were on the same page now 😄

You did well, it's a word that means what you wanted it to, I've just never heard it before hahahhaha 😄