micronutrientsupport / fct

Cleaning and standardisation of food composition tables

Tagging and NA's #4

Open LuciaSegovia opened 4 years ago

LuciaSegovia commented 4 years ago

Hi @rbroth

The issue with tagging is that some FCTs include low-quality values. In the best-case scenario, those items are marked with special characters such as brackets, parentheses, asterisks, etc. In other cases, they use italics, bold or colours...

I would like a way to account for that, so we can choose whether or not to use those values. There are several problems:

1) For those values marked with special characters, I can fix the issue (I hope) by creating a column, as you suggested before, to account for those values. If you have other suggestions, I'm happy to hear them.

2) For those values that are marked with font-related modifications, I have no clue how to identify them, because when I open the dataset in R all the formatting is stripped: fonts, colours and other styling are lost. Do you have a solution for this?
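To make point 1 concrete, here is a minimal sketch of the flag-column idea. It is written in Python purely for illustration (the same logic should carry over to R or SQL), and it assumes the cells are read as text so the markers survive. The function name `split_flag` and the marker labels are hypothetical; the marker characters are only the ones mentioned above, not taken from any particular FCT.

```python
import re

# Hypothetical marker labels; the characters are the ones named in this
# thread (brackets, parentheses, asterisk), not from a specific FCT.
MARKERS = {"[": "bracket", "(": "parenthesis", "*": "asterisk"}

# Matches the first plain decimal number in a cell, e.g. "12.3" in "[12.3]".
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def split_flag(raw):
    """Split one raw cell (read as text) into (numeric value, quality flag).

    Returns (None, None) for empty cells, and a value of None when no
    number can be found (for example "NA").
    """
    text = "" if raw is None else str(raw).strip()
    if text == "":
        return None, None
    # The first marker found in the cell decides the flag; None if unmarked.
    flag = next((name for ch, name in MARKERS.items() if ch in text), None)
    match = _NUMBER.search(text)
    value = float(match.group()) if match else None
    return value, flag


# Usage: build a value column and a flag column from one raw column.
raw_column = ["[12.3]", "(0.5)", "1.2*", "4", "", "NA"]
values, flags = zip(*(split_flag(cell) for cell in raw_column))
```

Keeping the flag in its own column means the numeric column stays clean, and flagged values can be filtered in or out later without going back to the original file.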

Thanks again!

rbroth commented 4 years ago

For special characters, I can write some SQL code to extract them; I'm not worried about those. Or you can use

Font formatting is different, and a big problem (which is why you should avoid using formatting in Excel). As far as I can see we can:

LuciaSegovia commented 4 years ago

Hi Roman!

Thank you very much for your suggestions. I will try with point 3, and use point 4 as plan B.

rbroth commented 4 years ago

If you can't find a promising package by the end of today, let me know and we can have a pair-programming session over Zoom tomorrow afternoon.

LuciaSegovia commented 4 years ago

I found a potential package :) I hope it's useful!