LuciaSegovia opened 4 years ago
For special characters, I can write some SQL code to extract them; I'm not worried about those. Or you can use
Font formatting is different, and a big problem (which is why you should avoid using formatting in Excel). As far as I can see, we have four options:
1) Contact the original authors and see if they have the data in a different format. This would take time, with no certainty of success.
2) Re-code the data manually. This is time- and work-intensive, though we might be able to speed it up, e.g. by sorting the data on another column and recoding multiple rows at a time.
3) Search for an R package that can read Excel formatting. I don't have experience with R, so you're a better judge of how feasible this is.
4) Convert the formatting into a new column inside Excel, using find/replace, VBA, and such. I think this may be the most feasible option; there are tutorials on how to do this in Excel, though we may have to write some VBA code.
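For options 3 and 4, it may help to know where the formatting actually lives. An .xlsx file is a zip archive of XML parts: each cell's `s` attribute indexes the `cellXfs` list in `xl/styles.xml`, and that entry's `fontId` indexes the `fonts` list, where bold, italic and colour are stored. The minimal sketch below uses only Python's standard library (a stand-in for whatever R package or VBA macro we end up using) to copy that formatting into extra columns. Assumptions: the data sit in the worksheet part `xl/worksheets/sheet1.xml` (usually the first sheet, but not guaranteed); only explicit ARGB colours are read, not theme or indexed ones; formatting applied to only part of a cell's text, or through conditional formatting, is ignored; and the `formatted` flag is a hypothetical rule that would need tuning per FCT.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by the SpreadsheetML parts inside an .xlsx archive.
MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _toggle_on(el):
    """<b/> or <i/> means 'on' unless it carries val="0" or val="false"."""
    return el is not None and el.get("val", "1") not in ("0", "false")


def cell_styles(styles_xml):
    """Map each cellXfs index (a cell's 's' attribute) to its font flags."""
    root = ET.fromstring(styles_xml)
    fonts = []
    # Only <fonts>/<font>; fonts inside <dxfs> (conditional formats) are skipped.
    for font in root.findall(f"{MAIN}fonts/{MAIN}font"):
        color = font.find(MAIN + "color")
        fonts.append({
            "bold": _toggle_on(font.find(MAIN + "b")),
            "italic": _toggle_on(font.find(MAIN + "i")),
            # Only explicit ARGB colours are read; theme/indexed colours stay None.
            "color": color.get("rgb") if color is not None else None,
        })
    xfs = root.findall(f"{MAIN}cellXfs/{MAIN}xf")
    return [fonts[int(xf.get("fontId", "0"))] for xf in xfs]


def read_cells(sheet_xml, styles_xml, shared_strings_xml=None):
    """Return one dict per cell: reference, raw value and font flags."""
    styles = cell_styles(styles_xml)
    default = {"bold": False, "italic": False, "color": None}
    shared = []
    if shared_strings_xml is not None:
        for si in ET.fromstring(shared_strings_xml).findall(MAIN + "si"):
            # Joins plain and rich-text runs; per-run formatting is ignored.
            shared.append("".join(t.text or "" for t in si.iter(MAIN + "t")))
    cells = []
    for c in ET.fromstring(sheet_xml).iter(MAIN + "c"):
        v = c.find(MAIN + "v")
        value = v.text if v is not None else None
        if c.get("t") == "s" and value is not None:
            value = shared[int(value)]  # shared-string cells store an index
        idx = int(c.get("s", "0"))
        fmt = styles[idx] if idx < len(styles) else default
        cells.append({
            "ref": c.get("r"),
            "value": value,
            **fmt,
            # Hypothetical rule: any bold, italic or non-black explicit colour.
            "formatted": fmt["bold"] or fmt["italic"]
                         or fmt["color"] not in (None, "FF000000"),
        })
    return cells


def read_xlsx_formatting(path, sheet_part="xl/worksheets/sheet1.xml"):
    """Read one worksheet's cells plus formatting straight from the .xlsx zip."""
    with zipfile.ZipFile(path) as z:
        names = set(z.namelist())
        sst = "xl/sharedStrings.xml"
        shared = z.read(sst) if sst in names else None
        return read_cells(z.read(sheet_part), z.read("xl/styles.xml"), shared)
```

Usage would be something like `rows = read_xlsx_formatting("fct.xlsx")` (a hypothetical file name), after which the rows can be saved to CSV and the `bold`, `italic`, `color` or `formatted` columns used to decide which values to keep.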
Hi Roman!
Thank you very much for your suggestions. I will try point 3 and keep point 4 as plan B.
If you can't find a promising package by the end of today, let me know and we can have a pair programming session over Zoom tomorrow afternoon.
I found a potential package :) I hope it's useful!
Hi @rbroth
The issue with tagging is that some FCTs include low-quality values. Normally, those items are marked with special characters (the best-case scenario) such as brackets, parentheses, asterisks, etc. In other cases, they are marked with italics, bold or colours.
I would like a way to account for that, so that we can choose whether or not to use those values. There are two problems:
1) For values marked with special characters, I can (I hope) fix the issue by creating a new column, as you suggested before, that flags those values. If you have other suggestions, I'm happy to hear them.
2) For values marked with font-related formatting, I have no idea how to identify them, because when I open the dataset in R, all fonts are standardized and all colours and other formatting are removed. Do you have a solution for this?
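For problem 1, here is a minimal sketch of the extra-column approach, written in Python purely for illustration (a similar regular expression should carry over to R). It assumes the markers are brackets, parentheses and asterisks, a hypothetical set since each FCT documents its own conventions. It keeps the raw text alongside a cleaned numeric value and a `low_quality` flag, so the choice of whether to use those values stays open.

```python
import re

# Hypothetical marker set: each FCT documents its own symbols, so adjust this.
MARKERS = set("[]()*")
# First number in the cell, e.g. "12.5" in "[12.5]" or ".5" in "(.5)".
NUMBER = re.compile(r"-?(?:\d+(?:\.\d*)?|\.\d+)")


def split_value(raw):
    """Return (numeric value or None, marker characters found) for one cell."""
    text = str(raw).strip()
    found = "".join(ch for ch in text if ch in MARKERS)
    match = NUMBER.search(text)
    return (float(match.group()) if match else None), found


def add_flag_column(values):
    """Keep the raw text and add a cleaned value plus a low-quality flag."""
    rows = []
    for raw in values:
        value, found = split_value(raw)
        rows.append({"raw": raw, "value": value,
                     "markers": found, "low_quality": bool(found)})
    return rows
```

A cell with no digits, such as `"tr"`, comes back with a value of `None`, so non-numeric entries would still need their own rule.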
Thanks again!