tcatapano opened 4 years ago
The ranks also need normalization, as inconsistent capitalization of the ranks is creating two or more entries for the same rank (e.g. Family, family; Sub-family, Sub-Family, sub-family; etc.).
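As a minimal sketch, assuming that capitalization, hyphens and whitespace never carry meaning in a rank label, the variants listed above could be collapsed to a single canonical key like this. `normalize_rank` is a hypothetical helper for illustration, not part of any existing pipeline.

```python
import re


def normalize_rank(label: str) -> str:
    """Map a raw rank label to a canonical lowercase key.

    Assumption (hypothetical rule): case, hyphens and whitespace are not
    meaningful, so "Sub-Family", "sub family" and "Subfamily" collapse together.
    """
    # casefold() is a more aggressive lower(); then drop hyphens and whitespace.
    return re.sub(r"[\s-]+", "", label.casefold())


raw = ["Family", "family", "Sub-family", "Sub-Family", "sub-family"]
print(sorted({normalize_rank(r) for r in raw}))
```

This only decides which labels count as the same rank; which spelling should then be stored (e.g. "subfamily" vs. "Subfamily") is a separate choice.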
This might be a related issue, so I'm just reporting it here.
At some point we might want to spend time cleaning this up in the treatments; that means Guido would have to run a script over the entire treatment corpus. This cleanup would also cover other elements, such as taxonomic status.
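Before rewriting anything across the corpus, it may help to audit which spellings would actually be merged. The sketch below assumes the rank (or taxonomic status) values have already been extracted from the treatments as plain strings; the extraction step, the `canonical` rule and the `find_variants` helper are all hypothetical and not Guido's actual script.

```python
import re
from collections import Counter, defaultdict


def canonical(value: str) -> str:
    # Hypothetical rule: ignore case, hyphens and whitespace.
    return re.sub(r"[\s-]+", "", value.casefold())


def find_variants(values):
    """Group raw values by canonical form, keeping only groups with >1 spelling.

    Returns {canonical key: Counter(raw spelling -> number of occurrences)}.
    """
    groups = defaultdict(Counter)
    for v in values:
        groups[canonical(v)][v] += 1
    return {k: c for k, c in groups.items() if len(c) > 1}


# Hypothetical values, as they might be pulled from the treatment corpus.
ranks = ["Family", "family", "Family", "Sub-family", "sub-family", "genus"]
for key, spellings in find_variants(ranks).items():
    print(key, dict(spellings))
```

The same audit could be run over taxonomic status values; a value whose spellings are already consistent (like "genus" here) does not appear in the report.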
At the same time, we should consider whether we could normalize these terms at the moment we tag them in the GGI process.
Not sure at what stage this normalization would best be performed. In the data, or later?