Open TylerSagendorf opened 2 years ago
Thank you for the suggestion. Currently, the package is just reformatting the original MSigDB for easier access. This might be outside the scope, but certainly worth considering.
To clarify, this is really an aesthetic change to make the name easier to read, right? For example, `GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS` becomes "5-phosphoribose 1-diphosphate metabolic process" and `GOBP_ACTIVATION_OF_CYSTEINE_TYPE_ENDOPEPTIDASE_ACTIVITY_INVOLVED_IN_APOPTOTIC_PROCESS_BY_CYTOCHROME_C` becomes "activation of cysteine-type endopeptidase activity involved in apoptotic process by cytochrome c".
Yeah that's really all it is. Another solution would be to replace the underscores with spaces and change all text to lowercase, but that would remove intentional capitalization (such as with "mRNA") and characters that were replaced by underscores (like the dashes in your examples).
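To make the drawback concrete, here is a minimal sketch of that underscore-and-lowercase approach. It is written in Python purely for illustration; the function name `naive_readable_name` is hypothetical and not part of any package, and it assumes everything before the first underscore is the collection prefix (such as `GOBP`).

```python
def naive_readable_name(gs_name: str) -> str:
    """Drop the collection prefix, swap underscores for spaces, lowercase.

    Illustrative only: this cannot restore characters that were replaced
    by underscores, nor any intentional capitalization.
    """
    # Assumption: the text before the first underscore is the prefix (e.g. "GOBP").
    prefix, sep, rest = gs_name.partition("_")
    if not sep:
        # No underscore at all, so there is no prefix to strip.
        return gs_name.lower()
    return rest.replace("_", " ").lower()


naive = naive_readable_name("GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS")
official = "5-phosphoribose 1-diphosphate metabolic process"

print(naive)              # 5 phosphoribose 1 diphosphate metabolic process
print(naive == official)  # False: the dashes are gone
```

The result is readable, but it does not match the GO term name, because the original dashes became underscores and are indistinguishable from word separators.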
Yes, the original non-alphanumeric characters and capitalization are probably the most valuable aspect, and that can't be automatically fixed.
The entries in the `gs_description` column for GO terms are rather long and not ideal for use as human-readable identifiers when plotting ORA or GSEA results. Would it be possible to add a `gs_brief_description` column that uses the names from the appropriate GO database release? I have been getting the data using the code below and then left-joining it to ORA and GSEA results tables made with fgsea. For other databases, I just use the entries in `gs_description`.
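The code mentioned in this request is not reproduced in the thread, and the following is not a reconstruction of it. It is only a rough Python sketch of the left-join idea, under several assumptions: the lookup table `go_term_names`, the helper `add_brief_description`, the non-GO set name, and the placeholder descriptions are all hypothetical, and results are modeled as plain dictionaries rather than real fgsea output.

```python
# Hypothetical lookup: gene set name -> GO term name from the matching GO release.
go_term_names = {
    "GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS":
        "5-phosphoribose 1-diphosphate metabolic process",
}


def add_brief_description(rows, lookup):
    """Left join: keep every row and add a gs_brief_description field.

    Rows without a match in the lookup (for example, non-GO gene sets)
    fall back to their existing gs_description.
    """
    annotated_rows = []
    for row in rows:
        brief = lookup.get(row["gs_name"], row["gs_description"])
        annotated_rows.append({**row, "gs_brief_description": brief})
    return annotated_rows


# Placeholder rows standing in for an ORA or GSEA results table.
results = [
    {"gs_name": "GOBP_5_PHOSPHORIBOSE_1_DIPHOSPHATE_METABOLIC_PROCESS",
     "gs_description": "placeholder long description"},
    {"gs_name": "HYPOTHETICAL_NON_GO_SET",
     "gs_description": "placeholder description"},
]

annotated = add_brief_description(results, go_term_names)
for row in annotated:
    print(row["gs_name"], "->", row["gs_brief_description"])
```

Every input row is kept, so gene sets from other databases pass through with their existing description, which mirrors the fallback described above.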