Add option for user to provide/obtain full gene expression data frame

rdmorin commented 2 years ago

Currently the get_gene_expression function requires the user to specify a set of gene IDs (ENSG or HGNC), and it subsets the tidy data frame based on that information. We should add functionality that lets the user request the full matrix instead. An empty gene list is probably not the right approach, since it could give an unsuspecting user a massive data frame unintentionally. If we add another parameter that defaults to FALSE (all_genes=FALSE) and then check for either that OR a gene list, we should be able to return the full data frame.

To make this functionality helpful, the function also needs to accept the same data frame as input and, when it is provided, use it directly and skip the step of loading it from disk. The purpose is to spare users from re-loading that data from disk multiple times if they plan to run this function on different gene sets in an interactive session. Hence, the function will need a second new, optional argument named full_expression_df or something similar.
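A rough sketch of the argument handling proposed above, in base R. This is not the GAMBLR implementation: the function name get_gene_expression_sketch, the loader load_expression_from_disk, the gene_ids argument, and the toy column names are all hypothetical stand-ins, and only all_genes and full_expression_df come from the proposal.

```r
# Hypothetical stand-in for the expensive read from disk.
# The real loader is not shown in this thread; this returns toy data.
load_expression_from_disk <- function() {
  data.frame(
    sample_id = rep(c("S1", "S2"), each = 3),
    gene = rep(c("MYC", "BCL2", "EZH2"), times = 2),
    expression = c(5.1, 7.3, 2.2, 4.8, 6.9, 3.0),
    stringsAsFactors = FALSE
  )
}

# Sketch of the proposed behaviour (assumed names, not GAMBLR's code).
get_gene_expression_sketch <- function(gene_ids = NULL,
                                       all_genes = FALSE,
                                       full_expression_df = NULL) {
  # Require an explicit opt-in before returning everything, so an empty
  # gene list never yields a massive data frame by accident.
  if (is.null(gene_ids) && !all_genes) {
    stop("Provide gene_ids or set all_genes = TRUE")
  }

  # Reuse a data frame supplied by the caller; otherwise load from disk.
  expr <- if (is.null(full_expression_df)) {
    load_expression_from_disk()
  } else {
    full_expression_df
  }

  # In this sketch, all_genes = TRUE takes precedence over any gene list.
  if (all_genes) {
    return(expr)
  }

  expr[expr$gene %in% gene_ids, , drop = FALSE]
}
```

Calling the sketch with neither gene_ids nor all_genes = TRUE stops with an error, which keeps the "no accidental full data frame" guarantee described above.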

mattssca commented 2 years ago

get_gene_expression was updated to take two new parameters, all_genes and expression_data.

If all_genes is set to TRUE, the full expression df is returned (no subsetting on the genes specified in either hugo_symbols or ensemble_gene_ids). The check that previously raised an error when neither hugo_symbols nor ensemble_gene_ids was provided no longer raises it if all_genes is set to TRUE (and no genes are specified).

An additional optional parameter (expression_data) accepts an already loaded expression data frame and uses it directly, so the data does not have to be read into R again (from flat file or database).
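The load-once, query-many workflow these two parameters enable might look like the following. Because the real function needs GAMBLR's data sources, this uses a toy stand-in: toy_get_gene_expression, the load_count counter, and the column names are assumptions made for illustration, while hugo_symbols, all_genes, and expression_data are the parameter names mentioned in this thread.

```r
# Counts how often the (simulated) expensive load runs.
load_count <- 0

# Toy stand-in for get_gene_expression; NOT the GAMBLR implementation.
toy_get_gene_expression <- function(hugo_symbols = NULL,
                                    all_genes = FALSE,
                                    expression_data = NULL) {
  if (is.null(hugo_symbols) && !all_genes) {
    stop("Specify hugo_symbols or set all_genes = TRUE")
  }
  if (is.null(expression_data)) {
    # Simulated read from flat file or database (hypothetical toy data).
    load_count <<- load_count + 1
    expression_data <- data.frame(
      sample_id = rep(c("S1", "S2"), each = 3),
      Hugo_Symbol = rep(c("MYC", "BCL2", "EZH2"), times = 2),
      expression = c(5.1, 7.3, 2.2, 4.8, 6.9, 3.0),
      stringsAsFactors = FALSE
    )
  }
  if (all_genes) {
    return(expression_data)
  }
  expression_data[expression_data$Hugo_Symbol %in% hugo_symbols, , drop = FALSE]
}

# Load the full data frame once...
full_expr <- toy_get_gene_expression(all_genes = TRUE)

# ...then reuse it for several gene sets without triggering another load.
gene_sets <- list(set_a = c("MYC", "BCL2"), set_b = "EZH2")
results <- lapply(gene_sets, function(genes) {
  toy_get_gene_expression(hugo_symbols = genes, expression_data = full_expr)
})
```

After this runs, load_count is still 1: only the initial all_genes = TRUE call touched the simulated data source, and each gene-set query worked from the data frame already in memory.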

Examples for the function have also been updated to reflect the above-described update.

These changes have been pushed in this commit

mattssca commented 2 years ago

This issue has been resolved in the commit described above.

morinlab / GAMBLR

Add option for user to provide/obtain full gene expression data frame #50