perslab / CELLEX

CELLEX (CELL-type EXpression-specificity)
GNU General Public License v3.0

gene mapping function: from symbol to ensembl #26

Closed (pascaltimshel closed this issue 4 years ago)

pascaltimshel commented 4 years ago

Problem: Many scRNA-seq data sets come with human gene symbols. We should make it easier for users to map these to Ensembl IDs, since Ensembl IDs are what CELLECT uses.

Solution: Write a function to map from human gene symbols to human Ensembl IDs.

Additionally, consider updating the mapping function names for more consistency:

- (NEW) ens_human_symbol_to_ens --> human_symbol_to_human_ens
- ens_human_to_symbol --> human_ens_to_human_symbol
- ens_mouse_to_ens_human --> mouse_ens_to_human_ens
- mgi_mouse_to_ens_mouse --> mouse_symbol_to_mouse_ens
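If the old names need to keep working for a while after such a rename, one option is a thin deprecated alias. The sketch below is only an illustration of that idea, not what CELLEX does: `deprecated_alias` is a hypothetical helper, and the body of `human_ens_to_human_symbol` is a stand-in with a one-entry lookup table.

```python
import functools
import warnings


def deprecated_alias(old_name, new_func):
    """Return a wrapper that calls new_func and warns that old_name is deprecated.

    Hypothetical helper for illustration; not part of CELLEX.
    """
    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # functools.wraps copied the new name; restore the old one on the alias.
    wrapper.__name__ = old_name
    return wrapper


# Stand-in lookup table so the example runs on its own.
_DEMO_ENS_TO_SYMBOL = {"ENSG00000141510": "TP53"}


def human_ens_to_human_symbol(ens_ids):
    """Stand-in for the renamed function: map Ensembl IDs to symbols."""
    return [_DEMO_ENS_TO_SYMBOL.get(ens_id) for ens_id in ens_ids]


# The old name still works but emits a DeprecationWarning when called.
ens_human_to_symbol = deprecated_alias("ens_human_to_symbol", human_ens_to_human_symbol)
```

Callers using `ens_human_to_symbol` get the same result as before plus a warning that points them to the new name.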

The file needed to build the mapping is attached. It also allows gene names with 'version numbers' to be mapped to the appropriate Ensembl ID: GRCh38.ens_v90.gene_name_version2ensembl.txt.gz
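A minimal sketch of what a symbol-to-Ensembl lookup could look like, under stated assumptions: the layout of the attached file is not described in this thread, so the code assumes a headerless, tab-separated table with the gene name in the first column and the Ensembl gene ID in the second. `load_symbol_map` and the `drop_unmapped` parameter are hypothetical, and this is not the implementation that ended up in CELLEX.

```python
import csv
import io


def load_symbol_map(fileobj):
    """Build a {gene name: Ensembl ID} dict from a tab-separated text stream.

    Assumed layout (not confirmed by the source): no header row,
    column 1 = gene name, column 2 = Ensembl gene ID.
    """
    mapping = {}
    for row in csv.reader(fileobj, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        name, ens_id = row[0], row[1]
        # Keep the first ID seen if a name occurs more than once.
        mapping.setdefault(name, ens_id)
    return mapping


def human_symbol_to_human_ens(symbols, mapping, drop_unmapped=False):
    """Map gene symbols to Ensembl IDs; unmapped symbols become None."""
    mapped = [mapping.get(symbol) for symbol in symbols]
    if drop_unmapped:
        return [ens_id for ens_id in mapped if ens_id is not None]
    return mapped


# Two illustrative rows in the assumed layout (not taken from the attached file).
demo_table = io.StringIO("TP53\tENSG00000141510\nBRCA1\tENSG00000012048\n")
demo_map = load_symbol_map(demo_table)
print(human_symbol_to_human_ens(["TP53", "NOT_A_GENE", "BRCA1"], demo_map))
# prints ['ENSG00000141510', None, 'ENSG00000012048']
```

For the gzipped file itself, the stream could be opened with `gzip.open(path, "rt")` from the standard library and passed to `load_symbol_map`, provided the file really has the assumed layout.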

tstannius commented 4 years ago

Hi P,

I've got a prototype ready, but while we're at it, I would like to address #20 - could you make a GRCh38 map for mouse_ens_to_human_ens?

Also, for future reference, could you describe how the mapping files are made?

pascaltimshel commented 4 years ago

Done: https://github.com/perslab/CELLEX/commit/80a34d205f384d891891bd4ec985ad631aa8bd0d .

tstannius commented 4 years ago

Mapping utils updated, and human symbol to human Ensembl mapping added, in https://github.com/perslab/CELLEX/commit/4ccd65af6823f775a31a6c3dbaf52beb08120211 (n.b. there's a typo in the commit msg: it says hs ens to hs sym when it should be hs sym to hs ens)