neurogenomics / orthogene

🧬 orthogene 🧬: Interspecies gene mapping
https://doi.org/doi:10.18129/B9.bioc.orthogene

Identifier mapping to Ensembl identifiers #15

Open mkutmon opened 2 years ago

mkutmon commented 2 years ago

I tried to figure out how I can change the output to Ensembl identifiers instead of gene symbols. I tried adding the argument numeric_ns = "ENSG", but that didn't help. Do you have a hint on how I can achieve that?

bschilder commented 2 years ago

Hi @mkutmon, which function are you trying to use? Could you provide a quick reproducible example?

mkutmon commented 2 years ago

I have a list of human Ensembl identifiers and would like to get the mouse Ensembl identifiers back.

mapped.data <- orthogene::convert_orthologs(gene_df = human.ids,
                                            gene_input = "GeneID",
                                            gene_output = "columns",
                                            input_species = "human",
                                            output_species = "mouse",
                                            non121_strategy = "kbs",
                                            method = method)

Currently, this method results in a new column "ortholog_gene" which is the mouse gene name. I would like to have the Ensembl identifier for mouse (ENSMUSG...). Is that possible?

bschilder commented 2 years ago

I can try to infer your use case from the above code snippet, but I'm afraid it is not a reproducible example (i.e., one that I can copy and paste into R to reproduce the problem). You can read about how to make a reprex here. For future bug reports I've added an Issues template to guide users. I've attached the template for you to use here as well: bugs_template.txt

bschilder commented 2 years ago

Here's an example of a reprex that I think approximates your use case:

human_genes  <- orthogene::all_genes(species = "human")
method <- "gprofiler2"

mapped.data <- orthogene::convert_orthologs(gene_df = human_genes$target[1:10], 
                                            standardise_genes = TRUE,
                                            gene_output = "columns", 
                                            input_species = "human",
                                            output_species = "mouse",
                                            non121_strategy = "kbs",
                                            method = method)

mouse_genes <- orthogene::map_genes(genes = mapped.data$ortholog_gene, 
                                    species = "mouse")


Note standardise_genes = TRUE. This means that your input Ensembl IDs will be translated to human gene symbols first. These can then be translated to mouse gene symbols, and the final map_genes call maps those mouse symbols to mouse identifiers. From the docs: [screenshot of the standardise_genes documentation]
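To end up with one table that pairs each human Ensembl ID with its mouse Ensembl ID, the two results can be joined on the mouse gene symbol. The following is only a minimal sketch, not part of orthogene itself. It uses small hand-typed stand-ins for mapped.data and mouse_genes so that it runs without a network connection, and it assumes that convert_orthologs returns an input_gene column next to ortholog_gene, and that map_genes returns gprofiler2-style input and target columns, with target holding the Ensembl ID. Check names() on your own objects before relying on these column names.

```r
# Toy stand-ins for the objects created in the reprex (illustrative rows only).
# ASSUMPTION: "input_gene" is the column holding the original human IDs.
mapped.data <- data.frame(
  input_gene    = c("ENSG00000141510", "ENSG00000012048"),  # human Ensembl IDs
  ortholog_gene = c("Trp53", "Brca1"),                       # mouse gene symbols
  stringsAsFactors = FALSE
)

# ASSUMPTION: map_genes output has gprofiler2-style "input" and "target" columns.
mouse_genes <- data.frame(
  input  = c("Trp53", "Brca1"),                              # symbols passed in
  target = c("ENSMUSG00000059552", "ENSMUSG00000017146"),    # mouse Ensembl IDs
  stringsAsFactors = FALSE
)

# Left join on the mouse symbol: every converted gene is kept, and genes
# without a mapped identifier get NA in the new column.
human_to_mouse <- merge(mapped.data, mouse_genes,
                        by.x = "ortholog_gene", by.y = "input",
                        all.x = TRUE)

# Give the joined identifier column a descriptive (hypothetical) name.
names(human_to_mouse)[names(human_to_mouse) == "target"] <- "ortholog_ensembl"

human_to_mouse <- human_to_mouse[, c("input_gene",
                                     "ortholog_gene",
                                     "ortholog_ensembl")]
print(human_to_mouse)
```

With real data, replace the two toy data frames with the mapped.data and mouse_genes objects from the reprex. If a symbol maps to several identifiers, the join will return one row per match, so you may want to inspect duplicates afterwards.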

That said, I think a nice feature would be to do this all in one step, and have convert_orthologs return results in whatever gene format is requested (not just gene symbols). I'll look into adding this feature to the next release of orthogene.