noriakis / ggkegg

Analyzing and visualizing KEGG information using the grammar of graphics
https://noriakis.github.io/software/ggkegg
MIT License

highlight_entities function #1

Closed Ramirj closed 10 months ago

Ramirj commented 11 months ago

Hello noriakis,

This package is outstanding and has been very useful for my current research projects. Right now I am trying to create a diagram of the glycolysis pathway for Oryza sativa. Everything is going smoothly except for one thing: when I run the function highlight_entities, I get the error message 'could not find function "highlight_entities"' in R. I'm pretty sure I installed all of the necessary packages, so I'm not sure what the problem is. Any help with this would be much appreciated. Thank you!

noriakis commented 11 months ago

Hello @Ramirj,

Thank you very much for using the package. I apologize for the inconvenience. highlight_entities() may not be available in the version installed from Bioconductor. Could you try installing the package with devtools::install_github("noriakis/ggkegg", force=TRUE)?

Ramirj commented 11 months ago

@noriakis I'm still getting the same error message. Could my computer be the issue? I'm using a mac.

noriakis commented 11 months ago

@Ramirj Thank you for the reply. I have just tested on macOS with R 4.3.1, and the function works with the devtools installation. Could you try removing the package and reinstalling it?

remove.packages("ggkegg")
devtools::install_github("noriakis/ggkegg")
ggkegg::highlight_entities()
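
If the error persists after reinstalling, it can help to check whether the installed copy of a package actually exports the function. The following is a minimal sketch using only base R; `has_export` is a hypothetical helper written for this illustration and is not part of ggkegg.

```r
# Hypothetical helper (not part of ggkegg): report whether an installed
# package exports a given function, without attaching the package.
has_export <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)  # the package is not installed at all
  }
  fn %in% getNamespaceExports(pkg)
}

# Self-check of the helper against a base R package
has_export("stats", "median")
has_export("stats", "highlight_entities")

# For the case in this thread one would run:
# has_export("ggkegg", "highlight_entities")
# packageVersion("ggkegg")
```

A `FALSE` result for ggkegg would suggest that an older build is still the one being loaded, in which case restarting R after the reinstall is worth trying.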

Ramirj commented 11 months ago

That worked, thank you so much!

Ramirj commented 11 months ago

One more thing, I'm trying to create a custom graph using ggraph but it comes out looking very messy. How can I get my graph (top) to look like the graph that you created in your example (bottom)? I also provided a picture of my code below. Thank you for all of your help!

image

image

Screenshot 2023-10-23 at 1 20 30 PM

noriakis commented 11 months ago

@Ramirj I'm glad it worked.

For your second problem, this is a very important use case, but there is currently no easy way to do it. The native osa00010 map shows EC number labels, but these are not provided in the KGML file. Instead, the nodes carry the full names from the osa gene list, which are long and can look messy.

A workaround would be to fetch the RN to EC link and map it to the label, which simplifies the visual. Depending on what you want to inspect, some node types may be omitted.

The example below maps the EC number label in osa genes, showing only the gene and compound type nodes.

## Fetch and cache RN to EC map
library(ggkegg)
library(BiocFileCache)

url <- "https://rest.kegg.jp/link/reaction/ec"
bfc <- BiocFileCache()
path <- bfcrpath(bfc, url)
## Column V1 holds EC numbers ("ec:..."), V2 holds reaction IDs ("rn:...")
convert <- data.frame(data.table::fread(path, header = FALSE, sep = "\t"))
## Named vector: EC number (prefix stripped) keyed by reaction ID
rntoec <- convert$V1 |>
    strsplit(":") |>
    vapply("[", 2, FUN.VALUE="a") |>
    setNames(convert$V2)

## Map the EC number to gene nodes
g <- pathway("osa00010") |> 
    mutate(ec=rntoec[reaction])

## Visualize
gg <- g |> filter(type %in% c("compound","gene")) |>
    ggraph(layout="manual", x=x, y=y)+
    geom_edge_link(aes(color=subtype_name))+
    geom_node_point(color="lightblue", aes(filter=type=="compound"))+
    geom_node_rect(fill="lightpink", aes(filter=type=="gene"))+
    geom_node_shadowtext(aes(label=ec, filter=type=="gene"), color="black",
                         bg.colour="white", size=2)+
    geom_node_text(aes(label=name, filter=type=="compound"),
                   color="grey50", size=2, repel=TRUE)+
    theme_void()
gg
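
To make the lookup step easier to follow without a network call, here is a small base R sketch of the same idea. The two sample rows are illustrative stand-ins in the layout of the KEGG link output and were not fetched live. `ec_label` is a hypothetical helper, and the handling of several space-separated reaction IDs in one node is an assumption about how the reaction column may look.

```r
# Illustrative rows in the two-column, tab-separated layout of the KEGG
# link output (EC in column 1, reaction in column 2); not fetched live.
link_txt <- "ec:2.7.1.1\trn:R00299\nec:2.7.1.40\trn:R00200\n"
convert <- read.delim(text = link_txt, header = FALSE,
                      col.names = c("V1", "V2"), stringsAsFactors = FALSE)

# Strip the "ec:" prefix and key each EC number by its reaction ID
rntoec <- setNames(sub("^ec:", "", convert$V1), convert$V2)

# Hypothetical helper: look up every reaction ID in a space-separated
# string and join the distinct EC numbers; NA when nothing matches
ec_label <- function(reaction) {
  if (is.na(reaction)) return(NA_character_)
  ids <- strsplit(reaction, " ", fixed = TRUE)[[1]]
  ecs <- unique(na.omit(unname(rntoec[ids])))
  if (length(ecs) == 0) NA_character_ else paste(ecs, collapse = ", ")
}

reaction <- c("rn:R00299", "rn:R00299 rn:R00200", "rn:R99999", NA)
labels <- vapply(reaction, ec_label, character(1), USE.NAMES = FALSE)
labels
```

A plain `rntoec[reaction]` lookup returns NA for any node whose reaction string holds more than one ID, so a helper along these lines may recover labels that would otherwise be blank.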

Ramirj commented 11 months ago

Thank you so much, you're a lifesaver! Is there a way to keep the log fold changes on the graph?

noriakis commented 11 months ago

@Ramirj Thank you for the reply. Yes, you can keep them by making a new column. The example below maps the continuous values to nodes in the column num.

## Assign placeholder LFCs (random values, one per node) and visualize
lfcs <- sample(seq(-3,3.0,0.1), length(V(g)), replace=TRUE)
gg <- g |> mutate(num=lfcs) |>
    filter(type %in% c("compound","gene")) |>
    ggraph(layout="manual", x=x, y=y)+
    geom_edge_link(aes(color=subtype_name))+
    geom_node_point(color="lightblue", aes(filter=type=="compound"))+
    geom_node_rect(aes(fill=num, filter=type=="gene"))+
    geom_node_shadowtext(aes(label=ec, filter=type=="gene"), color="black",
                         bg.colour="white", size=2)+
    geom_node_text(aes(label=name, filter=type=="compound"),
                   color="grey50", size=2, repel=TRUE)+
    theme_void()
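
With real data, the values usually come keyed by an identifier rather than in node order. A minimal base R sketch of that case follows. The node table and the LFC values are made up for illustration, and it is an assumption that the identifiers in your data match the node column you look them up against.

```r
# Hypothetical node table standing in for the pathway graph's node data
nodes <- data.frame(
  name = c("gene_a", "gene_b", "cpd_x", "gene_c"),
  type = c("gene", "gene", "compound", "gene"),
  stringsAsFactors = FALSE
)

# Measured log fold changes, keyed by node name (made-up values)
lfc <- c(gene_c = -0.8, gene_a = 1.2)

# Lookup by name aligns each value with its node; unmeasured nodes get NA
nodes$num <- unname(lfc[nodes$name])
nodes
```

Inside the pipeline the analogous call would be something like `mutate(num = unname(lfc[name]))`. Assigning by position instead, for example with `ifelse(name %in% ids, values, NA)`, recycles `values` along the whole column and can attach a value to the wrong node.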

Ramirj commented 11 months ago

Thanks so much!

Ramirj commented 10 months ago

Hi @noriakis,

Is there a way to map KO numbers to pathway maps instead of reaction numbers? I tried modifying the code you sent before to achieve this but I received this error message:

Error in mutate():
ℹ In argument: ec = kotoec[ko].
Caused by error:
! object 'kotoec' not found

If you could help me find the cause of this issue I would appreciate it, thank you!

Here's the code that I used:

# Make Oryza Sativa Glycolysis pathway diagram with LFCs genes present in lab data

# Create a map of TCA pathway for Oryza Sativa + map EC number labels

library(ggkegg)
library(BiocFileCache)

# Fetch and cache RN to EC map

url <- paste0("https://rest.kegg.jp/link/ko/ec")
bfc <- BiocFileCache()
path <- bfcrpath(bfc, url)
convert <- data.frame(data.table::fread(path, header = FALSE, sep = "\t"))
rntoec <- convert$V1 |>
    strsplit(":") |>
    vapply("[", 2, FUN.VALUE = "a") |>
    setNames(convert$V2)

# Map the EC number to gene nodes

g <- pathway("dosa00010") |>
    mutate(ec = kotoec[ko])

# Define a vector of node names to highlight

nodes_to_highlight <- c("ko:K15644", "ko:K15633")

# Add LFCs

# Create a vector containing log fold changes

LFC_vector <- c(1.58, 3.52)

# Add log fold changes to the graph

g <- g %>%
    mutate(LFC = ifelse(reaction %in% nodes_to_highlight, LFC_vector, NA))

# Add color gradient to visualize LFCs

gg <- g |>
    filter(type %in% c("compound", "gene")) |>
    ggraph(layout = "manual", x = x, y = y) +
    geom_edge_link(aes(color = subtype_name)) +
    geom_node_rect(aes(fill = LFC), size = 5) +
    geom_node_shadowtext(aes(label = ec, filter = type == "gene"), color = "black",
                         bg.colour = "white", size = 2) +
    geom_node_text(aes(label = name, filter = type == "compound"),
                   color = "grey50", size = 2, repel = TRUE) +
    scale_fill_viridis_c(option = "viridis", direction = 1, begin = 0.6, end = 1.0) + # Customize color scale
    theme_void()

gg

noriakis commented 10 months ago

Error in mutate():
ℹ In argument: ec = kotoec[ko].
Caused by error:
! object 'kotoec' not found

The error occurs because the kotoec object is never defined: your code assigns the named vector to rntoec but then looks it up as kotoec. If you are trying to convert KO IDs to EC numbers with a named vector (as your code suggests), you need to create a named vector of EC numbers under that name. As I mentioned previously, this issue section is not for discussing how to convert or map the IDs.
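
For readers who arrive here with the same error, the following base R sketch shows one way such a named vector could be built. The sample rows are illustrative stand-ins in the layout of the KEGG EC-to-KO link output and were not fetched live. Joining several EC numbers per KO into one label is a design choice made for this sketch, and whether the pathway's node table has a ko column to look up against is not confirmed here.

```r
# Illustrative rows in the layout of the KEGG EC-to-KO link output
# (EC in column 1, KO in column 2); not fetched live.
link_txt <- "ec:1.1.1.1\tko:K00001\nec:2.2.1.2\tko:K13810\nec:5.3.1.9\tko:K13810\n"
convert <- read.delim(text = link_txt, header = FALSE,
                      col.names = c("ec", "ko"), stringsAsFactors = FALSE)

# Strip the "ec:" prefix from the EC column
ec_num <- sub("^ec:", "", convert$ec)

# One entry per KO; several EC numbers for the same KO are joined with ", "
kotoec <- vapply(split(ec_num, convert$ko), paste, character(1),
                 collapse = ", ")

# Lookup by KO ID; an unknown ID yields NA
kotoec[c("ko:K13810", "ko:K00001", "ko:K99999")]
```

The key point is that the object is created under the same name that is used later in the lookup, so that `kotoec[...]` refers to something that exists.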