jokergoo / EnrichedHeatmap

Make enriched heatmaps that visualize the enrichment of genomic signals over specific target regions.
http://jokergoo.github.io/EnrichedHeatmap/

clustering for chromatin states #25

Closed crazyhottommy closed 6 years ago

crazyhottommy commented 6 years ago

Hi Zuguang,

I just found that the package now supports chromatin state data (https://github.com/jokergoo/EnrichedHeatmap/issues/24). For clustering, which clustering method and distance are used?

Or could you please share the code for the figure in the README.md?

Thanks for this great package and development.

Best, Ming

jokergoo commented 6 years ago

You can find the code below and I also put a small test dataset in the package.

In the heatmap, the chromatin states are internally converted to numbers (e.g. active TSS as 1, Transcript as 2, ...) and are mapped to discrete colors afterwards. Since the matrix itself is numeric, you can apply any standard clustering method and distance measure to it (I think Euclidean distance is more appropriate here).

You need to be careful about the distance between chromatin states, e.g. active TSS should be closer to Transcript states than to repressive states. This ordering is controlled by the level order of the states column (e.g. levels(states$states_simplified) in the following code).
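To make the effect of the level order concrete, here is a minimal base-R sketch. It is not the package's internal code: the toy `labels` matrix, the `state_levels` vector and the choice of two clusters are assumptions made only for illustration. It converts state labels to integer codes through the factor level order and clusters the rows with Euclidean distance.

```r
# Hypothetical level order: neighbouring states get neighbouring codes.
state_levels = c("TssActive", "Transcript", "Enhancer", "Heterochromatin",
                 "TssBiv", "Repressive", "Quies")

# Hypothetical toy data: 4 regions (rows) x 5 windows (columns).
labels = matrix(c(
    "TssActive",  "TssActive",  "Transcript", "Transcript", "Transcript",
    "TssActive",  "Transcript", "Transcript", "Transcript", "Enhancer",
    "Repressive", "Repressive", "Quies",      "Quies",      "Quies",
    "TssBiv",     "Repressive", "Repressive", "Quies",      "Quies"
), nrow = 4, byrow = TRUE)

# The level order defines the numeric code: TssActive = 1, Transcript = 2, ...
# factor() drops the dimensions, so the matrix shape is restored afterwards.
codes = matrix(as.integer(factor(labels, levels = state_levels)),
               nrow = nrow(labels))

# Euclidean distance between rows, then hierarchical clustering
# (hclust() uses complete linkage by default in base R).
d = dist(codes, method = "euclidean")
hc = hclust(d)
clusters = cutree(hc, k = 2)
print(clusters)
```

Reordering `state_levels` changes the integer codes and therefore the distances, which is why the level order matters for the clustering result.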

I haven't written a vignette for this new functionality yet, but I will do so soon.

# install the development version from GitHub (install_github() comes from devtools)
devtools::install_github("jokergoo/EnrichedHeatmap")

library(EnrichedHeatmap)
library(GenomicFeatures)

# test data shipped with the package
load(system.file("extdata", "chr21_chromatin_states.RData", package = "EnrichedHeatmap"))
load(system.file("extdata", "chr21_test_data.RData", package = "EnrichedHeatmap"))

# one color per chromatin state
state_col = c("TssActive" = "#FF0000", "Transcript" = "#008000", "Enhancer" = "#C2E105",
        "Heterochromatin" = "#8A91D0", "TssBiv" = "#CD5C5C",
        "Repressive" = "#808080", "Quies" = "#000000")

# chromatin states around TSS (1 bp regions at the gene starts)
tss = promoters(genes, upstream = 0, downstream = 1)
mat = normalizeToMatrix(states, tss, value_column = "states_simplified")
EnrichedHeatmap(mat, col = state_col, cluster_rows = TRUE, column_title = "chromatin states")

# chromatin states over gene bodies
mat = normalizeToMatrix(states, genes, value_column = "states_simplified")
EnrichedHeatmap(mat, col = state_col, cluster_rows = TRUE, column_title = "chromatin states")

crazyhottommy commented 6 years ago

Thanks for your reply and code. How do you choose the window size? By default it is extend/50 = 100 bp in your example. With a bigger window size, say 1000 bp, and extend set to 50 kb, multiple state calls may fall into the same 1000 bp window. In this case, how does normalizeToMatrix handle mean_mode?

Tommy

jokergoo commented 6 years ago

Yes, it is possible. But since chromatin states are always mutually exclusive, if multiple states fall in one window, I just pick the state with the largest overlap with this window. Since the signals are discrete, the setting of mean_mode is ignored.
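The rule described above can be sketched in base R as follows. This only illustrates the idea and is not the code used inside normalizeToMatrix: the window coordinates and the `calls` data frame are hypothetical, and intervals are treated as half-open for simplicity.

```r
# Hypothetical 1000 bp window [1000, 2000).
window_start = 1000
window_end = 2000

# Hypothetical, mutually exclusive state calls touching the window.
calls = data.frame(
    start = c(800, 1300, 1900),
    end   = c(1300, 1900, 2600),
    state = c("TssActive", "Transcript", "Enhancer"),
    stringsAsFactors = FALSE
)

# Width of the overlap between each call and the window, floored at zero.
overlap = pmax(0, pmin(calls$end, window_end) - pmax(calls$start, window_start))

# Pick the state of the call with the largest overlap
# (which.max() returns the first one in case of ties).
window_state = calls$state[which.max(overlap)]
print(overlap)
print(window_state)
```

Because a single label is chosen per window rather than an average, there is nothing for mean_mode to act on.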

crazyhottommy commented 6 years ago

Thanks for the answer, makes sense to me.