jokergoo / EnrichedHeatmap

make enriched heatmap which visualizes the enrichment of genomic signals to specific target regions.
http://jokergoo.github.io/EnrichedHeatmap/
Other
187 stars 25 forks source link

Error with normalizeToMatrix() when using "smooth = TRUE" #3

Closed ycl6 closed 8 years ago

ycl6 commented 8 years ago

Trying the normalizeToMatrix() example using data provided with the package

> meth
GRanges object with 100000 ranges and 1 metadata column:
           seqnames               ranges strand   |              meth
              <Rle>            <IRanges>  <Rle>   |         <numeric>
       [1]    chr21   [9432427, 9432427]      *   | 0.267104203931852
       [2]    chr21   [9432428, 9432428]      *   | 0.267106771955287
       [3]    chr21   [9432964, 9432964]      *   | 0.272709911227985
       [4]    chr21   [9432965, 9432965]      *   |   0.2727345344132
       [5]    chr21   [9433315, 9433315]      *   | 0.285114797969136
       ...      ...                  ...    ... ...               ...
   [99996]    chr21 [48114354, 48114354]      *   | 0.621073675200549
   [99997]    chr21 [48114513, 48114513]      *   |  0.62345197113453
   [99998]    chr21 [48114514, 48114514]      *   | 0.623474995478024
   [99999]    chr21 [48114920, 48114920]      *   | 0.644657359145078
  [100000]    chr21 [48115085, 48115085]      *   | 0.699363776991418
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> tss
GRanges object with 24952 ranges and 3 metadata columns:
        seqnames             ranges strand   |                 name     score      symbol
           <Rle>          <IRanges>  <Rle>   |          <character> <integer> <character>
  15155     chr1 [3661579, 3661579]      -   | ENSMUSG00000051951.5         0        Xkr4
  23469     chr1 [3456668, 3456668]      +   | ENSMUSG00000089699.1         0      Gm1992
   5254     chr1 [4350473, 4350473]      -   | ENSMUSG00000025900.4         0         Rp1
   5255     chr1 [4486494, 4486494]      -   | ENSMUSG00000025902.7         0       Sox17
   9486     chr1 [4775820, 4775820]      -   | ENSMUSG00000033845.7         0      Mrpl15
    ...      ...                ...    ... ...                  ...       ...         ...
  15352     chrY [2086590, 2086590]      +   | ENSMUSG00000052831.7         0     Rbmy1a1
  18614     chrY [2118049, 2118049]      +   | ENSMUSG00000069031.5         0     Gm10256
  19278     chrY [2156899, 2156899]      +   | ENSMUSG00000071960.4         0     Gm10352
  24665     chrY [2390390, 2390390]      +   | ENSMUSG00000091987.1         0      Gm3376
  23940     chrY [2550262, 2550262]      +   | ENSMUSG00000090600.1         0      Gm3395
  -------
  seqinfo: 22 sequences from an unspecified genome; no seqlengths
> normalizeToMatrix(meth, tss, value_column = "meth", mean_mode = "absolute", extend = 5000, w = 50, empty_value = NA, smooth = TRUE)

Warning message:
In .Seqinfo.mergexy(x, y) :
  The 2 combined objects have no sequence levels in common. (Use
  suppressWarnings() to suppress this warning.)
Error in t(apply(mat, 1, function(x) { :
  error in evaluating the argument 'x' in selecting a method for function 't': Error in locfit(x[l] ~ lp(seq_along(x)[l], nn = 0.2)) :
  fewer than one row in the data
> sessionInfo()
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] circlize_0.3.3             EnrichedHeatmap_1.0.0      locfit_1.5-9.1             ComplexHeatmap_1.6.0
 [5] GenomicAlignments_1.6.1    Rsamtools_1.22.0           Biostrings_2.38.2          XVector_0.10.0
 [9] SummarizedExperiment_1.0.1 Biobase_2.30.0             GenomicRanges_1.22.2       GenomeInfoDb_1.6.1
[13] IRanges_2.4.6              S4Vectors_0.8.5            BiocGenerics_0.16.1        data.table_1.9.6

loaded via a namespace (and not attached):
 [1] whisker_0.3-2        magrittr_1.5         zlibbioc_1.16.0      BiocParallel_1.4.3   lattice_0.20-33      colorspace_1.2-6
 [7] rjson_0.2.15         tools_3.2.2          lambda.r_1.1.7       futile.logger_1.4.1  matrixStats_0.50.1   RColorBrewer_1.1-2
[13] GlobalOptions_0.0.8  futile.options_1.0.0 bitops_1.0-6         dendextend_1.1.2     shape_1.4.2          chron_2.3-47
[19] GetoptLong_0.1.1
jokergoo commented 8 years ago

Your tss is from mouse which doesn't have chr21.

ycl6 commented 8 years ago

Thanks! I must have mixed up the test data with my own dataset.

I encounter an error while running normalizeToMatrix(). It seems my dataset is too large for locfit to handle when smooth = TRUE, but no error when smooth is turned off.

Error in t(apply(mat, 1, function(x) { :
  error in evaluating the argument 'x' in selecting a method for function 't': Error in lfproc(x, y, weights = weights, cens = cens, base = base, geth = geth,  :
  newsplit: out of vertex space

Methylation: GRanges object with 18728638 ranges and 1 metadata column:

Gene: GRanges object with 22012 ranges and 9 metadata columns:

Searching the net I found many suggest increasing the maxk param of locfit, but I don't see a way to pass this param from normalizeToMatrix to locfit

jokergoo commented 8 years ago

Yes, I also met such kind of error with the old version of the package. It is because for some TSS, there are very few CpG sites upstream and downstream. normalizeToMatrix() returns a matrix in which each individual value corresponds to mean methylation for 50bp window by default. If there are only very few CpG sites, there may be too many NA values in corresponding rows. And if you perform smoothing on rows afterwards, it may cause errors.

In the old version, smoothing is only performed by locfit() method, in the newest version, I added some additional check for the fit. If there are too many NA values in some rows, loess() method will be used as an optional fit.

I am not 100% sure it fixed this problem, but at least for my projects, it's fixed.

ycl6 commented 8 years ago

Thanks! I didn't realized there's an updated version of the package. I used v1.1.2 and I don't encounter this error now.

One other thing I notice, when I tried to plot 2 EnrichedHeatmap() together, one with fixed position (i.e. TSS) ± 5 Kb, and one with specified region (i.e. Gene body) ± 10 Kb, the X-axis labels of both plots will be the same as the latter plot (i..e -10Kb, Start, End, 10Kb), even though I specify different axis_name.

ycl6 commented 8 years ago

Is there a way to set order manually, such as using row_order, same as that in Heatmap() of ComplexHeatmap?

jokergoo commented 8 years ago

Thanks for the comments.

  1. this was a bug and now it is fixed.
  2. Also row_order is added into the arguments of EnrichedHeatmap().

Following code is what I used to test:


library(EnrichedHeatmap)

load(paste0(system.file("extdata", "chr21_test_data.RData", package = "EnrichedHeatmap")))
tss = promoters(genes, upstream = 0, downstream = 1)

tss2 = tss
start(tss2) = start(tss) - 1000
end(tss2) = end(tss) + 1000

mat1 = normalizeToMatrix(H3K4me3, tss, value_column = "coverage", 
    extend = 5000, mean_mode = "w0", w = 50)
mat2 = normalizeToMatrix(H3K4me3, tss2, value_column = "coverage", 
    extend = 2000, mean_mode = "w0", w = 50)
EnrichedHeatmap(mat1, axis_name_rot = 90) + EnrichedHeatmap(mat2)

# test row_order
od = hclust(dist(mat1))$order
EnrichedHeatmap(mat1, row_order = od)