jokergoo / EnrichedHeatmap

Make enriched heatmaps which visualize the enrichment of genomic signals on specific target regions.
http://jokergoo.github.io/EnrichedHeatmap/

Filtering matrices by range name #67

Closed: jantusan closed this issue 2 years ago

jantusan commented 2 years ago

Thank you for the useful package!

I wonder if there is a way to filter a normalizedMatrix object by the names of the ranges that were used to make it.

In my case I have a GRanges object with named ranges that looks like this:

> genes_100
GRanges object with 100 ranges and 2 metadata columns:
        seqnames        ranges strand |          ID      symbol
           <Rle>     <IRanges>  <Rle> | <character> <character>
    [1]     Chr1     3631-5899      + |   AT1G01010      NAC001
    [2]     Chr1     6788-9130      - |   AT1G01020        ARV1
    [3]     Chr1   11101-11372      + |   AT1G03987        <NA>
    [4]     Chr1   11649-13714      - |   AT1G01030        NGA3
    [5]     Chr1   23121-31227      + |   AT1G01040        DCL1
    ...      ...           ...    ... .         ...         ...
   [96]     Chr1 269792-270859      - |   AT1G01725        <NA>
   [97]     Chr1 270067-270517      + |   AT1G04047        <NA>
   [98]     Chr1 270797-272189      + |   AT1G01730        <NA>
   [99]     Chr1 272111-274930      - |   AT1G01740        BSK4
  [100]     Chr1 275188-276310      + |   AT1G01750       ADF11
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

but after computing the matrix with normalizeToMatrix(), the names seem to be lost:

> attributes(mat)
$dim
[1] 100 150

$upstream_index
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
[43] 43 44 45 46 47 48 49 50

$target_index
 [1]  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82
[33]  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100

$downstream_index
 [1] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132
[33] 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150

$extend
[1] 2000 2000

$smooth
[1] FALSE

$signal_name
[1] "signal"

$target_name
[1] "target"

$target_is_single_point
[1] FALSE

$background
[1] NA

$signal_is_categorical
[1] FALSE

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
  [1] "u1"  "u2"  "u3"  "u4"  "u5"  "u6"  "u7"  "u8"  "u9"  "u10" "u11" "u12" "u13" "u14" "u15" "u16" "u17" "u18" "u19" "u20" "u21"
 [22] "u22" "u23" "u24" "u25" "u26" "u27" "u28" "u29" "u30" "u31" "u32" "u33" "u34" "u35" "u36" "u37" "u38" "u39" "u40" "u41" "u42"
 [43] "u43" "u44" "u45" "u46" "u47" "u48" "u49" "u50" "t1"  "t2"  "t3"  "t4"  "t5"  "t6"  "t7"  "t8"  "t9"  "t10" "t11" "t12" "t13"
 [64] "t14" "t15" "t16" "t17" "t18" "t19" "t20" "t21" "t22" "t23" "t24" "t25" "t26" "t27" "t28" "t29" "t30" "t31" "t32" "t33" "t34"
 [85] "t35" "t36" "t37" "t38" "t39" "t40" "t41" "t42" "t43" "t44" "t45" "t46" "t47" "t48" "t49" "t50" "d1"  "d2"  "d3"  "d4"  "d5" 
[106] "d6"  "d7"  "d8"  "d9"  "d10" "d11" "d12" "d13" "d14" "d15" "d16" "d17" "d18" "d19" "d20" "d21" "d22" "d23" "d24" "d25" "d26"
[127] "d27" "d28" "d29" "d30" "d31" "d32" "d33" "d34" "d35" "d36" "d37" "d38" "d39" "d40" "d41" "d42" "d43" "d44" "d45" "d46" "d47"
[148] "d48" "d49" "d50"

$class
[1] "normalizedMatrix" "matrix"      

Are the names stored somewhere, or should I add them again if I want to filter by name?

Thanks in advance!

jantusan commented 2 years ago

Sorry, I just realised I was using a GRanges object that was actually unnamed...

It turns out to be trivial to filter the matrix by name. I'll leave the code below for future reference.

If your GRanges is named, like this:

> genes_100
GRanges object with 100 ranges and 2 metadata columns:
            seqnames        ranges strand |          ID      symbol
               <Rle>     <IRanges>  <Rle> | <character> <character>
  AT1G01010     Chr1     3631-5899      + |   AT1G01010      NAC001
  AT1G01020     Chr1     6788-9130      - |   AT1G01020        ARV1
  AT1G03987     Chr1   11101-11372      + |   AT1G03987        <NA>
  AT1G01030     Chr1   11649-13714      - |   AT1G01030        NGA3
  AT1G01040     Chr1   23121-31227      + |   AT1G01040        DCL1
        ...      ...           ...    ... .         ...         ...
  AT1G01725     Chr1 269792-270859      - |   AT1G01725        <NA>
  AT1G04047     Chr1 270067-270517      + |   AT1G04047        <NA>
  AT1G01730     Chr1 270797-272189      + |   AT1G01730        <NA>
  AT1G01740     Chr1 272111-274930      - |   AT1G01740        BSK4
  AT1G01750     Chr1 275188-276310      + |   AT1G01750       ADF11
  -------
  seqinfo: 7 sequences from an unspecified genome; no seqlengths

and you make a matrix from it with normalizeToMatrix(), like this:

> mat
Normalize signal to target:
  Upstream 2000 bp (50 windows)
  Downstream 2000 bp (50 windows)
  Include target regions (50 windows)
  100 target regions

you can subset it by row name, just like an ordinary matrix:

> mat[c("AT1G01010", "AT1G01020", "AT1G03987"), ]
Normalize signal to target:
  Upstream 2000 bp (50 windows)
  Downstream 2000 bp (50 windows)
  Include target regions (50 windows)
  3 target regions
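
For an unnamed GRanges like the one in the original question, the fix is to set its names before calling normalizeToMatrix(), for example from the `ID` metadata column; as this thread shows, the row names of the resulting matrix then follow those names. The sketch below is a minimal, self-contained example under stated assumptions: it uses three of the genes above plus a made-up signal track, and the `score` column, `w = 50` and `mean_mode = "w0"` are illustrative choices, not values from the thread. It assumes the EnrichedHeatmap and GenomicRanges Bioconductor packages are installed.

```r
library(GenomicRanges)
library(EnrichedHeatmap)

set.seed(123)

# Three of the genes from this issue, initially without names
genes <- GRanges(
  seqnames = "Chr1",
  ranges = IRanges(start = c(3631, 6788, 11101), end = c(5899, 9130, 11372)),
  strand = c("+", "-", "+"),
  ID = c("AT1G01010", "AT1G01020", "AT1G03987")
)

# Name the ranges from the ID column so the matrix rows carry gene IDs
names(genes) <- genes$ID

# Made-up signal (assumption): 100 bp bins with random scores
# covering every gene plus 2 kb on either side
starts <- seq(1, 14000, by = 100)
signal <- GRanges(
  seqnames = "Chr1",
  ranges = IRanges(start = starts, width = 100),
  score = runif(length(starts))
)

mat <- normalizeToMatrix(signal, genes, value_column = "score",
                         extend = 2000, mean_mode = "w0", w = 50)

# Select rows by gene ID; the result follows the order of the IDs given
sub <- mat[c("AT1G03987", "AT1G01010"), ]
rownames(sub)
```

Selecting by name rather than by position means the result does not depend on how the rows happened to be ordered when the matrix was built.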

jokergoo commented 2 years ago

Sure! I will support it. That is a nice suggestion!

jokergoo commented 2 years ago

Ah. I think it is already supported.

jantusan commented 2 years ago

Yes, it is already supported; I was just doing it wrong 😅.

My use case is to make matrices for many epigenetic marks for all the genes in the genome, and then subset those instead of making the matrices from scratch. I know you can subset by index, but I always fear I will mess up the order of something in my objects and get it wrong, so I prefer names.
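
The workflow described here (one matrix per mark, all subset with the same gene IDs) could be sketched with a small helper. `subset_by_genes()` is hypothetical, not part of EnrichedHeatmap, and `mark_A`/`mark_B` are placeholder mark names. It is demonstrated on plain matrices so that it runs without Bioconductor; it relies only on row names and `[`, which normalizedMatrix objects support as shown earlier in the thread.

```r
# Hypothetical helper: subset every matrix in a list to the same gene IDs,
# selecting rows by name so the result does not depend on stored row order.
subset_by_genes <- function(mats, ids) {
  lapply(mats, function(m) {
    missing_ids <- setdiff(ids, rownames(m))
    if (length(missing_ids) > 0) {
      # Fail early with a readable message instead of "subscript out of bounds"
      stop("IDs not found in matrix rows: ", paste(missing_ids, collapse = ", "))
    }
    m[ids, , drop = FALSE]
  })
}

set.seed(1)
gene_ids <- c("AT1G01010", "AT1G01020", "AT1G03987", "AT1G01030")

# Two toy matrices whose rows are stored in different orders
mats <- list(
  mark_A = matrix(runif(4 * 6), nrow = 4, dimnames = list(gene_ids, NULL)),
  mark_B = matrix(runif(4 * 6), nrow = 4, dimnames = list(rev(gene_ids), NULL))
)

sub <- subset_by_genes(mats, c("AT1G03987", "AT1G01010"))
lapply(sub, rownames)
```

Because rows are selected by name, `mark_B` returns the same genes in the same order as `mark_A` even though its rows were stored in reverse, which is exactly the ordering risk that index-based subsetting would expose.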

jokergoo commented 2 years ago

Yes, that is a safer solution :)