markrobinsonuzh / DAMEfinder

Finds DAMEs - Differential Allelicly MEthylated regions
MIT License
10 stars 3 forks source link

detection of ASM regions from one sample #43

Closed weishwu closed 1 year ago

weishwu commented 2 years ago

I wonder if there is a way to make DAMEfinder output ASM regions from one sample only, instead of differential ASM regions by comparing the samples.

sorjuela commented 2 years ago

Hi, you can check out issue #40 to get regions with high ASM in a single sample/file.

weishwu commented 2 years ago

Thanks! I found your code in issue #40

cluster <- bumphunter::clusterMaker(
        chr = as.character(seqnames(se)), 
        pos = start(se))

icr <- bumphunter::regionFinder(x = assay(se, "asm")[,1], 
            chr = as.character(seqnames(se)), 
            pos = start(se), 
        cluster = cluster, 
            cutoff = 1)

The biggest ASM score in my data is 1.992105, so should I change the cutoff parameter in regionFinder to this value? In the documentation of regionFinder it says:

cutoff: This argument is passed to getSegments. It represents the upper (and optionally the lower) cutoff for x.

I wonder what it means exactly by "upper cutoff".

sorjuela commented 2 years ago

Hi sorry for the late response, If you specify the highest number then you probably won't get any regions, since maybe a few have that value. I would keep the cutoff = 1 for the tuple score, or look at the distribution of all your ASM scores to choose a sensible cutoff based on your own data. Typically an ASM (tuple) score above 1 is fine.

weishwu commented 2 years ago

@sorjuela Thanks!

weishwu commented 2 years ago

After I ran the code, 80% of the segments in the output has zero length, like this one:

chr    start      end    value     area cluster indexStart indexEnd L clusterL
chr11  2248105  2248105 1.000001 1.000001  250753    1194910  1194910 1 6

I checked the ASM score data using the index and got:

> ASM_mat[1194910]
class: RangedSummarizedExperiment 
dim: 1 1 
metadata(0):
assays(6): asm cov ... UM UU
rownames(1): chr11.2248105.2248116
rowData names(1): midpt
colnames(1): sample_5554
colData names(0):

So the coordinates for this CpG pair are: chr11.2248105.2248116 Why does the segment only pick up the position of the first CpG in this pair?

sorjuela commented 2 years ago

Hi Weisheng,

So in the code, we are giving the function only the start position:

cluster <- bumphunter::clusterMaker(
        chr = as.character(seqnames(se)), 
        pos = start(se)) #HERE

The function assumes one value per position, but this still corresponds to the tuple. You can also try using the midpoint position mcols(se)$midpt that would be the midpoint between the CpGs in the tuple.

You should also try with lower cutoffs and different values for maxGap to get larger regions. It could also be your ASM values are on the lower range.

weishwu commented 1 year ago

Thanks!