r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0

Promoter Ratio Bug? #57

Open AlexChitsazan opened 5 years ago

AlexChitsazan commented 5 years ago

I was trying to calculate the promoter ratio and noticed that a feature can be counted more than once if more than one promoter overlaps it. I was getting promoter ratios above 1, which doesn't make sense. Here is the code that calculates it in two of the examples (Fang_2019 & 10X_15k):

> promoter.df = read.table("promoter.bed");
> promoter.gr = GRanges(promoter.df[,1], IRanges(promoter.df[,2], promoter.df[,3]));
> ov = findOverlaps(x.sp@feature, promoter.gr);
> idy = queryHits(ov);
> promoter_ratio = SnapATAC::rowSums(x.sp[,idy, mat="bmat"], mat="bmat") / SnapATAC::rowSums(x.sp, mat="bmat");

When I run this on my data, applying unique() to idy makes a substantial difference:

> length(idy)
[1] 80326
> length(unique(idy))
[1] 70224

I also noticed that the promoter ratio plot sets ylim(0,1), which shouldn't be necessary for a true proportion. Is there a reason for this?

IMJoeyZhu commented 5 years ago

Did you figure it out? My data seems to have the same problem.

> length(unique(idy))
[1] 37657
> length(idy)
[1] 66435

I also noticed that the values of promoter_ratio are abnormal, with a maximum above 1.

> summary(promoter_ratio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.06484 0.42483 0.55636 0.58979 0.73246 1.55439

I am still wondering how this can happen and how to solve it.

r3fang commented 5 years ago

Hi, can you show me the logUMI and promoter ratio plot?

Rongxin Fang, Ren Lab Ludwig Cancer Research Bioinformatics Ph.D. Student University of California, San Diego

yxian9 commented 4 years ago

@AlexChitsazan @IMJoeyZhu

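One genomic bin (5 kb) may overlap multiple gene promoter regions, so queryHits(ov) returns that bin's index once per overlapping promoter, and subsetting with the repeated indices counts the bin several times in the numerator. Using unique(idy) removes the duplicated bins, after which the promoter ratio stays within [0, 1].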
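
To illustrate why the repeated indices push the ratio above 1, here is a minimal base-R sketch on a toy matrix. The matrix `bmat` and the index vector `idy` are made-up stand-ins for the real binary cell-by-bin matrix and for `queryHits(ov)`; they are assumptions for illustration, not SnapATAC objects.

```r
# Toy binary cell-by-bin matrix: 3 cells x 4 bins (hypothetical data).
bmat <- matrix(c(1, 1, 0, 0,
                 1, 0, 0, 0,
                 1, 1, 1, 1),
               nrow = 3, byrow = TRUE)

# Hypothetical overlap result: bin 1 overlaps two promoters and bin 2 overlaps
# one, so bin 1 appears twice, the way queryHits() would report it.
idy <- c(1, 1, 2)

# Subsetting with repeated column indices duplicates the column,
# so bin 1 is counted twice in the numerator.
ratio_dup <- rowSums(bmat[, idy, drop = FALSE]) / rowSums(bmat)

# De-duplicate the bin indices before subsetting.
idy_unique <- unique(idy)
ratio_fixed <- rowSums(bmat[, idy_unique, drop = FALSE]) / rowSums(bmat)

print(ratio_dup)    # 1.50 2.00 0.75 (values above 1 are possible)
print(ratio_fixed)  # 1.0 1.0 0.5   (always within [0, 1])
```

In the snippet from the original post, the equivalent change would presumably be `idy = unique(queryHits(ov));` with the remaining lines left unchanged.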