yancychy / DiffChIPL

Apache License 2.0
4 stars 0 forks source link

DiffChIPL run error #5

Open junjunlab opened 2 months ago

junjunlab commented 2 months ago

Hi, thanks for developing this great package, I got an error when I run DiffChIPL, can you give some advice? Thank you!

> resA = DiffChIPL(cpmD, design0, group0 = group)
Error in model.frame.default(formula = y ~ x + w, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'x')
yancychy commented 2 months ago

Hi, can you check the column names of cpmD, design0, and group? I guess there should be some inconsistency among them.

junjunlab commented 2 months ago

Sure, here is my code:

# SampleID  Tissue  Factor  Condition   Treatment   Replicate   bamReads    ControlID   bamControl  Peaks   PeakCaller
meta <- data.frame(SampleID = rep(c("CTL","siNCoR"),each = 3),
                   Tissue = "",
                   Factor = "atac",
                   Condition = rep(c("control","treat"),each = 3),
                   Treatment = rep(c("control","treat"),each = 3),
                   Replicate = rep(c(1,2,3),2),
                   bamReads = list.files("../2.map-data",pattern = ".shift.sorted.bam$",full.names = T),
                   ControlID = "",
                   bamControl = "",
                   Peaks = list.files(".",pattern = ".narrowPeak",full.names = T),
                   PeakCaller = "narrow")

# output
write.csv(meta,file = "DiffChIPL_meta.csv",row.names = F)

# ==============================================================================
# 1_Get read count 
# ==============================================================================

library(DiffChIPL)

countL = getReadCount(inputF = "DiffChIPL_meta.csv",
                      overlap = FALSE, 
                      fragment = 0,
                      removeBackground = TRUE)

# ==============================================================================
# 2_Make the design matrix and normalization
# ==============================================================================
group= c(1,1,1,0,0,0)
ctrName = "control"
treatName = "treat"
groupName = rep(c(treatName, ctrName),each = 3)
design0 <- cbind(rep(1, 6), c(1,1,1,0,0,0))
colnames(design0) <- c(ctrName, treatName)
design0

#      control treat
# [1,]   1      1
# [2,]   1      1
# [3,]   1      1
# [4,]   1      0
# [5,]   1      0
# [6,]   1      0

countAll <- countL$countAll

for(i in 1:ncol(countAll)){
  id = which(countAll[,i] < 1)
  countAll[id,i] = 0
}

cpmD = cpmNorm(countAll, libsize = countL$fd$lsIP)

# ==============================================================================
# 3_Do differential analysis with DiffChIPL
# ==============================================================================
resA = DiffChIPL(cpmD, design0, group0 = group)
fitRlimm3 = resA$fitDiffL
rtRlimm3 = resA$resDE
junjunlab commented 2 months ago

This is my cpmD:

1714107261884

yancychy commented 2 months ago

Can you remove NAs in you cpmD?

junjunlab commented 2 months ago

It seems there is no NA in cpmD:

> table(is.na(cpmD))

 FALSE 
867678 
junjunlab commented 2 months ago

And this is my sample sheet:

1714206553841

yancychy commented 2 months ago

Would you like to share me the cpmD data?

junjunlab commented 2 months ago

Sure, this is my cpmD data:

cpmD.csv

yancychy commented 2 months ago

Thanks. I updated the function in DiffChIPL. It works now. The number of rows are too many. So your data running time may need 10 minutes.

junjunlab commented 2 months ago

Thank you very much, I will try it!