yancychy / DiffChIPL

Apache License 2.0
4 stars 0 forks source link

Issue adapting your example code to own data. #2

Closed tolender closed 11 months ago

tolender commented 1 year ago

Hi, I am trying to get your example code to work on my own data. I am getting this error. Any idea what is wrong?

> resA = DiffChIPL(cpmD, design0, group0 = group ) Error in model.frame.default(formula = y ~ x + w, drop.unused.levels = TRUE) : variable lengths differ (found for 'x')

Thanks!

yancychy commented 1 year ago

Hi, it seems the data sample number are different in either cpmD, design0, or group.

tolender commented 1 year ago

Hi, I used your example code exactly with my samples, with 2xCTRL and 2xTreatment. I even renamed the ctrl and treatment to sim1/sim2. Is this matrix correct for design0?

sim2 sim1 [1,] 1 1 [2,] 1 1 [3,] 1 0 [4,] 1 0

yancychy commented 1 year ago

Maybe it is caused by NA value in the data. Please remove the rows which contain NA values. https://stackoverflow.com/questions/19771284/error-in-model-frame-default-variable-lengths-differ

tolender commented 1 year ago

I checked over the cpmD array and it does not contain "NA" values. However, there are rows where some columns have "0", and the other columns have "0.000000", but there are no rows where all the rows are zero or "NA".

yancychy commented 1 year ago

Ok. If you could send me the input data, I will try to solve it.

adoe21 commented 1 year ago

I've gotten the same error, and I've traced it back to the filtLowCounts function within the DiffChIPL function. The error occurs when filtLowCounts calls piecewise.linear because the smoothY variable is empty. It appears smoothY is empty due to the line:

smoothY = smoothY[names(sx)]

When running the package on my data, the sx variable has no names resulting in smoothY being empty. What is this names(sx) call doing? If I just get rid of that line, it gets rid of the error, but don't want to be removing a step that may be necessary.

yancychy commented 1 year ago

Thanks @adoe21 . This is because sx has names for each element. We assign a unique name for each peak.

rawid = paste0(peakAll[,1],"_" ,peakAll[,2])
countAll = countL$countAll
rownames(countAll) = rawid

Then the rowname of cpmD will be

> cpmD[1:3,]
           hist1_out_1_sim1 hist1_out_2_sim1 hist1_out_3_sim2 hist1_out_4_sim2
chr1_28491         4.668331         4.921166         5.126321         5.527492
chr1_54121         3.464997         3.723300         5.446714         5.365705
chr1_67421         4.934739         4.841502         5.092289         5.577626

sx[1:4] 
chr1_28491 chr1_54121 chr1_67421 chr1_84029 
  5.060828   4.500179   5.111539   5.150445
yancychy commented 1 year ago

This step is essential. Because it is will reorder the values of smoothY to be consistent with sx.

smoothY = smoothY[names(sx)]
adoe21 commented 1 year ago

This fixed it thanks!