robinweide / GENOVA

GENome Organisation Visual Analytics
GNU General Public License v3.0
69 stars 15 forks source link

saddle - Error in findInterval(eval(i), stats::quantile(eval(i), qbins, na.rm = TRUE) #263

Closed YiweiNiu closed 3 years ago

YiweiNiu commented 3 years ago

Hi,

When I used the ATAC-seq peaks for compartment_score, the saddle function would throw an error.

>CS_out = compartment_score(hic_100k, bed = ATAC_peaks)
> CS_out
A GENOVA 'CS_discovery' object involving the following 3 experiment(s):
'M0', 'M1', 'M2' at a resolution of 100 kb.

Contains the following slots:
 - compart_scores:  A 27269 x 7 data.frame containing signed compartment scores.

22 centromeres spanning a total of 1220 bins have been ignored.
> saddle_out = saddle(hic_100k,
+                     CS_discovery = CS_out,
+                     bins = 50)
Error in findInterval(eval(i), stats::quantile(eval(i), qbins, na.rm = TRUE),  : 
  'vec' must be sorted non-decreasingly and not contain NAs

And when not using the bed option in compartment_score, the saddle function could run successfully.

> CS_out1 = compartment_score(hic_100k)
> saddle_out = saddle(hic_100k,
                    CS_discovery = CS_out1,
                    bins = 50)
Warning message:
The 'CS_discovery' object is unsigned. It is strongly recommended that comparment scores are signed with `sign_comparmentscore()`.
>
> saddle_out
A GENOVA 'saddle_discovery' object involving the following 3 experiment(s):
'M0', 'M1', 'M2' at a resolution of 100 kb.

Contains the following slots:
 - saddle:  A 81302 x 5 data.table containing quantile-quantile scores for 22 chromosome arms.

And it would report the same error when using sign_compartmentscore for this unsigned CS_discovery object.

> CS_out2 <- sign_compartmentscore(CS_out1, bed = ATAC_peaks)
> saddle_out = saddle(hic_100k,
+                     CS_discovery = CS_out2,
+                     bins = 50)
Error in findInterval(eval(i), stats::quantile(eval(i), qbins, na.rm = TRUE),  : 
  'vec' must be sorted non-decreasingly and not contain NAs

I am using the dev version of GENOVA.

Thanks in advance.

teunbrand commented 3 years ago

Hello,

Yes this sounds like a bug in sign_compartmentscore, but I'm unsure how to replicate this bug. I'll have a closer look at what might go wrong when I have some more time. If we could replicate the bug with, for example, the test data, that would be very helpful.

YiweiNiu commented 3 years ago

Hi,

Thank you for your quick reply.

I manually ran each step of the function saddle(), and I found it was due to NA values in the scores.

> sum(is.na(scores$M0))
[1] 0
> sum(is.na(scores$M1))
[1] 107
> sum(is.na(scores$M2))
[1] 107

> scores[scores$part == 'chrYp',]
     chrom    start      end   bin M0 M1 M2  part
  1:  chrY        0   100000 26352  0 NA NA chrYp
  2:  chrY   100000   200000 26353  0 NA NA chrYp
  3:  chrY   200000   300000 26354  0 NA NA chrYp
  4:  chrY   300000   400000 26355  0 NA NA chrYp
  5:  chrY   400000   500000 26356  0 NA NA chrYp
 ---                                             
103:  chrY 10200000 10300000 26454  0 NA NA chrYp
104:  chrY 10300000 10400000 26455  0 NA NA chrYp
105:  chrY 10400000 10500000 26456  0 NA NA chrYp
106:  chrY 10500000 10600000 26457  0 NA NA chrYp
107:  chrY 10600000 10700000 26458  0 NA NA chrYp

It is because that Line 111 of saddle.R removes NA values only from the first object.

scores <- scores[!is.na(eval(as.symbol(expnames[1]))), ]
YiweiNiu commented 3 years ago

I modified this line.

for (i in expnames) {
  scores <- scores[!is.na(eval(as.symbol(i))), ]
}

And it works now.

teunbrand commented 3 years ago

Hi YiweiNiu,

We've implemented your suggested modification in the current dev version. Thanks for spotting and fixing this bug!

Best, Teun