raerose01 / deconstructSigs


mut.to.sigs.input out of memory error #39

Closed lpalomerol closed 5 years ago

lpalomerol commented 6 years ago

Hello, I've detected a strange behaviour in the mut.to.sigs.input function: it triggers an out-of-memory error (even with 54 GB of RAM!) when parsing big files in the loop that builds beep.

This is the affected code:

  for (i in unique(mut[, sample.id])) {
    tmp = subset(mut, mut[, sample.id] == i) #Failing line
    beep = table(tmp$tricontext)
    for (l in 1:length(beep)) {
      trimer = names(beep[l])
      if (trimer %in% all.tri) {
        final.matrix[i, trimer] = beep[trimer]
      }
    }
  }

What I've seen is that, when the subset line executes, the number of selected rows is squared. For example, for a subset that should select 100 rows (with 10 columns), the dimensions of tmp were 10000x10 (!) instead of the expected 100x10.

I've checked 3 other ways to perform the same operation, and all of them behave as expected. I suggest implementing the "tmp2" or "tmp4" solution.


    i= 'PDX102.bam'    

    tmp = subset(mut, mut[, sample.id] == i)
    tmp2 =  mut[mut[,sample.id] == i,]    
    tmp3 = subset(mut, c(rep(TRUE,100)))
    inSubset = mut[, sample.id] == i
    tmp4 = subset(mut, inSubset)

    dim(tmp)   # 10000    10
    dim(tmp2)  # 100    10
    dim(tmp3)  # 100    10
    dim(tmp4)  # 100    10
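As a minimal sketch of how the loop could look with the "tmp2"/"tmp4" style of indexing, here is a self-contained version on toy data. The data frame, the `Sample` column name, and the three-entry `all.tri` are made up for illustration and are not the package's real inputs; this is not necessarily how the package was eventually changed.

```r
# Toy stand-in for the real input: 2 samples, 3 mutations each (hypothetical data).
mut <- data.frame(
  Sample     = c("S1", "S1", "S1", "S2", "S2", "S2"),
  tricontext = c("A[C>A]A", "A[C>A]A", "A[C>T]G", "A[C>T]G", "T[T>G]T", "N[C>A]N"),
  stringsAsFactors = FALSE
)
sample.id <- "Sample"
all.tri   <- c("A[C>A]A", "A[C>T]G", "T[T>G]T")  # toy subset of the contexts

samples <- unique(mut[, sample.id])
final.matrix <- matrix(0, nrow = length(samples), ncol = length(all.tri),
                       dimnames = list(samples, all.tri))

for (i in samples) {
  # Compute the logical index first, then index with it directly, so the
  # condition is not passed through subset()'s non-standard evaluation.
  inSubset <- mut[, sample.id] == i
  tmp <- mut[inSubset, , drop = FALSE]
  stopifnot(nrow(tmp) == sum(inSubset))    # guard against the row blow-up

  beep <- table(tmp$tricontext)
  keep <- intersect(names(beep), all.tri)  # ignore contexts outside all.tri
  final.matrix[i, keep] <- beep[keep]
}

final.matrix
```

The `stopifnot()` guard is optional; it only makes the loop fail early if the subset ever returns more rows than the index selects.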

Thank you!

raerose01 commented 5 years ago

Hi, thanks for pointing that out. I think that was fixed in this commit: https://github.com/raerose01/deconstructSigs/commit/9bbaf15387e1a6221b4437523d12dd950eea80e1