mut.to.sigs.input out of memory error

Hello, I've found some strange behaviour in the mut.to.sigs.input function: it causes an out-of-memory error (even with 54 GB of RAM!) when parsing big files, in the loop that builds the beep table.

This is the affected code:

  for (i in unique(mut[, sample.id])) {
    tmp = subset(mut, mut[, sample.id] == i) #Failing line
    beep = table(tmp$tricontext)
    for (l in 1:length(beep)) {
      trimer = names(beep[l])
      if (trimer %in% all.tri) {
        final.matrix[i, trimer] = beep[trimer]
      }
    }
  }

What I've seen is that, when the subset line executes, the number of selected rows is squared. For example, for a subset that should contain 100 rows (and 10 columns), the dimensions of tmp were 10000x10 (!) instead of the expected 100x10.

I've tried three other ways of performing the same operation, and all of them behave as expected. I suggest implementing either the "tmp2" or the "tmp4" solution.


    i= 'PDX102.bam'    

    tmp = subset(mut, mut[, sample.id] == i)
    tmp2 =  mut[mut[,sample.id] == i,]    
    tmp3 = subset(mut, c(rep(TRUE,100)))
    inSubset = mut[, sample.id] == i
    tmp4 = subset(mut, inSubset)

    dim(tmp)   # 10000    10
    dim(tmp2)  # 100    10
    dim(tmp3)  # 100    10
    dim(tmp4)  # 100    10
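
For reference, here is a minimal sketch of how the loop might look with the "tmp4" approach, where the logical index is computed before subsetting. The toy data frame, its column names, and the two-entry all.tri are assumptions made up for illustration; they are not the package's real inputs, and the inner loop is swapped for a vectorised assignment, which is my own change.

```r
# Toy input: the column names and values are hypothetical.
mut <- data.frame(
  Sample = c("s1", "s1", "s2", "s2", "s2"),
  tricontext = c("A[C>A]A", "A[C>A]A", "A[C>T]G", "A[C>A]A", "X[C>A]X"),
  stringsAsFactors = FALSE
)
sample.id <- "Sample"
all.tri <- c("A[C>A]A", "A[C>T]G")  # stand-in for the full trinucleotide list

samples <- unique(mut[, sample.id])
final.matrix <- matrix(0, nrow = length(samples), ncol = length(all.tri),
                       dimnames = list(samples, all.tri))

for (i in samples) {
  # Compute the logical index first, then use plain bracket indexing
  # instead of passing the expression to subset().
  in.subset <- mut[, sample.id] == i
  tmp <- mut[in.subset, , drop = FALSE]

  # Count trinucleotide contexts and keep only the known ones.
  counts <- table(tmp$tricontext)
  keep <- intersect(names(counts), all.tri)
  final.matrix[i, keep] <- as.vector(counts[keep])
}

print(final.matrix)
```

With this version, tmp can never have more rows than mut, because the rows are selected by a logical vector computed outside of subset().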

Thank you!

raerose01 / deconstructSigs

mut.to.sigs.input out of memory error #39