saezlab / transcriptutorial

This is a tutorial to guide the analysis of RNAseq dataset using footprint based tools such as DOROTHEA, PROGENY and CARNIVAL
https://saezlab.github.io/transcriptutorial/
GNU General Public License v3.0

topTable returns 5 columns, but ttopFormatter expects 6 #36

Closed gabora closed 2 years ago

gabora commented 2 years ago

Hi, I am running 02_differential_analysis.Rmd. On the following line I get an error: https://github.com/saezlab/transcriptutorial/blob/3a5ba2dd6e8f52af99d9e30bfb18f64ec20742b4/scripts/02_differential_analysis.Rmd#L84

Error in `[.data.frame`(ttop, , c(7, 1, 2, 3, 4, 5, 6)) : 
  undefined columns selected

gabora commented 2 years ago

Is this because of a change in limma, or is something else going on?

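whiteorchid commented 2 years ago

ttopFormatter <- function(ttop)
{
  # Store the gene IDs (row names) in a new column, appended last
  ttop$ID <- row.names(ttop)
  # Select columns 1 to 6 only; ID stays in the last position
  ttop <- ttop[,c(1,2,3,4,5,6)]
  # Drop rows that contain missing values
  ttop <- ttop[complete.cases(ttop),]
  return(ttop)
}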

Depending on your data frame, ttopFormatter may not need column 7 in its c(...) index. Defining the version above in R and then rerunning the ttop_KOvsWT step might work.

brandon-krupczak commented 2 years ago

I experienced this issue as well. I'm not sure whether it's a change in how topTable works or whether the default inputs aren't suitable for my data, but either way, when I run the code, topTable returns only 5 columns instead of the expected 6. Once ttopFormatter adds the extra column for the gene ID, there are only six columns, so column 7 doesn't exist and ttopFormatter is indexing out of bounds.

It seems the original intention was to have the gene ID in the first column, so if you want to preserve that intention while eliminating the index out of bounds issue, I think you should instead define:

ttopFormatter <- function(ttop)
{
  ttop$ID <- row.names(ttop)
  ttop <- ttop[,c(6,1,2,3,4,5)]
  ttop <- ttop[complete.cases(ttop),]
  return(ttop)
}

This worked for me and allowed me to complete the differential expression analysis, but since I don't know what that missing column was supposed to be I don't know if I'm now missing features in my output.
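For anyone who wants to see the failure in isolation, here is a minimal base-R sketch. The data frame is a made-up stand-in for a 5-column topTable result (the gene names and values are hypothetical, and limma itself is not called), so treat it as an illustration of the indexing problem rather than a reproduction of the tutorial data.

```r
# Toy stand-in for a 5-column topTable result (all values are made up).
ttop <- data.frame(
  logFC     = c(1.2, -0.8),
  t         = c(3.1, -2.4),
  P.Value   = c(0.01, 0.04),
  adj.P.Val = c(0.02, 0.05),
  B         = c(1.5, -0.3),
  row.names = c("geneA", "geneB")
)

# Adding the ID column brings the total to 6 columns.
ttop$ID <- row.names(ttop)

# Asking for column 7 fails because it does not exist; capture the message.
res <- tryCatch(
  ttop[, c(7, 1, 2, 3, 4, 5, 6)],
  error = function(e) conditionMessage(e)
)
print(res)

# Moving column 6 (ID) to the front works with the 6 columns that exist.
reordered <- ttop[, c(6, 1, 2, 3, 4, 5)]
print(colnames(reordered))
```

The reordered frame keeps all five original statistics, just shifted right by one position behind ID.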

gabora commented 2 years ago

Thank you both! I agree with @brandon-krupczak: the intention was to put the gene ID in the first column and therefore shift the other columns by one.

For some reason, limma changes the columns in its output depending on the type of the input (see issue: https://support.bioconductor.org/p/16598/). If the input is a matrix, it does not report the average expression.

I added a fix that works even when the number of columns changes.
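
As an illustration only, one way to make the formatter independent of the column count is to select columns by name rather than by position. This is a sketch, not necessarily the fix that was committed to the repository; the name ttopFormatterFlexible and the toy data frames below are hypothetical.

```r
# Hypothetical variant of ttopFormatter that does not hard-code column positions.
ttopFormatterFlexible <- function(ttop) {
  # Store the gene IDs (row names) in an ID column
  ttop$ID <- row.names(ttop)
  # Put ID first, followed by every other column in its original order
  ttop <- ttop[, c("ID", setdiff(colnames(ttop), "ID")), drop = FALSE]
  # Drop rows that contain missing values
  ttop[complete.cases(ttop), , drop = FALSE]
}

# Toy inputs with 5 and 6 columns (values are made up).
five <- data.frame(
  logFC = c(1, NA), t = c(2, 3), P.Value = c(0.1, 0.2),
  adj.P.Val = c(0.1, 0.2), B = c(0, 1),
  row.names = c("g1", "g2")
)
six <- cbind(five, AveExpr = c(5, 6))

out5 <- ttopFormatterFlexible(five)
out6 <- ttopFormatterFlexible(six)
print(colnames(out5))
print(colnames(out6))
```

Because the selection is by name, the same function handles both shapes, and the ID column lands first in either case.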