imkeller / gscreend

Phenotype detection in pooled genetic perturbation screens

missing `gene` column -> "Error in .subset2(x, i, exact = exact): subscript out of bounds" #1

Closed: jan-glx closed this issue 4 years ago

jan-glx commented 4 years ago

If the rowData of the SummarizedExperiment does not contain a gene column, gscreend::RunGscreend fails with an unhelpful error message. Perhaps it's possible to avoid relying on such a column altogether? gscreend::ResultsTable could just add the fdr, pval and lfc columns to rowData (which is probably what most users do afterwards anyway).

library(gscreend)
raw_counts <- read.table(
  system.file('extdata', 'simulated_counts.txt', package = 'gscreend'),
  header = TRUE)

counts_matrix <- cbind(raw_counts$library0, raw_counts$R0_0, raw_counts$R1_0)

rowData <- data.frame(
  sgRNA_id = raw_counts$sgrna_id,
  gene_name = raw_counts$Gene # only works with `gene =`, error message is not helpful
)

colData <- data.frame(samplename = c('library', 'R1', 'R2'),
                      timepoint = c('T0', 'T1', 'T1'))

library(SummarizedExperiment)
se <- SummarizedExperiment(assays = list(counts = counts_matrix),
                           rowData = rowData, colData = colData)

# create a PoolScreenExp experiment
pse <- createPoolScreenExp(se)
#> Creating PoolScreenExp object from a SummarizedExperiment object.
#> References and samples are named correctly.

# Run Analysis
pse_an <- gscreend::RunGscreend(pse)
#> Size normalized count data.
#> Calculated LFC.
#> Fitted null distribution.
#> Calculated p-values at gRNA level.
#> Ranking genes...
#> Error in .subset2(x, i, exact = exact): subscript out of bounds

imkeller commented 4 years ago

The gene column is necessary because multiple sgRNAs target one gene. To compute the gene-level statistics, the sgRNA data needs to be aggregated at the gene level. The rowData for the sgRNA table is therefore not the same as for the gene table: the gene table has one row per gene and is much shorter.
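
To illustrate why the two tables differ in length, here is a minimal base-R sketch with made-up values. The table, its numbers, and the use of a per-gene mean are assumptions for illustration only; this is not the statistic gscreend computes when ranking genes.

```r
# Toy sgRNA-level table: six guides targeting three genes (made-up values).
sgrna_table <- data.frame(
  sgRNA_id = paste0("sg", 1:6),
  gene     = c("A", "A", "B", "B", "B", "C"),
  lfc      = c(-1.2, -0.8, 0.1, 0.3, -0.1, 2.0)
)

# Collapse the guides to one row per gene. The mean is used purely for
# illustration; it stands in for whatever gene-level statistic is computed.
gene_table <- aggregate(lfc ~ gene, data = sgrna_table, FUN = mean)

nrow(sgrna_table)  # one row per sgRNA
nrow(gene_table)   # one row per gene, so fewer rows
```

Without a column that maps each sgRNA to its gene, there is nothing to group by, so a gene-level table like this cannot be built from the sgRNA-level rowData.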

imkeller commented 4 years ago

The data input function now raises an error if the columns are not named correctly.
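
A check of this kind could look roughly like the sketch below. The helper name `check_required_columns` is hypothetical and this is not the code used in gscreend; the required names `sgRNA_id` and `gene` are taken from the example in this issue.

```r
# Hypothetical helper (not part of gscreend): fail early with a readable
# message when expected columns are missing from a data frame.
check_required_columns <- function(df, required, what) {
  missing_cols <- setdiff(required, colnames(df))
  if (length(missing_cols) > 0) {
    stop(sprintf("%s is missing required column(s): %s",
                 what, paste(missing_cols, collapse = ", ")),
         call. = FALSE)
  }
  invisible(TRUE)
}

# rowData as in the report above, with `gene_name` instead of `gene`.
rowData <- data.frame(sgRNA_id = c("sg1", "sg2"), gene_name = c("A", "A"))

# Capture the error message instead of aborting, to show what a user would see.
result <- tryCatch(
  check_required_columns(rowData, c("sgRNA_id", "gene"), "rowData"),
  error = function(e) conditionMessage(e)
)
result

# Renaming the column makes the same check pass.
names(rowData)[names(rowData) == "gene_name"] <- "gene"
ok <- check_required_columns(rowData, c("sgRNA_id", "gene"), "rowData")
```

For anyone hitting the original error on an older version, renaming the column to `gene` before building the SummarizedExperiment, as in the last step, should avoid it.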

jan-glx commented 4 years ago

Oh yes! I should have stopped after the stupid-user complaint (thanks for fixing!) and not given precocious advice.

imkeller commented 4 years ago

Thanks for the user feedback :) I will also try to make it clearer in the vignette. It might take some time until changes appear on Bioconductor.