rhondabacher / SCnorm

Normalization for single cell RNA-seq data

Subscript errors #4

Closed lazappi closed 7 years ago

lazappi commented 7 years ago

Hi

I have been trying to run SCnorm on some of my data and have been running into some errors with out-of-bounds subscripts and objects not found.

Here is the output of running SCnorm on the example data in the scater package:

library(scater)
library(SCnorm)

DataNorm <- SCnorm(sc_example_counts, Conditions = rep(1, ncol(sc_example_counts)), OutputName = "MyNormalizedData", PLOT=TRUE, FilterCellNum = 10)
[1] "Gene filter is applied within each condition."
[1] "851 genes were not included in the normalization due to having less than 10 non-zero values."
[1] "A list of these genes can be accessed in output, see vignette for example."
Error in SCnorm(sc_example_counts, Conditions = rep(1, ncol(sc_example_counts)),  : 
  object 'x' not found

If we provide two conditions we run into a different error:

Conditions = rep(c(1,2), each= 20)
DataNorm <- SCnorm(sc_example_counts, Conditions = Conditions, OutputName = "MyNormalizedData", PLOT=TRUE, FilterCellNum = 10)
[1] "Gene filter is applied within each condition."
[1] "1391 genes were not included in the normalization due to having less than 10 non-zero values."
[1] "1419 genes were not included in the normalization due to having less than 10 non-zero values."
[1] "A list of these genes can be accessed in output, see vignette for example."
Error in sreg[[ADDTO]] : subscript out of bounds

I have also seen Error in sreg[[i]] : subscript out of bounds. Do you know why this might be happening or have any suggestions to avoid it?

Thanks

rhondabacher commented 7 years ago

Hi Luke,

Thanks for trying out the package.

I fixed the first error earlier today, and it should be taken care of if you reinstall. Sorry about that!

The second error is more complicated. SCnorm was designed and tested extensively on entire datasets. The dataset from 'scater' is a subset of only 2,000 genes, and the gene filtering step first removes the majority of these. Currently, SCnorm requires a minimum of 100 genes in each K group, and with so few genes remaining it does not converge to a K within this bound. For now, I don't suggest running with small subsets of data, but I will look further into how best to handle such cases and add more error-handling messages. Please let me know if you are running into this error on larger datasets.
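
One way to see this effect before running the normalization is to count how many genes survive the non-zero filter in each condition. The sketch below uses only base R on a simulated count matrix. `count_genes_passing_filter` and `min_genes` are hypothetical names for illustration, not part of SCnorm, and the filter rule (at least `filter_cell_num` non-zero values per gene within a condition) is an assumption inferred from the messages printed above.

```r
# Hypothetical helper (not part of SCnorm): count the genes that have at least
# `filter_cell_num` non-zero values within each condition.
count_genes_passing_filter <- function(counts, conditions, filter_cell_num = 10) {
  stopifnot(ncol(counts) == length(conditions))
  sapply(split(seq_len(ncol(counts)), conditions), function(cols) {
    sum(rowSums(counts[, cols, drop = FALSE] != 0) >= filter_cell_num)
  })
}

# Simulated data: 2,000 genes x 40 cells, sparse like scRNA-seq counts.
set.seed(1)
counts <- matrix(rpois(2000 * 40, lambda = 0.3), nrow = 2000)
conditions <- rep(c(1, 2), each = 20)

passing <- count_genes_passing_filter(counts, conditions)

# Assumed threshold for illustration, based on the 100-genes-per-group minimum.
min_genes <- 100
if (any(passing < min_genes)) {
  message("Too few genes pass the filter in at least one condition.")
}
```

A check like this only flags obviously small inputs; it does not reproduce how SCnorm chooses K.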

Thanks again! I appreciate the feedback.

lazappi commented 7 years ago

Thanks for your quick response. It looks like your fix has helped: SCnorm now runs on the scater data and my own dataset.

Thanks for the explanation of the other error. That makes sense and would also explain why it takes a long time to run. It would be good if there was a way to detect that this was going to happen and provide an error message to the user, but I don't know how easy that would be.

Here are a couple of other things I have noticed. I don't want to tell you what to do with your own software, so feel free to ignore them.

norm <- SCnorm(sc_example_counts, rep(1, ncol(sc_example_counts)))
[1] "Gene filter is applied within each condition."
[1] "851 genes were not included in the normalization due to having less than 10 non-zero values."
[1] "A list of these genes can be accessed in output, see vignette for example."
Error in paste0(OutputName, "_k_evaluation.pdf") : 
  argument "OutputName" is missing, with no default
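
The error appears only after the filter messages because R evaluates arguments lazily: a missing argument with no default is not reported until the function body first uses it. The sketch below reproduces that behaviour with a toy function. `run_step`, `run_step_default`, and the default value are hypothetical and do not reflect SCnorm's actual code.

```r
# Toy function: `output_name` has no default and is only used at the end.
run_step <- function(data, output_name) {
  message("Filtering done.")               # runs even if output_name is missing
  paste0(output_name, "_k_evaluation.pdf") # the error is raised here, not earlier
}

# Capture the error message instead of stopping the script.
result <- tryCatch(run_step(1:3), error = function(e) conditionMessage(e))

# One possible remedy: supply a default so the argument becomes optional.
run_step_default <- function(data, output_name = "SCnorm_output") {
  paste0(output_name, "_k_evaluation.pdf")
}
file_name <- run_step_default(1:3)
```

Another option would be to check `missing(output_name)` at the top of the function and stop early with a clear message.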

lazappi commented 7 years ago

Sorry to keep bothering you, but I've just run into a similar error. If I don't set the FilterExpression argument I get Error in sreg[[i]] : subscript out of bounds. Setting it to something (FilterExpression = 2) fixes this. This argument doesn't seem to be documented, so I'm not really sure what it does or what a reasonable value would be.

rhondabacher commented 7 years ago

Hi Luke,

Again, thanks a ton for your comments. I have fixed the "subscript out of bounds" error in that function and have been adding more error messages as well. I appreciate the message() versus print() tip.
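
For context on that tip: `message()` writes to standard error and can be silenced by the caller with `suppressMessages()`, whereas `print()` writes to standard output and cannot be suppressed that way. A minimal base R sketch, using a hypothetical `report_filter` function that is not part of SCnorm:

```r
# Hypothetical function that reports progress with message() instead of print().
report_filter <- function(n_removed) {
  message(n_removed, " genes were not included in the normalization.")
  invisible(n_removed)
}

report_filter(851)                              # message goes to stderr
removed <- suppressMessages(report_filter(851)) # same call, no console output

# capture.output() records stdout only, so the message is not captured.
captured <- capture.output(suppressMessages(report_filter(851)))
```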