AngCamp closed 1 year ago
Your pred column in the list dkkl1_nebula_g$pred should not contain the model matrix. Within dkkl1_nebula_g$pred, you should only have the predictors associated with each of the cells, which you use to build dkkl1.nebula.df, i.e. metadata from the original object. If your original object was a Seurat object, for example, your predictors would just be dkkl1_nebula_g$pred <- seurat_object$predictor; then build your model matrix from dkkl1_nebula_g$pred.
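A minimal sketch of that suggestion, using a toy metadata table instead of a real Seurat object. The names toy_counts and cell_meta and the factor levels are made up for illustration; only Condition, Label, and the ~Condition:Label formula come from this thread.

```r
library(Matrix)

# Toy stand-ins (hypothetical): 6 genes x 8 cells, plus per-cell metadata.
set.seed(1)
cells <- paste0("cell", 1:8)
toy_counts <- Matrix(
  matrix(rpois(6 * 8, lambda = 2), nrow = 6,
         dimnames = list(paste0("gene", 1:6), cells)),
  sparse = TRUE
)
cell_meta <- data.frame(
  Condition = factor(rep(c("A", "B"), each = 4)),
  Label     = factor(rep(c("neg", "pos"), times = 4)),
  row.names = cells
)

# pred holds only the per-cell predictors, ordered like the count columns.
pred <- cell_meta[colnames(toy_counts), , drop = FALSE]

# The model matrix is built from pred afterwards, as a separate object.
design <- model.matrix(~Condition:Label, data = pred)

# One design row per cell (i.e. per column of the count matrix).
stopifnot(nrow(design) == ncol(toy_counts))
```

Keeping pred as a plain data frame makes it easy to rebuild the design matrix with a different formula later.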
Hi AngCamp,
I'm not sure why my previous reply four days ago does not show up on this thread.
I think the error is in dkkl1_nebula_g$sid when used as an input for nebula. It should be dkkl1_nebula_g$id.
Best regards,
Liang
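To illustrate why the wrong element name produces a length error: in R, asking a list for an element that does not exist returns NULL, which has length 0. The toy list below only mimics the element names of the grouped object in this thread; its contents are made up.

```r
# Toy list with element names like the grouped object (contents are hypothetical).
toy_g <- list(
  count  = matrix(0, nrow = 3, ncol = 5),  # 3 genes x 5 cells
  id     = rep(1:2, c(3, 2)),              # one subject ID per cell
  offset = rep(1, 5)
)

is.null(toy_g$sid)                     # TRUE: there is no element called "sid"
length(toy_g$sid)                      # 0, so it cannot match ncol(count)
length(toy_g$id) == ncol(toy_g$count)  # TRUE: "id" is the element that exists
```

A quick names(toy_g) on the real object shows which element names are available before calling nebula.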
On Thu, Jun 15, 2023 at 2:09 AM AngCamp @.***> wrote:
I created a list like the sample_data you provide, with the model matrix. Here is its structure....
str(dkkl1_nebula_g)
List of 4
 $ count :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  .. ..@ i       : int [1:8465852] 2 3 7 9 11 14 17 20 21 23 ...
  .. ..@ p       : int [1:1165] 0 7897 16819 24435 32635 40432 48513 55924 60459 64383 ...
  .. ..@ Dim     : int [1:2] 23355 1164
  .. ..@ Dimnames:List of 2
  .. .. ..$ : chr [1:23355] "00R-AC107638.2" "0610005C13Rik" "0610007P14Rik" "0610009B22Rik" ...
  .. .. ..$ : chr [1:1164] "B1_T6_K7_S83_mouse1" "D6_T3_H15_S91_mouse1" "E3_T6_A10_S146_mouse1" "B7_T6_A8_S144_mouse1" ...
  .. ..@ x       : num [1:8465852] 57 35 1 48 42 13 2 17 103 14 ...
  .. ..@ factors : list()
 $ id    : num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...
 $ pred  : num [1:1164, 1:9] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1164] "1" "2" "3" "4" ...
  .. ..$ : chr [1:9] "(Intercept)" "ConditionContext-Only:LabeltdT+" "ConditionFear-Only:LabeltdT+" "ConditionFear-Recall:LabeltdT+" ...
 $ offset: num [1:1164] 1 1 1 1 1 1 1 1 1 1 ...
I have grouped it with group_cell(), but for some reason it does not recognize that the cell names are provided and that the sample id's are the same length as the number of columns (cells) in the data. What am I doing wrong?
Running nebula on the list above produces this error:
results.dkkl1.nebula <- nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred=dkkl1_nebula_g$pred, ncore=2)
Error message:
Error in nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, : The length of subject IDs should be equal to the number of columns of the count matrix.
Traceback:
- nebula(dkkl1_nebula_g$count, dkkl1_nebula_g$sid, pred = dkkl1_nebula_g$pred, ncore = 2)
- stop("The length of subject IDs should be equal to the number of columns of the count matrix.")
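For background on the group_cell() step mentioned above: nebula expects the cells of each subject to sit in consecutive columns of the count matrix. The base-R sketch below shows that reordering idea on toy data. It is an illustration only, not the package's implementation, and every object name and value in it is made up.

```r
library(Matrix)

set.seed(2)
id <- c("m2", "m1", "m2", "m1", "m3", "m1")  # hypothetical subject ID per cell
count <- Matrix(
  matrix(rpois(4 * 6, lambda = 3), nrow = 4,
         dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6))),
  sparse = TRUE
)
pred <- data.frame(Condition = factor(c("A", "B", "A", "B", "A", "B")))

# Sort cells so that all cells of one subject are next to each other,
# applying the same order to the counts, the IDs and the predictors.
o <- order(id)
grouped <- list(
  count = count[, o],
  id    = id[o],
  pred  = pred[o, , drop = FALSE]
)

# The ID vector still has one entry per column after reordering.
stopifnot(length(grouped$id) == ncol(grouped$count))

# Each subject now appears as a single contiguous run.
rle(grouped$id)$values
```

The point is that counts, IDs, and predictors are all reordered together, so whichever element holds the IDs must be the one passed to nebula.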
GitHub issue: https://github.com/lhe17/nebula/issues/25
Thanks, I will try these things out.
Thanks, these two solutions fixed it. I think it's worth noting that it's a little unnecessarily confusing that you use data$sid in your tutorial. Also, I know most people will probably use a Seurat object, but it may be useful to explain how people working with standard CSVs can make an object that works with your package. Most data on GEO is also stored as .csv, so people working with publicly available data often won't be using sparse matrices, at least not for simple preprocessing like gene filtering.
I did the following:
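# create counts for cell type(s) of interest, do gene filtering first
# in my case this gave me a dataframe called dkkl1.counts.df
# this can now be made into the counts matrix
library(Matrix)  # provides Matrix() and the sparse dgCMatrix class
# start from an empty list; vector(mode = "list", length = 4) would leave
# four unnamed NULL slots in front of the named elements
dkkl1_nebula <- list()
dkkl1_nebula$count <- Matrix(as.matrix(dkkl1.counts.df), sparse = TRUE)
dim(dkkl1_nebula$count)
dkkl1_nebula$count[1:5, 1:5]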
23355 1164
5 x 5 sparse Matrix of class "dgCMatrix"
               B1_T6_K7_S83_mouse1 D6_T3_H15_S91_mouse1 E3_T6_A10_S146_mouse1
00R-AC107638.2                   .                    .                     .
0610005C13Rik                    .                    .                     .
0610007P14Rik                   57                   13                     6
0610009B22Rik                   35                   27                    32
0610009E02Rik                    .                    .                     .
               B7_T6_A8_S144_mouse1 B4_T8_I19_S47_mouse1
00R-AC107638.2                    .                    .
0610005C13Rik                     .                    .
0610007P14Rik                   116                   26
0610009B22Rik                    76                    .
0610009E02Rik                     .                    6
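For anyone starting from a plain CSV rather than a data frame already in memory, here is a rough end-to-end sketch. It writes a tiny genes-by-cells CSV to a temporary file so it can run on its own. The file layout, the gene and cell names, and the rule for deriving the subject ID from the cell-name suffix are all assumptions for illustration, not something nebula prescribes.

```r
library(Matrix)

# Write a tiny genes x cells CSV to a temp file (toy data, hypothetical names).
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(
  cellA_mouse1 = c(0, 57, 35),
  cellB_mouse1 = c(0, 13, 27),
  cellC_mouse2 = c(4, 0, 0),
  cellD_mouse2 = c(0, 116, 76),
  row.names = c("geneX", "geneY", "geneZ")
)
write.csv(toy, csv_path)

# Read it back: the first column holds gene names, the rest are cells.
counts_df <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Convert the dense table to a sparse matrix (genes in rows, cells in columns).
count <- Matrix(as.matrix(counts_df), sparse = TRUE)

# One subject ID per cell; here taken from the text after the last underscore.
id <- sub(".*_", "", colnames(count))

# The ID vector must have one entry per column of the count matrix.
stopifnot(length(id) == ncol(count))
```

With real data the subject IDs would more likely come from a separate metadata table, matched to colnames(count).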
Just a suggestion, could save a user some googling. Many of your users are also going to be biologists (like me) with limited programming experience and may not be familiar with sparse matrices. Might increase the user base if you can save them time with little things like this. Idiot proofing the tutorial for people like me can go a long way.
It may help to add a small paragraph to the tutorial explaining the object nebula expects. I know it's easy to deduce by running str(sample_data) and reading the function documentation, but it's easy to miss little things if they are not explicitly spelled out. A short paragraph could save a user a lot of time trawling through your documentation, arguably unnecessarily, since it would be quite easy to explain. Also, just to reiterate, many users are going to be biologists with limited programming experience. It will not occur to them to do the things I listed above. Seurat has a wide user base not just because it is the "best" package (arguably it is not), but because it has the best tutorials. Users can easily pick the package up and learn to use it.
Thanks for the help =) btw, it's appreciated.
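As a rough sketch of what such a paragraph could spell out: going by the str() output earlier in this thread, the inputs are a genes-by-cells sparse count matrix, an ID vector with one entry per cell, a predictor matrix with one row per cell, and an offset with one entry per cell. The helper below, check_nebula_input, is hypothetical and not part of nebula; it only checks those shapes on toy data.

```r
library(Matrix)

# Hypothetical helper (not part of nebula): basic shape checks on the inputs.
# Returns a character vector of problems, empty if none were found.
check_nebula_input <- function(count, id, pred = NULL, offset = NULL) {
  n_cells <- ncol(count)
  problems <- character(0)
  if (length(id) != n_cells) {
    problems <- c(problems, "length(id) differs from ncol(count)")
  }
  if (!is.null(pred) && nrow(pred) != n_cells) {
    problems <- c(problems, "nrow(pred) differs from ncol(count)")
  }
  if (!is.null(offset) && length(offset) != n_cells) {
    problems <- c(problems, "length(offset) differs from ncol(count)")
  }
  problems
}

# Toy object: 2 genes x 4 cells, two subjects.
count <- Matrix(matrix(c(0, 2, 0, 1, 3, 0, 0, 5), nrow = 2), sparse = TRUE)
obj <- list(count = count, id = c(1, 1, 2, 2), offset = rep(1, 4))

ok  <- check_nebula_input(obj$count, obj$id, offset = obj$offset)
bad <- check_nebula_input(obj$count, obj$sid)  # "sid" does not exist, so NULL

length(ok)   # 0: no problems found
bad          # reports the ID length mismatch
```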
Hi AngCamp,
Thank you for your suggestions. They will be considered in updated versions.
Best regards, Liang
I created a list like the sample_data you provide, with the model matrix. Here is its structure....
>str(dkkl1_nebula_g)
I have grouped it with group_cell(), but for some reason when I run nebula on it, it does not recognize that the sample id's are the same length as the number of columns (cells) in the data. What am I doing wrong? The only difference I see between my object and your sample_data object is that mine contains the cell names.
EDIT: Not sure if this could also be the issue, but the model I am trying to fit is as follows:
dkkl1.nebula.df = model.matrix(~Condition:Label, data=dkkl1_nebula$pred)
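To see what that formula produces, here is a small sketch on toy predictors. The factor levels are placeholders standing in for the real Condition and Label values. With four conditions and two labels, an interaction-only formula yields an intercept plus one column per combination, which matches the 9 columns in the str() output. Those combination columns sum to the intercept column, so the matrix is not full rank. Whether that matters for nebula is not established in this thread, but it may be worth checking.

```r
# Toy predictors (hypothetical levels): 4 conditions x 2 labels, 3 cells each.
pred <- expand.grid(
  Condition = factor(c("A", "B", "C", "D")),
  Label     = factor(c("neg", "pos")),
  rep       = 1:3
)

# Same formula as in the post: interaction only, no main effects.
design <- model.matrix(~Condition:Label, data = pred)

colnames(design)  # intercept plus one column per Condition:Label combination
ncol(design)      # 9 columns in total
qr(design)$rank   # 8: the combination columns add up to the intercept column
```

A formula such as ~Condition*Label or ~0 + Condition:Label would give a full-rank matrix on this toy data. Which one fits the actual question is a modelling decision.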