Multi-subject DE analysis between two groups with one predictor

SirKuikka commented 2 years ago

Hi,

The example shows how to do DE analysis with three predictors. Let's say I have only one predictor, the case/control status, and I would like to find DE genes between the two groups. How would I set up the design matrix then?

Btw, what exactly are these two other predictors?

lhe17 commented 2 years ago

Hi,

Thank you for your question.

In your case, the design matrix has two columns. The first column is the intercept term (all ones). The second column is a 0/1 indicator that is 1 for the cases, assuming you treat the controls as the reference group. The log fold change reported for the second column is then the effect of the case group relative to the controls.
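A minimal sketch of such a two-column design matrix in base R. The factor `status` and its level names are hypothetical and only serve as an illustration; in real data there is one entry per cell.

```r
# Hypothetical per-cell labels (one entry per cell in real data).
status <- factor(c("case", "control", "control", "case", "case", "control"))

# Make the controls the reference level so the second column codes the cases.
status <- relevel(status, ref = "control")

# Column 1: intercept (all ones). Column 2: 0/1 indicator, 1 for cases.
design <- model.matrix(~ status)
print(design)
```

The resulting matrix is what would be passed as `pred` to `nebula`.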

I may use ‘variables of interest’ instead of ‘predictors’ in the next version to avoid potential misunderstanding. Basically, the design matrix includes the variables whose effects you want to test, plus any other covariates you want to adjust for.

Best regards,

Liang

SirKuikka commented 2 years ago

Hi,

Thanks for getting back to me so quickly. I'm trying Nebula with my own data, but I get the error shown below. It complains that the cells of the same subject are not grouped. This of course makes sense, because one subject can belong to only one group (e.g. case or control, but not both). So have I misunderstood the purpose of the tool, or what's going on? I want to perform DE analysis between the two groups, with Nebula's mixed model treating the subjects/individuals as a random effect.

```r
> library(nebula)

> dim(raw_data)
[1]  6111 38623

> raw_data[1:10,1:10]
       cell1 cell3 cell4 cell5 cell6 cell7 cell8 cell9 cell10 cell11
gene1      0     0     0     0     0     0     0     0      0      0
gene2      0     0     0     0     0     0     0     0      0      0
gene3      0     0     0     0     0     0     0     0      1      0
gene4      0     0     2     0     0     0     0     0      0      0
gene5      0     1     0     0     0     0     0     2      0      0
gene6      0     0     0     0     0     0     0     0      0      0
gene7      0     0     0     1     0     0     0     0      0      0
gene8      0     0     0     0     0     0     0     0      0      0
gene9      0     0     0     0     0     0     0     0      0      0
gene10     0     1     0     0     0     0     1     0      0      1

> head(group)
[1] B B A B B A
Levels: A B
> length(group)
[1] 38623

> head(individual)
[1] sample17.B sample1.B sample2.A sample15.B sample12.B sample16.A
40 Levels: sample1.A sample10.A sample11.A sample12.A sample13.A ... sample9.B
> length(individual)
[1] 38623

> pred <- as.data.frame(group)
> colnames(pred) <- "cc"

> head(pred)
  cc
1  B
2  B
3  A
4  B
5  B
6  A

> df = model.matrix(~cc, data=pred)

> head(df)
  (Intercept) ccB
1           1   1
2           1   1
3           1   0
4           1   1
5           1   1
6           1   0

> re = nebula(Matrix::Matrix(raw_data,sparse = TRUE),as.character(individual),pred=df)
Error in nebula(Matrix::Matrix(raw_data, sparse = TRUE), as.character(individual), :
  The cells of the same subject have to be grouped.

> table(individual,group)
            group
individual      A    B
  sample1.A   936    0
  sample10.A  958    0
  sample11.A  976    0
  sample12.A  957    0
  sample13.A  964    0
  sample14.A  924    0
  sample15.A  968    0
  sample16.A  948    0
  sample17.A 1031    0
  sample18.A  935    0
  sample19.A  974    0
  sample2.A   931    0
  sample20.A 1065    0
  sample3.A   993    0
  sample4.A  1005    0
  sample5.A   957    0
  sample6.A   937    0
  sample7.A   993    0
  sample8.A   947    0
  sample9.A   958    0
  sample1.B     0  977
  sample10.B    0  970
  sample11.B    0  967
  sample12.B    0  989
  sample13.B    0  920
  sample14.B    0  944
  sample15.B    0  916
  sample16.B    0  986
  sample17.B    0  983
  sample18.B    0  931
  sample19.B    0  927
  sample2.B     0  949
  sample20.B    0  988
  sample3.B     0  954
  sample4.B     0 1021
  sample5.B     0 1016
  sample6.B     0  945
  sample7.B     0  930
  sample8.B     0  962
  sample9.B     0  991
```

SirKuikka commented 2 years ago

Also, if the gene expression values have to be integers, does that mean the input must be raw counts rather than normalized values?

lhe17 commented 2 years ago

Hi,

Yes, the input matrix is raw counts. You can specify your normalization factor (e.g., library size of each cell or other scaling factor) using the parameter ‘offset’.

The error indicates that the cells of each subject are not stored consecutively in your data. To reorder the cells, you can use the function ‘group_cell’ in the package. Remember to pass your normalization factor to this function as well, so that it is reordered together with the cells.
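To illustrate what "grouped" means here, the following base R sketch reorders a toy data set so that each subject's cells are consecutive. All object names and values are made up for the illustration, and the sketch only mimics the idea of the reordering; it is not the package's implementation.

```r
# Toy data: 3 genes x 6 cells, with the cells of two subjects interleaved.
count <- matrix(c(0, 1, 0,
                  2, 0, 0,
                  0, 0, 1,
                  1, 0, 0,
                  0, 3, 0,
                  0, 0, 2),
                nrow = 3,
                dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
id     <- c("s2", "s1", "s2", "s1", "s1", "s2")          # subject of each cell
group  <- factor(c("B", "A", "B", "A", "A", "B"))        # one group per subject
pred   <- model.matrix(~ group)                          # per-cell design matrix
offset <- colSums(count)                                 # per-cell scaling factor

# Apply the same permutation to every per-cell object so they stay aligned.
o        <- order(id)
count_g  <- count[, o, drop = FALSE]
id_g     <- id[o]
pred_g   <- pred[o, , drop = FALSE]
offset_g <- offset[o]

# With the package installed, the README's equivalent calls are along the lines of:
#   data_g <- nebula::group_cell(count = count, id = id, pred = pred, offset = offset)
#   re     <- nebula::nebula(data_g$count, data_g$id, pred = data_g$pred, offset = data_g$offset)
```

The key point is that the count columns, the subject IDs, the design matrix rows, and the offset are all permuted together.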

More details can be found on the webpage https://github.com/lhe17/nebula .

Best regards,

Liang

SirKuikka commented 2 years ago

Hi,

Thank you. Now it works. With regard to the normalization, what would be your recommendation for droplet data (e.g. 10X Chromium)? If the manual uses 1 as the scaling factor for all cells, does that mean this should work well in most cases? And if I want to use the library sizes, then does that mean I just sum the raw counts for each cell and use them as the scaling factors?

lhe17 commented 2 years ago

Hi,

No. In most cases, you should NOT use one as the scaling factor for all cells.

Yes, summing the raw counts for each cell can be a reasonable choice for the normalizing factor.
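As a small sketch of that calculation in base R (the matrix `raw_counts` is toy data invented for the example), the library size is the column sum of the gene-by-cell count matrix:

```r
# Toy raw-count matrix: genes in rows, cells in columns.
raw_counts <- matrix(c(0, 2, 1,
                       3, 0, 0,
                       1, 1, 4),
                     nrow = 3,
                     dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Library size = total raw count per cell.
# For a sparse matrix, Matrix::colSums gives the same result.
lib_size <- colSums(raw_counts)

# A cell with zero total count has no usable scaling factor, so check first.
stopifnot(all(lib_size > 0))
print(lib_size)
```

The vector `lib_size` would then be supplied through the ‘offset’ parameter, reordered together with the cells if they need to be grouped.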

Best regards,

Liang

SirKuikka commented 2 years ago

Hi,

Ok, thank you. No more questions about this!
