smithlabcode / methpipe

A pipeline for analyzing DNA methylation data from bisulfite sequencing.
http://smithlabresearch.org/methpipe

Question about design matrix #165

Closed eribue closed 2 years ago

eribue commented 4 years ago

Hi, I'm hoping to get some help/confirmation on how to create a design matrix with more than two groups. In my experimental design, I have two factors (control and treated) each with three levels. Would this be the correct way to lay out my design? Thanks, Erika

                                                base  case
MI3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     1
MI2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     1
MI1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     1
MC3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     0
MC2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     0
MC1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     0
CM3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     2
CM2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     2
CM1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     2
CC3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     3
CC2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     3
CC1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     3
BI3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     4
BI2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     4
BI1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  1     4
BC3_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     5
BC2_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     5
BC1_R1_001_val_1_bismark_bt2_pe.bismark.cov.gz  0     5

guilhermesena1 commented 2 years ago

Hi,

I'm really sorry we never reached out with the answer, but if it's at all helpful,

The way to handle categorical variables in radmeth is to create a binary column for each possible value of the categorical variable. For instance, if you have a treatment that takes three values, create three columns called used_treatment_0, used_treatment_1 and used_treatment_2, where the value is 1 for the samples that used the corresponding treatment and 0 otherwise. Radmeth will not "understand" the values 0 to 5 in the case column and will likely report that the columns should be binary. This is the way around it.
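In case it helps, here is a minimal Python sketch of that expansion. It is not part of methpipe: the helper names (one_hot_columns, format_design) and the sample names are made up for illustration, and the output layout (a header line of column names, then one line per sample) is only assumed from the design shown in the question.

```python
def one_hot_columns(labels, prefix="used_treatment_"):
    """Return (column_names, rows) with one 0/1 column per distinct label."""
    levels = sorted(set(labels))
    names = [f"{prefix}{level}" for level in levels]
    # Each row gets a 1 in the column matching its label and 0 elsewhere.
    rows = [[1 if label == level else 0 for level in levels] for label in labels]
    return names, rows


def format_design(samples, names, rows):
    """Render a tab-separated table: header of column names, then one line per sample."""
    lines = ["\t".join(names)]
    for sample, row in zip(samples, rows):
        lines.append("\t".join([sample] + [str(value) for value in row]))
    return "\n".join(lines)


# Hypothetical samples and treatment codes, for illustration only.
samples = ["sample_a", "sample_b", "sample_c", "sample_d"]
treatment = [0, 1, 2, 1]

names, rows = one_hot_columns(treatment)
design_text = format_design(samples, names, rows)
print(design_text)
```

Every value in the resulting columns is 0 or 1, which is what the binary check expects. Please double-check the exact file layout against the radmeth documentation before using it.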

andrewdavidsmith commented 2 years ago

Closing due to inactivity.