nasa / GeneLab_Data_Processing

[BulkRNASeq] Handling Technical Replicates #32

Open J-81 opened 1 year ago

J-81 commented 1 year ago

Description

Workflow should handle technical replicates appropriately.

Approaches

DESeq2 provides a collapseReplicates function that sums the counts of all samples sharing the same level of a grouping factor. The rationale has two major points:

  1. Summing, as opposed to averaging, preserves the expected Poisson distribution: the sum of independent Poisson counts is itself Poisson, whereas an average is not.
  2. DESeq2 is designed to normalize for library size differences. Summing technical replicates is akin to having a higher sequencing depth for a sample.
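
To make the summing behaviour concrete, here is a minimal Python sketch of the same idea. It is not DESeq2's implementation, only an illustration of what collapsing by a grouping factor does to a count table. The function name, the nested-dict layout, and the sample and gene names are all hypothetical.

```python
from collections import defaultdict


def collapse_replicates(counts, groups):
    """Sum raw counts across samples in the same technical replicate group.

    counts: {sample_name: {gene: raw_count}}   (hypothetical layout)
    groups: {sample_name: replicate_group_id}
    Returns {replicate_group_id: {gene: summed_count}}.
    """
    collapsed = defaultdict(lambda: defaultdict(int))
    for sample, gene_counts in counts.items():
        if sample not in groups:
            raise KeyError(f"no technical replicate group for sample {sample!r}")
        group = groups[sample]
        for gene, n in gene_counts.items():
            # Sum, never average: keeps integer counts and their count-like distribution.
            collapsed[group][gene] += n
    return {g: dict(gc) for g, gc in collapsed.items()}


# Illustrative data only: S1_a and S1_b are technical replicates of one sample.
counts = {
    "S1_a": {"geneA": 10, "geneB": 0, "geneC": 5},
    "S1_b": {"geneA": 12, "geneB": 1, "geneC": 7},
    "S2_a": {"geneA": 3, "geneB": 4, "geneC": 0},
}
groups = {"S1_a": 1, "S1_b": 1, "S2_a": 2}

collapsed = collapse_replicates(counts, groups)
library_sizes = {g: sum(gc.values()) for g, gc in collapsed.items()}
```

Group 1 ends up with the combined library size of its two runs, which is the "higher sequencing depth" effect that library size normalization is expected to absorb.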

Implementation Suggested

Encode Technical Replicate Groups in the Runsheet

Encode technical replicates as a column in the runsheet, using an integer to identify each technical replicate group. Eventually, this column should be derived automatically from ISA archive metadata; in the meantime, a workflow user should be able to supply a two-column CSV mapping sample name to technical replicate group, which will be incorporated into the runsheet.
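
A rough sketch of how the user-supplied mapping could be folded into the runsheet, using only the Python standard library. Everything specific here is an assumption for illustration: the column names "Sample Name" and "Technical Replicate Group", the headerless two-column mapping file, and the function names are hypothetical and not taken from the existing workflow.

```python
import csv
import io


def read_mapping(mapping_csv):
    """Parse a headerless two-column CSV (sample name, integer group) into a dict."""
    mapping = {}
    for row in csv.reader(io.StringIO(mapping_csv)):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        sample, group = row[0].strip(), int(row[1])  # int() rejects non-integer groups
        if sample in mapping:
            raise ValueError(f"sample {sample!r} listed more than once")
        mapping[sample] = group
    return mapping


def add_tech_rep_column(runsheet_csv, mapping,
                        sample_col="Sample Name",
                        group_col="Technical Replicate Group"):
    """Return runsheet CSV text with a technical replicate group column appended."""
    reader = csv.DictReader(io.StringIO(runsheet_csv))
    if not reader.fieldnames or sample_col not in reader.fieldnames:
        raise ValueError(f"runsheet has no {sample_col!r} column")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames) + [group_col],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        sample = row[sample_col]
        if sample not in mapping:
            # Fail loudly rather than silently leaving a sample ungrouped.
            raise KeyError(f"no technical replicate group for sample {sample!r}")
        row[group_col] = mapping[sample]
        writer.writerow(row)
    return out.getvalue()


# Illustrative inputs only.
runsheet = "Sample Name,Condition\nS1_a,FLT\nS1_b,FLT\nS2_a,GC\n"
mapping = read_mapping("S1_a,1\nS1_b,1\nS2_a,2\n")
updated = add_tech_rep_column(runsheet, mapping)
```

Raising on a missing or duplicated sample is a design choice in this sketch; the workflow could instead assign each unmapped sample to its own group.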

Use Technical Replicate Groups Column in Runsheet for DESeq2 collapseReplicates

https://rdrr.io/bioc/DESeq2/man/collapseReplicates.html

Validation Plan

  1. Validate that the approach gives reasonable results as follows:

Run the following approaches

Assessment Metrics:

  1. Regression Test Criteria

J-81 commented 1 year ago

Implementation Steps

J-81 commented 1 year ago

Additional considerations: