ohsu-comp-bio / tcrseq_normalization


Determine independence of primer bias #8

Open weshorton opened 8 years ago

weshorton commented 8 years ago

Summary

In PCR1, we use a multiplex reaction of 20 forward and 13 reverse primers. Each primer has a different amplification rate during the reaction. Is the amplification rate of a particular forward primer dependent on the identity of the reverse primer, or is it constant for all reverse primers?

Significance

If forward and reverse primer amplification biases are independent, we need to calculate only 33 ei factors (20 + 13, one per primer) rather than 260 (20 x 13, one per primer pair).

To Do

  1. Control Experiment
  2. Determine independence via analysis (see Approach below)
  3. Make recommendation based on results of (2).

Approach

  1. Control Experiment (same as experiment in #7)
    • Samples containing:
      • Constant spike concentration (To Do: what is this value?)
      • NO gDNA
    • 20 total samples
  2. Data Analysis and Interpretation
    Per discussion on 25 May 2016, we will proceed with both a linear regression analysis and a chi-squared analysis. Once both analyses are complete, we will compare and discuss results.
    • Boxplots
      • grouped by V primer (20)
      • grouped by J primer (13)
    • Linear Regression
      • additive vs. multiplicative model (test for interactions)
      • TapeStation dilution factor as co-variate
      • PCR batch as co-variate
      • Summarization of Linear Regression analysis results
    • Chi-squared independence test (from Burcu's and Terry's interchange)
      • Run a chi-squared independence test on the J x V count matrix (13 x 20, 260 cells).
      • Also examine the contribution of each cell via its Pearson residual. Start with a few samples whose counts are reasonably large (relative to the whole data set) and form a separate 13 x 20 array of counts for each. For each array, calculate the expected count E of every cell under the null hypothesis of independence.
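
For the Linear Regression item in the list above, the additive-versus-interaction comparison could be sketched roughly as follows. This is only a minimal sketch under assumptions that are not stated in the issue: counts are log-transformed (so an additive model on the log scale corresponds to independent, multiplicative primer effects on the raw scale), every V/J cell has the same number of replicate samples, and the TapeStation dilution factor and PCR batch co-variates are left out. It swaps in the classical two-way ANOVA interaction F statistic, which in a balanced design matches comparing the additive linear model against one with V x J interaction terms. The function name `interaction_f` is hypothetical.

```python
def interaction_f(log_counts):
    """F statistic for the V x J interaction in a balanced two-way layout.

    log_counts[v][j] is a list of replicate log counts for forward primer v
    and reverse primer j. Every cell must hold the same number (>= 2) of
    replicates, and the replicates must not all be identical (otherwise the
    error mean square is zero and the ratio is undefined).
    """
    n_v = len(log_counts)
    n_j = len(log_counts[0])
    n_rep = len(log_counts[0][0])

    cell_mean = [[sum(cell) / n_rep for cell in row] for row in log_counts]
    v_mean = [sum(row) / n_j for row in cell_mean]
    j_mean = [sum(col) / n_v for col in zip(*cell_mean)]
    grand = sum(v_mean) / n_v

    # Interaction sum of squares: what the additive fit
    # (v_mean + j_mean - grand) leaves unexplained in the cell means.
    ss_inter = n_rep * sum(
        (cell_mean[v][j] - v_mean[v] - j_mean[j] + grand) ** 2
        for v in range(n_v) for j in range(n_j)
    )
    # Error sum of squares: replicate scatter within each cell.
    ss_error = sum(
        (y - cell_mean[v][j]) ** 2
        for v in range(n_v) for j in range(n_j) for y in log_counts[v][j]
    )
    df_inter = (n_v - 1) * (n_j - 1)
    df_error = n_v * n_j * (n_rep - 1)
    f_stat = (ss_inter / df_inter) / (ss_error / df_error)
    return f_stat, df_inter, df_error


# Made-up 2 x 2 example with 2 replicates per cell (NOT real data).
toy_additive = [[[-0.1, 0.1], [1.9, 2.1]],
                [[0.9, 1.1], [2.9, 3.1]]]
f_stat, df_inter, df_error = interaction_f(toy_additive)
```

With the real layout (20 forward primers, 13 reverse primers, 20 samples as replicates) the interaction would have 19 x 12 = 228 degrees of freedom. A large F statistic would argue against the independent 33-factor model; turning it into a p-value needs an F distribution from a statistics library.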

Additional details on the chi-squared approach:

E for a cell = row total x column total / grand total.

Letting O be the observed count in a cell, the chi-squared statistic is the sum over the 260 cells of (O-E)^2/E, with 12 x 19 = 228 degrees of freedom. We are less interested in this statistic than in the tables of Pearson residuals, (O-E)/sqrt(E). Check whether they show patterns that suggest important deviations from independence. We don't really want to test the formal independence hypothesis; we want to know whether the deviations from independence are a) large enough to require attention, and b) interpretable, e.g. restricted to one or a few forward (F) or reverse (R) primers.
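
The expected counts, Pearson residuals and chi-squared statistic described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the project's analysis code: the function names are hypothetical, the toy table is made up, and it assumes every row and column total is nonzero (a zero expected count would cause a division by zero).

```python
import math


def expected_counts(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_residuals(observed):
    """Per-cell Pearson residuals, (O - E) / sqrt(E)."""
    expected = expected_counts(observed)
    return [[(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]


def chi_squared(observed):
    """Chi-squared statistic (sum of squared Pearson residuals) and its
    degrees of freedom, (rows - 1) * (columns - 1)."""
    residuals = pearson_residuals(observed)
    stat = sum(r * r for row in residuals for r in row)
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


# Made-up 2 x 3 table (NOT real data); the real table would be 13 x 20.
toy_counts = [[10, 20, 30],
              [20, 40, 60]]
toy_stat, toy_df = chi_squared(toy_counts)
toy_residuals = pearson_residuals(toy_counts)
```

Because the statistic is the sum of the squared residuals, the residual table shows directly which cells drive it. Scanning that table by row (J primer) and by column (V primer) is one way to check whether the deviations are confined to a few primers.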

weshorton commented 8 years ago

To Do

  1. Create Milestone
  2. Formally document decision
    • chi squared
    • regression (find and export significant combinations from regression)
    • model usage - mixed, independent, dependent