petercombs / MPRA_selection


Decide what to do when there's not complete saturation of the mutagenesis #1

Open petercombs opened 5 years ago

petercombs commented 5 years ago

For instance, RET has only 2 of the 3 possible alternate bases tested at '573 (position 43086573):

10      43086572        A       C       98      2972    3663    -0.08   0.1749  RET
10      43086572        A       G       215     7260    9664    -0.02   0.70391 RET
10      43086572        A       T       306     10958   14018   0.02    0.60999 RET
10      43086573        C       A       6       163     185     -0.17   0.47475 RET
10      43086573        C       T       136     4233    5560    -0.06   0.21929 RET
10      43086574        A       C       222     7970    10160   0.01    0.72669 RET
10      43086574        A       G       531     17261   22568   0       0.90982 RET
10      43086574        A       T       373     13178   16714   0.01    0.77223 RET

I'm not sure whether skipping only the missing G biases things, or whether I'd be better off marking that position as bad and dropping it from analyses altogether.

petercombs commented 5 years ago

As a stopgap, I am doing something that is not quite right: I skip a position in the numerator of K_{u,d,n} if fewer than 3 different alternate bases were tested there. I should probably also skip it in the denominator, but that would require some pre-filtering, which I am not ready to think about over the weekend.
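
A minimal sketch of what that pre-filtering could look like, not the code in this repo. It assumes the first four columns of the table above are chromosome, position, reference base, and alternate base (the issue never names them), and `saturated_positions` and `min_alts` are hypothetical names. The idea is to decide once which positions are fully tested, then feed the same filtered rows into both the numerator and the denominator.

```python
from collections import defaultdict

# First four columns of the rows quoted in this issue.
# Assumed meaning: chrom, pos, ref, alt (the issue does not label them).
ROWS = """\
10 43086572 A C
10 43086572 A G
10 43086572 A T
10 43086573 C A
10 43086573 C T
10 43086574 A C
10 43086574 A G
10 43086574 A T
"""


def saturated_positions(rows, min_alts=3):
    """Return the set of (chrom, pos) with at least min_alts distinct alt bases."""
    alts = defaultdict(set)
    for chrom, pos, _ref, alt in rows:
        alts[(chrom, pos)].add(alt)
    return {key for key, bases in alts.items() if len(bases) >= min_alts}


rows = [tuple(line.split()) for line in ROWS.splitlines()]
keep = saturated_positions(rows)

# Drop every row at an incompletely tested position, so anything computed
# downstream (numerator or denominator) sees the same set of positions.
filtered = [row for row in rows if (row[0], row[1]) in keep]
```

With the rows above, 43086573 is dropped because only A and T were tested there, while '572 and '574 are kept. Whether dropping the whole position is less biased than skipping only the missing G is still the open question in this issue; the sketch only makes the two sides of the ratio consistent.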