jazdrv / dnaTools

GNU General Public License v3.0

aggressive (ambiguous) POS promotions do affect subsequent variant rule processing #23

Open jazdrv opened 6 years ago

jazdrv commented 6 years ago

Here's a case where this variant (BY26835) was processed early. Most of its values were unknowns. Its aggressive promotion affected other variants' ability to move up, and caused split/inconsistency issues for them as well.

This variant had lots of FAIL assign values and 1/1 (and 0/1) genotypes ... which seemingly aren't definitive. And bed ranges for this one -- don't give much information either.

The rules called for aggressive promotion. (when compared to U106)

Plus, it had a negative for one kit ... so, it certainly is affecting subsequent rule processing with its result-set.

How to resolve something like this?

Ideas ...

(1) do prior ambiguous conclusions need to be re-assessed based on new data? [I'm thinking YES]

If so, this has ramifications. For example, if a split is uncovered, should the rule processing unwind a positive ambiguous promotion to accommodate that split? Or what if one variant's POS ambiguous promotions are in conflict with another's? Who wins in these cases?

Should there be a way to lock an ambiguous promotion to protect it from being dropped (in cases where you don't want it to lose its precedence)? And then in turn, be able to unlock it as well?

Another ramification: in rules processing -- there likely needs to be better messaging (especially in the manual version) ... where it can tell you, such and such variants with ambiguous POS's are keeping this variant from being promoted further. Would you like to override them? Etc?

And yet another ramification: if we need to have the ability to roll back on a POS ambiguous decision, then the rollback needs to reflect other decisions that were made based on that one. And in turn, decisions that were made based on those secondary decisions. And so on.
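
A rough sketch of what that cascading rollback could look like, assuming each decision records which earlier decisions it was based on. The `DecisionGraph` class and the decision ids below are hypothetical, not anything that exists in dnaTools today.

```python
from collections import defaultdict, deque


class DecisionGraph:
    """Tracks which decisions were derived from which earlier decisions (hypothetical)."""

    def __init__(self):
        # earlier decision id -> ids of decisions that relied on it
        self._dependents = defaultdict(set)

    def record(self, decision_id, based_on=()):
        """Register a decision and the earlier decisions it relied on."""
        self._dependents.setdefault(decision_id, set())
        for parent in based_on:
            self._dependents[parent].add(decision_id)

    def rollback_set(self, decision_id):
        """Return decision_id plus everything that transitively depends on it."""
        seen = {decision_id}
        queue = deque([decision_id])
        while queue:
            current = queue.popleft()
            for child in self._dependents.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


# Made-up decision ids, purely for illustration.
graph = DecisionGraph()
graph.record("BY26835:ambig-POS")
graph.record("varB:promote", based_on=["BY26835:ambig-POS"])
graph.record("varC:split", based_on=["varB:promote"])
graph.record("varD:promote")  # unrelated, should survive the rollback

to_undo = graph.rollback_set("BY26835:ambig-POS")
```

Here `to_undo` holds the ambiguous promotion, the decision based on it, and the decision based on that one, while `varD:promote` is left alone.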

Perhaps what is needed is to keep track of two matrices. In the first, stuff that doesn't need to be promoted doesn't get moved up; it stays unchanged as unk. And stuff that does need to change does get moved. The 2nd matrix would be for presentation only; it would attempt to do these ambiguous promotions. But the ambiguous promotions wouldn't be used to make any further decisions. As for a conflict of positive ambiguous promotions (in the 2nd sort of matrix) between two variants -- in these cases, both would go to NEG. Perhaps even NEG ambiguous, like what I mention below.
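
To make the two-matrix idea concrete, here is a minimal sketch. The encoding (1 = POS, -1 = NEG, 0 = unk), the `"POS?"`/`"NEG?"` markers, and the `build_presentation` function are all assumptions for illustration. How a conflict between two variants gets detected is not covered here; the conflicting pairs are simply passed in.

```python
POS, NEG, UNK = 1, -1, 0               # assumed encoding of definitive calls
AMBIG_POS, AMBIG_NEG = "POS?", "NEG?"  # presentation-only markers (assumed)


def build_presentation(working, proposals, conflicts):
    """Build the presentation matrix without touching the working matrix.

    working:   {(variant, kit): POS | NEG | UNK}
    proposals: {variant: set of kits to promote from unk to ambiguous POS}
    conflicts: iterable of (variant_a, variant_b) pairs whose promotions clash
    """
    in_conflict = {v for pair in conflicts for v in pair}
    view = dict(working)  # copy, so rule processing keeps using `working` as-is
    for variant, kits in proposals.items():
        # Both sides of a conflict go to (ambiguous) NEG.
        mark = AMBIG_NEG if variant in in_conflict else AMBIG_POS
        for kit in kits:
            # Only unknown cells are ever promoted; known calls stay put.
            if working.get((variant, kit), UNK) == UNK:
                view[(variant, kit)] = mark
    return view


# Made-up variants and kits.
working = {
    ("varA", "kit1"): POS,
    ("varA", "kit2"): UNK,
    ("varB", "kit2"): UNK,
    ("varC", "kit1"): UNK,
}
proposals = {"varA": {"kit2"}, "varB": {"kit2"}, "varC": {"kit1"}}
view = build_presentation(working, proposals, conflicts=[("varA", "varB")])
```

The point of the copy is that later rules only ever read `working`, so an ambiguous promotion in `view` can't influence another variant's placement.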

(2) does there need to be some sort of logging, so we can assess the history of why variant results were concluded to be POS and/or NEG across their original discovered unknown values? And what variants changed them? [I'm thinking YES]
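
One possible shape for that kind of log, as a sketch only. The `CallChange` and `CallHistory` names and their fields are hypothetical; the real thing would more likely be a table in the existing database.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CallChange:
    """One change to a (variant, kit) cell, and why it happened (hypothetical)."""
    variant: str
    kit: str
    old: Optional[int]               # None = originally unknown
    new: int                         # assumed encoding: 1 = POS, -1 = NEG
    rule: str                        # free-text name of the rule that fired
    caused_by: Optional[str] = None  # variant whose result triggered the change


class CallHistory:
    def __init__(self) -> None:
        self._changes: List[CallChange] = []

    def log(self, change: CallChange) -> None:
        self._changes.append(change)

    def explain(self, variant: str, kit: str) -> List[CallChange]:
        """Every change made to one cell, in the order it happened."""
        return [c for c in self._changes if c.variant == variant and c.kit == kit]

    def changed_by(self, variant: str) -> List[CallChange]:
        """Every change that was triggered by the given variant."""
        return [c for c in self._changes if c.caused_by == variant]


# Made-up entries.
history = CallHistory()
history.log(CallChange("varA", "kit1", None, 1, rule="ambiguous-POS promotion"))
history.log(CallChange("varB", "kit1", None, -1, rule="split resolution", caused_by="varA"))
```

`history.explain("varB", "kit1")` then answers "why is this cell NEG?", and `history.changed_by("varA")` answers "what did this variant change?".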

(3) Is there something more we can be doing with the VCFs to discern some of the unknown values?

(4) Should we be doing something about the order of processing imperfect variants? i.e., variants that are well known, higher up, and with more known details are done first; lesser-known variants, and those with less information against them, last. [Again, I'm thinking yes]
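
A small sketch of one way to do that ordering, assuming "more known details" is measured as the fraction of kits with a known call. The `processing_order` function and the input layout are hypothetical; position in the tree could be added as another sort key.

```python
def processing_order(calls):
    """Order variants so the ones with the most known calls come first.

    calls: {variant: list of per-kit values, where None means unknown}
    """
    def known_fraction(variant):
        values = calls[variant]
        if not values:
            return 0.0
        return sum(v is not None for v in values) / len(values)

    # Highest known fraction first; ties broken by name so the order is stable.
    return sorted(calls, key=lambda v: (-known_fraction(v), v))


# Made-up data: varA is mostly unknown, varC is fully known.
calls = {
    "varA": [1, None, None, None],
    "varB": [1, -1, 1, None],
    "varC": [1, -1, 1, 1],
}
order = processing_order(calls)
```

With this data, `order` puts `varC` first and the mostly-unknown `varA` last.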

(5) Should there also be some sort of flag in the visual format of the matrix ... to indicate when there are POS ambiguous values set? That way -- it's easier to discern issues.

(6) Lastly, are there ever NEG ambiguous situations? I was thinking that the aftermath assignment of NEG's to splits might be like that. And if so, should they similarly be assessed against the suggestions and thinking stated above?

+---------------------------------------------------------------------------------------------------------------------------------------------------------+
| c       p        v           510668  293507  521793  163973  120386  22654  N4826  122898  B30884  124134  637069  499807  5962  224096  216600  177312 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                0       1       2       3       4       5      6      7       8       9       10      11     12     13      14      15   |
|                                                                                                                                                         |
| 1    56832889    BY26835       1       1       1       1       1       1      1      1       1       1       1       1      1      1                    |
+---------------------------------------------------------------------------------------------------------------------------------------------------------+
+------------------------------------------------------------------------------------------------------------+
| vix  build  name       id      pos     anc  der  dupeP  nouse  pID   kix   kit    assign  geno  bed  mxval |
+------------------------------------------------------------------------------------------------------------+
|  1   hg38   BY26835  461922  56832889   A    C     N      -     4     0   510668    -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     11    1   293507    -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     13    2   521793    -1    0/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    867    3   163973    1     1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1040   4   120386    -1    0/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1799   5   22654     -1    0/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1048   6   N4826     -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     12    7   122898    -1    0/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     1     8   B30884    -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1237   9   124134    1     1/1    0     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     2    10   637069    -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     5    11   499807    1     1/1    0     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1745  12    5962     -1    1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1320  13   224096    1     1/1    1     1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -    1705  14   216600   None   None   1    -1   |
|  1   hg38   BY26835  461922  56832889   A    C     N      -     3    15   177312    -1    1/1    1    -1   |
+------------------------------------------------------------------------------------------------------------+

moregubbins commented 6 years ago

I've investigated a similar issue and placed it on Slack (search BY26105). The problem comes down to calls which are not well defined in tests, partly due to being in bad locations in the chromosome and potentially having incorrect estimates for mapping quality (MQ; which would give them a higher quality than they should have). The best solution may be to allow the user to provide an input set of priority mutations, with the implication that this is kept to a minimum (e.g. M269, L151, P312, U106) to provide a framework for fixing bad calls. These would be dealt with first, before moving on to the other "perfect" variants.
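
As a sketch of the priority-set idea: a user-supplied list of framework mutations is handled first, in the order given, and everything else follows in its existing order. The `order_with_priority` function is hypothetical; the variant names are just the ones mentioned in this thread.

```python
def order_with_priority(variants, priority):
    """Put user-supplied priority mutations first, in the order the user gave.

    Variants not in the priority list keep their original relative order,
    because Python's sort is stable.
    """
    rank = {name: i for i, name in enumerate(priority)}
    return sorted(variants, key=lambda v: rank.get(v, len(rank)))


priority = ["M269", "L151", "P312", "U106"]  # kept to a minimum, as suggested
ordered = order_with_priority(["BY26835", "U106", "BY26105", "M269"], priority)
```

Here `ordered` starts with M269 and U106, followed by BY26835 and BY26105 in their input order.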

jazdrv commented 6 years ago

created issue https://github.com/jazdrv/dnaTools/issues/29 ... an idea that could be used to help solve the aggressive (ambiguous) POS promotions issue.