mmaiers-nmdp opened this issue 6 years ago
The following mechanisms must be part of the implementation:
This is the list of defined "quality" metrics in PHYCUS right now:
- DIV_LAMBDA
- DIV_50
- DIV_50_REL
- SAM_SIZE
- SAM_POP
- DIV_PGD
- DIV_HEAVY_TAIL
- RES_TRS_COUNT
- RES_TRS
- RES_SHARE_AMBIG
- RES_MISS_LOCI
- DEV_HWE
- ERR_STD
- ERR_SAMP_80_100
- SUM_FREQ_GAP
- ERR_OFFSET
- LD_MEASURE
- KFOLD_IMPUTE
- KFOLD_PRED_ACTUAL
- KFOLD_N
It seems that we can calculate some of these ourselves, but right now we also accept all of these values from the client. How do we handle values that we both receive and calculate ourselves? Some options are discussed in the comments below.
I talked with Florian about these today and I would like to share a few thoughts with you:
1. I believe it is important to come to a minimum viable product / minimum lovable product soon.
2. Lovable includes not perfect ;-).
3. Hackers at hackathons should have something to do that has the potential to lead to a minimum * product.
4. Take one of each kind of metric to start: no GT needed, sample size needed, GT needed; e.g. DIV_50(_REL), DIV_PGD, RES_MISS_LOCI.
5. I would opt for "verify them and return a warning if they do not match". However, I can live with any other option you choose too.
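The "verify and return a warning" option could be sketched roughly as below. The function name, dict shapes, and tolerance are assumptions for illustration, not part of the PHYCUS spec:

```python
import math
import warnings

def verify_metrics(client_values, server_values, rel_tol=1e-6):
    """Compare client-submitted quality metrics against server-computed
    values; keep the server's numbers and warn on any mismatch.

    Both arguments are assumed to be {metric_name: float} dicts.
    """
    mismatches = []
    for name, server_val in server_values.items():
        client_val = client_values.get(name)
        if client_val is None:
            continue  # client did not submit this metric
        if not math.isclose(client_val, server_val, rel_tol=rel_tol):
            mismatches.append((name, client_val, server_val))
            warnings.warn(
                f"{name}: submitted {client_val} != computed {server_val}; "
                "using computed value"
            )
    return mismatches
```

A mismatch never blocks persistence here; it only surfaces a warning, which matches the spirit of point 5.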
I would suggest going through the list and classifying the metrics according to the inputs needed for their calculation. For those metrics that can be calculated on the fly at upload, the service should compute them and neither expect nor accept user input.
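Such a classification could be captured in a lookup table like the one below. The assignment of each metric to an input category is a guess for illustration only (and only a subset of the list is shown); the real mapping has to come from reviewing each metric's definition:

```python
# Input categories: "hf" = haplotype frequency list only,
# "sample" = sample size/population info, "gt" = genotype data.
# This mapping is illustrative, not authoritative.
METRIC_INPUTS = {
    "DIV_50": {"hf"},
    "DIV_50_REL": {"hf"},
    "DIV_PGD": {"hf"},
    "SAM_SIZE": {"sample"},
    "SAM_POP": {"sample"},
    "RES_MISS_LOCI": {"gt"},
    "DEV_HWE": {"gt"},
    "LD_MEASURE": {"gt"},
}

def computable_metrics(available_inputs):
    """Return the metrics the server can compute from the given inputs."""
    avail = set(available_inputs)
    return sorted(m for m, req in METRIC_INPUTS.items() if req <= avail)
```

A submission that carries only haplotype frequencies would then get the "hf"-only metrics computed server-side, while the rest stay client-supplied.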
We (@mmaiers-nmdp & @fscheel) reviewed the list of quality metrics; most of them should be computed by the server and not accepted from the client. The spec does have a place for them in the HFC submission.
We see 4 options for how to deal with quality-list values submitted by the client:
1. The server CAN'T accept them because there is no place for them. (-) too much work to have two different structures/versions of the code; danger that someone changes one but not the other.
2. The server WON'T accept them: return an error, do not persist the data. (-) too strict.
3. The server will SILENTLY ignore them: persist the HF data, but not the quality list for qualities that we want the server to compute. (+) easy; if someone complains, then change it.
4. The server will ignore the quality list and return a WARNING. (-) nobody will read it; over-engineered.
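Option 3 ("silently ignore") is cheap to implement: filter server-computed metrics out of the submitted quality list before persisting. The set contents and the `qualityMetric` field name below are assumptions for illustration:

```python
# Metrics the server computes itself; client-submitted values for these
# are dropped before persisting (option 3). The set is illustrative.
SERVER_COMPUTED = {"DIV_50", "DIV_50_REL", "DIV_PGD", "RES_MISS_LOCI"}

def filter_quality_list(submitted_qualities):
    """Keep only client-supplied qualities the server does not compute.

    `submitted_qualities` is assumed to be a list of dicts with a
    "qualityMetric" key, roughly matching the HFC submission shape.
    """
    return [q for q in submitted_qualities
            if q.get("qualityMetric") not in SERVER_COMPUTED]
```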
Categorize the list (HH2016) further. Some are just "descriptive statistics" or "features". Some are indicators of how "good" the data is.
At DaSH8 we should implement a few of these using "AWS Lambda"
Simple examples:
- RES_MISS_LOCI - depends on GT
- Wn statistic - global 2-locus pairwise LD (depends on GT)
- DIV_50_REL - depends on HT only
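As a starting point for a hackathon implementation, DIV_50 and DIV_50_REL only need the haplotype frequency list. The sketch below assumes DIV_50 is the number of most-frequent haplotypes needed to reach 50% of total frequency, and DIV_50_REL is that count relative to the total number of haplotypes; both definitions should be verified against the PHYCUS spec before use:

```python
def div_50(freqs):
    """Number of the most frequent haplotypes whose cumulative frequency
    reaches 50% of the total (assumed definition)."""
    total = sum(freqs)
    cum, n = 0.0, 0
    for f in sorted(freqs, reverse=True):
        cum += f
        n += 1
        if cum >= 0.5 * total:
            return n
    return n

def div_50_rel(freqs):
    """DIV_50 relative to the total number of haplotypes (assumed)."""
    return div_50(freqs) / len(freqs)
```

Functions like these are self-contained and stateless, which also makes them natural candidates for the AWS Lambda deployment mentioned above.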