visze opened 2 years ago
Well, they are a bit better:
new positives:

| metric | value |
|---|---|
| AUROC | 0.995 |
| AUPRC | 0.605 |

old positives:

| metric | value |
|---|---|
| AUROC | 0.996 |
| AUPRC | 0.585 |
But I have to rerun everything 100 times to see the average increase.
After rerunning parsmurf 100 times with different seeds on hg38 with global means, I get:
additional positives:

| metric | mean | max | min |
|---|---|---|---|
| AUROC | 0.99501 | 0.996 | 0.995 |
| AUPRC | 0.59688 | 0.609 | 0.578 |

standard positives:

| metric | mean | max | min |
|---|---|---|---|
| AUROC | 0.9959 | 0.996 | 0.995 |
| AUPRC | 0.58186 | 0.598 | 0.563 |
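The mean/max/min columns above are plain summary statistics over the per-seed runs. A minimal Python sketch of that aggregation, assuming each run can be wrapped in a function that takes a seed and returns a dict of metric values; `summarize_runs` and `fake_run` are hypothetical names, and `fake_run` returns random placeholder numbers, not real parsmurf output:

```python
import random
import statistics
from typing import Callable, Dict, List


def summarize_runs(run_once: Callable[[int], Dict[str, float]],
                   seeds: List[int]) -> Dict[str, Dict[str, float]]:
    """Call run_once for every seed and reduce each metric to mean/max/min."""
    per_metric: Dict[str, List[float]] = {}
    for seed in seeds:
        # Collect the value of every metric reported by this run.
        for metric, value in run_once(seed).items():
            per_metric.setdefault(metric, []).append(value)
    return {
        metric: {"mean": statistics.fmean(values), "max": max(values), "min": min(values)}
        for metric, values in per_metric.items()
    }


def fake_run(seed: int) -> Dict[str, float]:
    """Hypothetical stand-in for one parsmurf training/evaluation run."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    return {"AUROC": 0.995 + 0.001 * rng.random(), "AUPRC": 0.56 + 0.05 * rng.random()}


summary = summarize_runs(fake_run, seeds=list(range(100)))
print(summary["AUPRC"])
```

Seeding each run from its own `random.Random(seed)` keeps individual runs reproducible, so an unusually high or low max/min can be traced back to a specific seed.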
So we get a slight increase in AUPRC of 0.01502 and a small decrease in AUROC of 0.00089 (differences of the means, additional minus standard positives).
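The two differences are simply the mean of the additional-positive runs minus the mean of the standard-positive runs. A small Python check using the means from the hg38 tables:

```python
# Mean metrics over 100 seeds on hg38 with global means (from the tables above).
additional = {"AUROC": 0.99501, "AUPRC": 0.59688}
standard = {"AUROC": 0.9959, "AUPRC": 0.58186}

# Difference of means (additional minus standard), rounded to the table's precision.
delta = {metric: round(additional[metric] - standard[metric], 5) for metric in standard}
print(delta)  # {'AUROC': -0.00089, 'AUPRC': 0.01502}
```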
I don't think it is worth including the new data because the increase is small (and we are tuning for class imbalance, so a better AUPRC is to be expected to some extent). Strictly speaking, a separate test set (not cross-validation) would be needed to really show whether this helps.
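To illustrate why AUROC can stay close to 1 while AUPRC is much lower when positives are rare, here is a from-scratch Python sketch (not parsmurf's implementation). It computes AUROC as the probability that a positive outscores a negative, and AUPRC as average precision, which is one common estimator; which estimator parsmurf actually uses is not stated here. The toy data (one positive among 100 variants, ranked fifth) is made up:

```python
from typing import List


def auroc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive scores higher than a random negative
    (ties count as 0.5). O(P*N): fine for a toy example, not for genome-wide data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Mean of the precision values at the rank of each positive (ties not handled)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    true_pos = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            precisions.append(true_pos / rank)
    return sum(precisions) / len(precisions)


# Toy data: 100 variants with distinct scores, a single positive ranked fifth.
scores = [1 - i / 100 for i in range(100)]
labels = [0] * 100
labels[4] = 1

print(round(auroc(labels, scores), 3))              # 0.96
print(round(average_precision(labels, scores), 3))  # 0.2
```

With one positive ranked fifth, AUROC is 95/99 (about 0.96) while average precision is only 1/5 = 0.2. A random ranking gives an expected AUROC of 0.5 but an AUPRC of roughly the fraction of positives, so AUPRC reacts much more strongly to changes in the positive set than AUROC does.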
Using global means for some features works much better than adding new positives (see #12).
Redo the same for the feature set of remm v1.4 and with both genome releases.
Results for 100 repetitions with random seeds:
| positives | Metric | Mean | Max | Min |
|---|---|---|---|---|
| standard | AUPRC | 0.599 | 0.615 | 0.584 |
| standard | AUROC | 0.996 | 0.996 | 0.995 |
| additional | AUPRC | 0.609 | 0.62 | 0.595 |
| additional | AUROC | 0.995 | 0.996 | 0.995 |
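Because the max/min ranges of the two positive sets overlap, the difference in means alone says little about seed-to-seed noise. If the per-seed values were kept and both configurations were run with the same seeds (an assumption; the tables only report mean, max and min), a paired comparison is one way to check whether the gain is consistent. `paired_seed_comparison` is a hypothetical helper and the numbers in the usage line are made up:

```python
import statistics
from typing import Dict, List


def paired_seed_comparison(standard: List[float], additional: List[float]) -> Dict[str, float]:
    """Summarize per-seed differences (additional minus standard).
    Assumes standard[i] and additional[i] were produced with the same seed."""
    if len(standard) != len(additional) or len(standard) < 2:
        raise ValueError("need two equally long lists with at least two runs")
    diffs = [a - s for s, a in zip(standard, additional)]
    return {
        "mean_diff": statistics.fmean(diffs),
        "sd_diff": statistics.stdev(diffs),  # spread of the gain across seeds
        "frac_improved": sum(d > 0 for d in diffs) / len(diffs),
    }


# Made-up per-seed AUPRC values, only to show the call.
result = paired_seed_comparison([0.58, 0.59, 0.60], [0.60, 0.60, 0.61])
print(result["frac_improved"])  # 1.0
```

A gain that is positive for nearly every seed and large relative to `sd_diff` is more convincing than a difference in means alone; it still does not replace a separate test set.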
TODO
Just check whether there is an improvement. This will not be part of the manuscript.