0x1orz closed this issue 5 years ago.
Yes, the uncertainty in the measurement values can be quite high. In the current version we train directly on all the peptides, duplicates and all. This means that when computing validation accuracy on held-out data, we have to be careful to remove any peptides that appear in both the train and test sets.
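A minimal sketch of that overlap filter, assuming each record has `allele`, `peptide`, and `measurement_value` fields (the record type and the `remove_overlap` helper are hypothetical, and the values are made-up toy numbers). It drops every test row whose peptide occurs anywhere in the training set:

```python
from typing import NamedTuple


class Measurement(NamedTuple):
    # Hypothetical record layout; field names are assumptions for illustration
    allele: str
    peptide: str
    measurement_value: float  # e.g. a binding affinity in nM


def remove_overlap(train, test):
    """Drop test rows whose peptide also appears anywhere in the training set."""
    train_peptides = {m.peptide for m in train}
    return [m for m in test if m.peptide not in train_peptides]


# Toy data: GILGFVFTL is measured twice in train and once in test
train = [
    Measurement("HLA-A*02:01", "GILGFVFTL", 10.0),
    Measurement("HLA-A*02:01", "GILGFVFTL", 25.0),
    Measurement("HLA-A*02:01", "LLFGYPVYV", 40.0),
]
test = [
    Measurement("HLA-A*02:01", "GILGFVFTL", 15.0),
    Measurement("HLA-A*02:01", "NLVPMVATV", 30.0),
]
print([m.peptide for m in remove_overlap(train, test)])  # ['NLVPMVATV']
```

This filters on the peptide alone, matching the description above; keying on `(allele, peptide)` instead would be a stricter or looser choice depending on whether the same peptide measured on different alleles should count as leakage.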
In earlier versions we tried grouping by peptide and taking the geometric mean or median of the measurements, but anecdotally I haven't seen that make a big difference.
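The grouping approach can be sketched as follows, assuming rows are `(allele, peptide, measurement_value)` tuples with positive values (e.g. affinities in nM). Grouping on `(allele, peptide)` rather than the peptide alone is an assumption here, and `collapse_duplicates` is a hypothetical helper, not part of any released code:

```python
import statistics
from collections import defaultdict


def collapse_duplicates(rows, how="geometric_mean"):
    """Collapse repeated (allele, peptide) measurements into a single value.

    rows: iterable of (allele, peptide, measurement_value) tuples.
    Values must be positive for the geometric mean to be defined.
    """
    groups = defaultdict(list)
    for allele, peptide, value in rows:
        groups[(allele, peptide)].append(value)

    # Pick the aggregation function by name
    aggregate = {
        "geometric_mean": statistics.geometric_mean,
        "median": statistics.median,
    }[how]
    return {key: aggregate(values) for key, values in groups.items()}


rows = [("A", "P1", 10.0), ("A", "P1", 1000.0), ("A", "P2", 50.0)]
print(collapse_duplicates(rows, how="median"))
```

The geometric mean averages on a log scale, which suits affinities spanning several orders of magnitude (10 and 1000 collapse to about 100); the median instead gives 505.0 for that pair and is less sensitive to a single outlying measurement when there are three or more.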
The curated dataset contains many duplicates, making up more than half of the entries, and the duplicated peptides have different measurement_values.
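One way to quantify this is sketched below, assuming the same `(allele, peptide, measurement_value)` row layout as above (an assumption about the dataset's columns; `duplicate_stats` is a hypothetical helper). It reports the fraction of rows whose key occurs more than once and how many duplicated keys have disagreeing values:

```python
from collections import Counter, defaultdict


def duplicate_stats(rows):
    """Summarise duplication in (allele, peptide, measurement_value) rows.

    Returns (fraction of rows whose key occurs more than once,
             number of duplicated keys whose measurement values disagree).
    """
    counts = Counter((allele, peptide) for allele, peptide, _ in rows)
    values = defaultdict(set)
    for allele, peptide, value in rows:
        values[(allele, peptide)].add(value)

    duplicated_rows = sum(n for n in counts.values() if n > 1)
    conflicting_keys = sum(
        1 for key, n in counts.items() if n > 1 and len(values[key]) > 1
    )
    return duplicated_rows / len(rows), conflicting_keys


rows = [
    ("A", "P1", 10.0), ("A", "P1", 20.0),  # duplicated, values disagree
    ("A", "P2", 5.0),                      # unique
    ("A", "P3", 7.0), ("A", "P3", 7.0),    # duplicated, values agree
]
print(duplicate_stats(rows))  # (0.8, 1)
```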