Try methods for handling noisy labels in neural networks, e.g., some of those discussed in Song et al. (2022):
Label smoothing: replace the one-hot label vector with (1 - alpha) * one_hot + alpha/c, where c is the number of classes. Note that every entry, including the true class, gains alpha/c, so the vector still sums to 1; this answers the question of why the mass is alpha/c rather than alpha/(c-1) — the alpha/(c-1)-over-wrong-classes-only variant also exists and also sums to 1.
Mixup: linearly interpolate two training instances in both the input space and the R^c label space, yielding a soft "two-warm" label vector (Q: why mix exactly two instances rather than drawing the number of mixed examples at random, e.g., from an exponential-type distribution?)
Label refurbishment: mix the noisy gold label with the model's current prediction, weighted by the classifier's confidence in the given label
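All three techniques above are convex combinations in label (and, for mixup, input) space. A minimal NumPy sketch under that reading — function names are illustrative, and the Beta draw for mixup's lambda follows the common practice rather than anything specific to Song et al.:

```python
import numpy as np

def smooth_labels(one_hot, alpha=0.1):
    # (1 - alpha) * one_hot + alpha / c; every class, including the
    # true one, receives alpha/c, so the result sums to 1.
    c = one_hot.shape[-1]
    return (1.0 - alpha) * one_hot + alpha / c

def mixup(x1, y1, x2, y2, lam):
    # Interpolate two examples in input space and in label space
    # with the same coefficient lam (commonly lam ~ Beta(a, a)).
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y

def refurbish(noisy_label, pred, conf):
    # conf = classifier's confidence that the given label is correct
    # (an assumption about how the weighting is parameterized).
    return conf * noisy_label + (1.0 - conf) * pred
```

For mixup, a typical usage draws `lam = np.random.default_rng().beta(a, a)` per pair; the label result of `smooth_labels` can feed directly into a soft-target cross-entropy loss.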