Closed Sandy4321 closed 1 year ago
Hi, can you please clarify your question?
When data is unbalanced, meaning some label counts are much larger than others (for example, the YES label count is 123 but the NO label count is 9876543, so overall we have 123 + 9876543 = 9876666 samples (rows)), the prediction algorithm should be designed with special treatment to get a high F1 score. For details, please refer to https://machinelearningmastery.com/xgboost-for-imbalanced-classification/
To sum up, I do not see in your paper how this unbalanced data issue is addressed, but hopefully you do have proper unbalanced data treatment in any case.
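To make the concern concrete, here is a small sketch using the counts from the example. It assumes a hypothetical trivial classifier that always predicts NO, and shows why accuracy alone is misleading on such data while F1 for the YES class is not. The names are illustrative only and are not taken from any library.

```python
# Label counts from the example.
n_yes = 123
n_no = 9_876_543
total = n_yes + n_no

# Hypothetical trivial classifier: always predicts NO.
tp = 0      # YES examples predicted as YES
fp = 0      # NO examples predicted as YES
fn = n_yes  # YES examples predicted as NO
tn = n_no   # NO examples predicted as NO

accuracy = (tp + tn) / total


def f1_score(tp, fp, fn):
    """F1 for the positive (YES) class; defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


f1 = f1_score(tp, fp, fn)
print(f"total={total} accuracy={accuracy:.6f} f1={f1:.1f}")
```

The accuracy is very close to 1 even though not a single YES example is found, which is why F1 (or a similar metric) is the one to watch here.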
YDF supports example weights, which allow the user to re-weight the training examples using any of the methods explained in the article. The weights can be set manually or through a mapping. See the WeightDefinition proto for details.
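As an illustration of the manual route, here is a minimal sketch that derives one weight per example from the label frequencies, so that each class contributes the same total weight. This is plain Python and not YDF's API: `balanced_class_weights` is a hypothetical helper, and the formula (number of examples divided by number of classes times class count) is the common "balanced" heuristic, chosen here as an assumption rather than something YDF applies on its own.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: weight} with weight = n_examples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_examples = len(labels)
    n_classes = len(counts)
    return {label: n_examples / (n_classes * count) for label, count in counts.items()}


# Small toy dataset standing in for a heavily unbalanced label column.
labels = ["YES"] * 3 + ["NO"] * 9

class_weights = balanced_class_weights(labels)

# One weight per training example; this list would be stored as an extra
# numerical column and declared as the weight column when configuring training.
example_weights = [class_weights[label] for label in labels]

total_yes = sum(w for w, y in zip(example_weights, labels) if y == "YES")
total_no = sum(w for w, y in zip(example_weights, labels) if y == "NO")
print(class_weights, total_yes, total_no)
```

With these weights the rare class is up-weighted and the frequent class is down-weighted, so both classes carry equal total weight during training. How the resulting column is wired into the training configuration is described by the WeightDefinition proto.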
I see // "LinkedWeightDefinition" is a pre-processed version of "WeightDefinition"
Does it mean the weights are calculated automatically from the label ratio (and not only manually, as your answer suggests: "The weights can be set manually or through a mapping")?
Could you help me understand how unbalanced data is treated in your code? https://arxiv.org/pdf/2212.02934.pdf