tensorflow / tfx

TFX is an end-to-end platform for deploying production ML pipelines
https://tensorflow.github.io/tfx/
Apache License 2.0
2.11k stars 710 forks

How to automatically calculate class weights for imbalanced classes? #1549

Closed (lipinski closed 2 years ago)

lipinski commented 4 years ago

Hi,

Many real problems have an imbalanced data set. We can use undersampling or class weights, but is there a method to automatically calculate class weights and add them to estimators? I know how to add class weights to an estimator, but I don't know how to calculate them automatically using TFX.

Bumbleblo commented 4 years ago

@lipinski have you read that documentation? I believe you can solve this using just the Keras / Estimator API.

With that method, just use the GenericTrainer to integrate with TFX.

lipinski commented 4 years ago

Yes, I have read it and know it. I don't want to calculate this manually, because the imbalance in the data can change over time. I am looking for a solution in TFX: the class weights should be calculated from the data.

1025KB commented 4 years ago

Hi @lipinski, what @Bumbleblo mentioned is the solution we currently support. In addition, StatsGen should help you analyze the input data and decide on the class weights, and Schema and ExampleValidator can be used to monitor the data distribution.

@paulgc, is there anything specific for class weight in StatsGen?

ucdmkt commented 4 years ago

@lipinski One approach to calculating such weights dynamically is to have Transform compute an additional feature that holds the class weight and add it to each record (tft.count_per_key would be useful here).

You will need to make sure that this transformation and the weight feature are only added and available at training time, though.
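
A minimal sketch of the idea above, written in plain Python rather than against the Transform API: count the records per label in one pass, then attach an inverse-frequency weight to every record. The counting step is the part tft.count_per_key would play inside a real preprocessing_fn. The function name and the feature names "label" and "example_weight" are assumptions made for illustration only.

```python
from collections import Counter


def add_weight_feature(records, label_key="label", weight_key="example_weight"):
    """Return copies of `records` with an inverse-frequency weight attached.

    `label_key` and `weight_key` are hypothetical feature names.
    """
    # Full pass over the data: how many records carry each label.
    # Inside Transform, tft.count_per_key would supply these counts.
    counts = Counter(record[label_key] for record in records)
    total = sum(counts.values())
    num_classes = len(counts)

    weighted = []
    for record in records:
        # Rare labels get weights above 1, frequent labels below 1.
        weight = total / (num_classes * counts[record[label_key]])
        # Build a new dict so the input records stay unmodified.
        weighted.append({**record, weight_key: weight})
    return weighted


# Apply to the training split only; evaluation and serving records
# should not carry (or depend on) the weight feature.
train_records = [{"label": "ok"}] * 3 + [{"label": "fraud"}]
train_with_weights = add_weight_feature(train_records)
```

With this weighting the weights sum to the number of records, so the overall scale of the loss stays roughly the same as in the unweighted case.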

1025KB commented 4 years ago

re @ucdmkt, iiuc, that method is different from the one mentioned in the documentation (where the class weights are provided at compile time)

paulgc commented 2 years ago

TFDV currently provides top-k frequency counts for categorical features which can help with calculating class weights. But the actual computation of weights should happen outside of TFDV.
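
As a rough sketch of the "outside of TFDV" step, assuming the label's frequency counts have already been read into a plain dict: the function below turns counts into weights with the common n_samples / (n_classes * count) heuristic, the same one scikit-learn uses for class_weight="balanced". The function name and the example counts are hypothetical, and reading the counts out of the statistics is not shown.

```python
def class_weights_from_counts(label_counts):
    """Map each label to n_samples / (n_classes * count)."""
    if not label_counts:
        raise ValueError("label_counts is empty")
    if any(count <= 0 for count in label_counts.values()):
        raise ValueError("every label needs a positive count")

    n_samples = sum(label_counts.values())
    n_classes = len(label_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in label_counts.items()
    }


# Hypothetical counts, e.g. copied from the label feature's frequency statistics.
label_counts = {0: 900, 1: 100}
class_weights = class_weights_from_counts(label_counts)
```

A dict keyed by integer class index like this one has the shape Keras expects for the class_weight argument of Model.fit. Note that top-k counts only cover the k most frequent values, so this assumes the label has no more than k distinct classes.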

gowthamkpr commented 2 years ago

@lipinski Please use the Addons component for sampling, as mentioned here. Also, closing this issue as it's a duplicate of issue #3831. Thanks!