Closed lipinski closed 2 years ago
@lipinski have you read that documentation? I believe you can solve this using just the Keras / Estimator API.
With that method, just use the GenericTrainer to integrate with TFX.
Yes, I have read it and I know it. I don't want to calculate this manually, because the imbalance in the data can change over time. I am looking for a solution in TFX. I mean the class weights should be calculated based on the data.
Hi @lipinski, what @Bumbleblo mentioned is the solution we currently support. In addition, StatsGen should help you analyze the input data and decide on the class weights, and Schema and ExampleValidator can be used to monitor the data distribution.
@paulgc, is there anything specific for class weight in StatsGen?
@lipinski One approach to calculating such weights dynamically is to use Transform to compute an additional feature that holds the class weight and add it to each record (tft.count_per_key would be useful here).
You will need to make sure that this transformation and the weight feature are only added and available at training time, though.
re @ucdmkt, iiuc, that method is different from the one mentioned in the documentation (where the class weights are provided at compile time)
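To make the idea concrete, here is a minimal sketch in plain Python of what such a per-record weight feature could look like. It does not use Transform or tft.count_per_key; it only illustrates the arithmetic. The function name `add_class_weight_feature`, the feature keys, and the "balanced" formula `total / (num_classes * class_count)` are assumptions chosen for illustration, not something prescribed in this thread.

```python
from collections import Counter


def add_class_weight_feature(records, label_key="label", weight_key="class_weight"):
    """Return copies of `records` with a per-example weight based on label frequency.

    Hypothetical helper: weight = total / (num_classes * count_of_that_label),
    so rarer classes get larger weights. The input records are not modified.
    """
    counts = Counter(r[label_key] for r in records)
    total = sum(counts.values())
    num_classes = len(counts)
    weighted = []
    for r in records:
        weight = total / (num_classes * counts[r[label_key]])
        weighted.append({**r, weight_key: weight})
    return weighted


# Training split only: three examples of class 0, one of class 1.
train = [
    {"x": 0.1, "label": 0},
    {"x": 0.4, "label": 0},
    {"x": 0.3, "label": 0},
    {"x": 0.9, "label": 1},
]
train_weighted = add_class_weight_feature(train)
```

Because the helper is applied only to the training split, evaluation and serving records never carry the weight feature, which matches the caveat above.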
TFDV currently provides top-k frequency counts for categorical features, which can help with calculating class weights. But the actual computation of the weights should happen outside of TFDV.
@lipinski Please use the Addons component for sampling as mentioned here. Also, closing this issue as it's a duplicate of issue #3831. Thanks!
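As a rough sketch of that "outside of TFDV" step, assume the label counts have already been extracted from the statistics into a plain dict. How they are read out of the statistics is not shown here. The function name `class_weights_from_counts` and the balanced weighting formula are assumptions for illustration.

```python
def class_weights_from_counts(label_counts):
    """Turn {class_index: frequency} into {class_index: weight}.

    Hypothetical helper using weight = total / (num_classes * count),
    which gives the average example a weight of 1.
    """
    if not label_counts:
        raise ValueError("label_counts must not be empty")
    if any(count <= 0 for count in label_counts.values()):
        raise ValueError("every class needs a positive count")
    total = sum(label_counts.values())
    num_classes = len(label_counts)
    return {
        label: total / (num_classes * count)
        for label, count in label_counts.items()
    }


# Example: 900 negatives and 100 positives.
weights = class_weights_from_counts({0: 900, 1: 100})
```

A dict of this shape, mapping class indices to weights, is what the Keras `class_weight` argument of `Model.fit` expects, so recomputing it on each pipeline run would let the weights follow changes in the data.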
Hi,
Many real problems have an imbalanced data set. We can use undersampling or class weights, but is there a method to automatically calculate and add class weights to estimators? I know how to add class weights to an estimator, but I don't know how to calculate them automatically using TFX.