keras-team / tf-keras

The TensorFlow-specific implementation of the Keras API, which was the default Keras from 2019 to 2023.
Apache License 2.0

Keras/tensorflow failed when specifying class_weight in model.fit() #639

Open lauraht opened 2 years ago

lauraht commented 2 years ago

System information.

Describe the problem.

My network model works well without specifying class_weight in model.fit().

However, when I specify class_weight in model.fit(), Keras/TensorFlow fails with the following error, no matter what weight values I give:

  File "/opt/local/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 55, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:

indices[9] = 16 is not in [0, 10)
         [[{{node GatherV2}}]]
         [[IteratorGetNext]] [Op:__inference_train_function_43205]

Keras/TensorFlow fails with the above error even when I give all classes an equal weight of 1.0 (which is equivalent to no class weights), as follows (I have 10 classes):

class_weights_dict = {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0, 6: 1.0, 7: 1.0, 8: 1.0, 9: 1.0}

history = model.fit(train_input,
                    train_true_labels,
                    class_weight=class_weights_dict,
                    validation_split=validation_split,
                    shuffle=True,
                    epochs=epochs,
                    batch_size=batch_size)

I also verified that my true labels array train_true_labels contains only the integers 0-9, as follows:

values = np.unique(train_true_labels)
print(values)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
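Since the labels here are sequences, the shape of the label array may matter as much as its values. The following is a minimal sketch of a slightly fuller check; the array below is a hypothetical stand-in for train_true_labels, not the data from this issue.

```python
import numpy as np

# Hypothetical stand-in for train_true_labels: 3 sequences of 12 steps each,
# with every label in the range 0-9.
train_true_labels = np.arange(36).reshape(3, 12) % 10

# Report the layout as well as the set of distinct values.
print(train_true_labels.shape)                # (3, 12)
print(train_true_labels.dtype)
print(np.unique(train_true_labels).tolist())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```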

However, when I do not specify class_weight in model.fit(), the training for my model works just fine.

So it looks like I just cannot use class_weight in training. But my classes are highly imbalanced; not using class weights would train a useless model.

I would greatly appreciate any solution for this issue.

Thank you very much!

old-school-kid commented 2 years ago

Can you provide code to reproduce the error?

sushreebarsa commented 2 years ago

@lauraht In order to expedite the trouble-shooting process, please provide a code snippet to reproduce the issue reported here. Thanks!

lauraht commented 2 years ago

Hi Surya (@old-school-kid) and @sushreebarsa:

Thank you very much for your help!

Yes, I have made a simplified model that can reproduce the error. Could I email my code and data files to you?

Thank you so much!

lauraht commented 2 years ago

Hi Surya (@old-school-kid) and @sushreebarsa,

I tried to email you my zipped code/data files by directly replying to "keras-team/keras \reply+......@reply.github.com\", but the message could not be delivered because it exceeds the size limit. The archive is 3.6 MB (zipped), so I guess GitHub has a small size limit.

So I was wondering if I could possibly have your email address so that I could email it to you directly?

Thank you very much!

old-school-kid commented 2 years ago

Hi @lauraht, it looks like a classification problem, so can you try to reproduce the error using the MNIST data available in Keras? If that isn't possible, please mail it to me (not over GitHub). Thanks

lauraht commented 2 years ago

Hi Surya (@old-school-kid),

I have emailed the code/data files to your gmail account. Both the input and output of the model are sequential data.

Please let me know if you received these code/data files.

Thanks so much for the help!

sushreebarsa commented 2 years ago

@lauraht In order to reproduce the issue, could you please upload all the files to Drive and share the link with permission? Thanks!

lauraht commented 2 years ago

Hi @sushreebarsa,

I have emailed all the files to Surya's (@old-school-kid) gmail account and the email was delivered without the size limit problem.

Do you mean you want me to upload the files to Google Drive and share them with you and Surya? If so, could you please let me know your Gmail address (I have Surya's Gmail address) so that I can share the files with you? Thank you very much!

sushreebarsa commented 2 years ago

@lauraht Could you please share Drive access with the community so that we can access your files? We will not be able to share Gmail details. Thanks!

lauraht commented 2 years ago

Hi @sushreebarsa and Surya (@old-school-kid),

I have uploaded the code and data files here (in a zipped folder): class_weight_code.zip

The Python file can reproduce this error, and it loads the two data files. Both the input and output of the model are sequential data. The log file contains this error.

However, when I commented out the line “class_weight=class_weights_dict” in “model.fit()”, the model training works just fine.

Thank you very much for the help!

sushreebarsa commented 2 years ago

@lauraht Sorry for the late response! I tried to reproduce this issue on Colab using TF v2.8.0 and got an InvalidArgumentError. Could you please have a look at this gist and confirm that it is the same error? Please let me know if I am missing something needed to reproduce the issue. Please also refer to this similar issue and let us know if it helps. Thanks!

lauraht commented 2 years ago

Hi @sushreebarsa,

Thank you very much for looking into this!

I have verified it: yes, the error reported in the gist is essentially the same error I got (which is recorded in my log file).

I looked into the similar issue you suggested. I believe I defined class_weight in exactly the way the answer there describes. I also found that a follow-up user there reported the same error as I encountered (see below):

I tried using class weights with tf data and it gave me weird errors. The shape of my label is [batch_size, seq_len], where each label is between [0, 3], and the shape of the y_pred is [batch_size, seq_len, 4]. Keras complains with an invalid argument error saying index 18 is not in the range of [0, 3]. I don't even know where the 18 comes from. Labels are double-checked to make sure they are all in the range of [0, 3].

So it seems that there is a bug in keras/tensorflow related to the class_weight, which occurs when the output is sequential data (my output is also sequential data).
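One possible explanation, not confirmed by anyone in this thread, is that the class-weight lookup misreads a 2-D integer label array of shape [batch_size, seq_len] as one-hot vectors and takes an argmax over the second axis. That would produce positions in [0, seq_len) rather than class ids, which would account for an out-of-range index such as 16 or 18. The NumPy sketch below only simulates that assumed behavior; it does not call Keras, and all names and values in it are illustrative.

```python
import numpy as np

num_classes = 10
seq_len = 20

# Integer labels of shape (batch_size, seq_len); every value is a valid class id.
labels = np.zeros((2, seq_len), dtype=np.int64)
labels[0, 16] = 9   # largest value of row 0 sits at position 16
labels[1, 3] = 5    # largest value of row 1 sits at position 3

class_weights = np.ones(num_classes)

# ASSUMPTION: each row is treated as a one-hot vector, so its "class" becomes
# the position of its largest value instead of the label values themselves.
misread_classes = labels.argmax(axis=1)
print(misread_classes.tolist())  # [16, 3]

# Looking up a weight for "class" 16 in a table of 10 weights fails.
try:
    class_weights[misread_classes]
    lookup_failed = False
except IndexError as err:
    lookup_failed = True
    print("lookup failed:", err)
```

If this is what happens, the error would appear whenever seq_len exceeds the number of classes, even when every label is valid.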

I would greatly appreciate it if the keras/tensorflow team can fix this problem.

Thank you so much!

lauraht commented 2 years ago

Thank you @sachinprasadhs and @mihirparadkar for looking into this problem!

I would greatly appreciate it if this problem can be fixed, so that our project can continue.

Thank you very much and I am looking forward to a fix!

lauraht commented 2 years ago

Hi @sachinprasadhs and @mihirparadkar,

I was wondering if there is any solution for this problem? I would really appreciate it if there is a solution.

Thank you so much for your help!

lauraht commented 2 years ago

Dear @mihirparadkar and @sachinprasadhs,

I was wondering if the Keras team has any plan to fix this problem?

Could you please give us some update on this? I would really appreciate it!

Our project has been stuck due to this problem, and we are looking forward to a solution.

Thank you so much for your help and we would greatly appreciate a fix to this problem!

iamkashish commented 2 years ago

Hi @mihirparadkar and @fchollet, I hope you are doing well. Looking forward to a fix so that class_weight is supported for 3+ dimensional targets.
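Until such support exists, one workaround that is not proposed in this thread is to expand the class weights into one weight per label position and pass the result as sample_weight instead of class_weight. The Keras documentation for model.fit() describes a 2-D sample_weight of shape (samples, sequence_length) for temporal data. Whether this resolves the error for the model in this issue is untested here. The helper name per_timestep_weights and the example values are hypothetical.

```python
import numpy as np

def per_timestep_weights(labels, class_weight):
    """Expand a {class: weight} dict into one weight per label position."""
    labels = np.asarray(labels)
    # Classes missing from the dict keep a default weight of 1.0.
    table = np.ones(max(class_weight) + 1, dtype=np.float32)
    for cls, weight in class_weight.items():
        table[cls] = weight
    # Fancy indexing returns an array with the same shape as labels.
    return table[labels]

# Illustrative values, not taken from the issue.
class_weight = {0: 1.0, 1: 2.5, 2: 10.0}
labels = np.array([[0, 1, 2, 2], [1, 0, 0, 2]])

sample_weight = per_timestep_weights(labels, class_weight)
print(sample_weight.shape)     # (2, 4)
print(sample_weight.tolist())  # [[1.0, 2.5, 10.0, 10.0], [2.5, 1.0, 1.0, 10.0]]
```

The resulting array could then be passed as model.fit(..., sample_weight=sample_weight) with class_weight omitted.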

Famurd2811 commented 1 year ago

@lauraht were you able to find a solution to the class_weight issue?

alekseykandratenko commented 2 months ago

The solution I found to this issue is to convert y_train to a NumPy array, and it works just fine.
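The comment does not say what type y_train had before the conversion. As a minimal sketch, assuming it was a nested Python list, the conversion might look like this (the values are a hypothetical stand-in):

```python
import numpy as np

# Hypothetical stand-in for y_train stored as nested Python lists.
y_train = [[0, 3, 9], [1, 1, 2]]

# Convert to a NumPy array before passing it to model.fit().
y_train = np.asarray(y_train)
print(type(y_train).__name__, y_train.shape)  # ndarray (2, 3)
```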