iancovert / sage

For calculating global feature importance using Shapley values.
MIT License

Exception encountered when calling layer "gru" (type GRU). #14

Closed deepakupman closed 8 months ago

deepakupman commented 2 years ago

I am getting the error below when trying to use the package on text data with a GRU layer.

InternalError: Exception encountered when calling layer "gru" (type GRU). Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 3, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 64, 64, 1, 32, 250000, 0] [Op:CudnnRNN]

Call arguments received: • inputs=tf.Tensor(shape=(250000, 32, 64), dtype=float32) • mask=None • training=False • initial_state=None

Model: "sequential"

| Layer (type) | Output Shape | Param # |
| --- | --- | --- |
| embedding (Embedding) | (None, 32, 64) | 768000 |
| spatial_dropout1d (SpatialDropout1D) | (None, 32, 64) | 0 |
| gru (GRU) | (None, 32, 64) | 24960 |
| dropout (Dropout) | (None, 32, 64) | 0 |
| gru_1 (GRU) | (None, 64) | 24960 |
| dropout_1 (Dropout) | (None, 64) | 0 |
| dense (Dense) | (None, 32) | 2080 |
| dropout_2 (Dropout) | (None, 32) | 0 |
| dense_1 (Dense) | (None, 100) | 3300 |
| dense_2 (Dense) | (None, 1) | 101 |

Total params: 823,401
Trainable params: 823,401
Non-trainable params: 0

iancovert commented 2 years ago

Hi Deepak, it looks like the issue is that the package isn't correctly handling held-out features and making predictions with your model. This is one of the core operations when calculating SAGE values, and I wrote the package to work mainly with tabular data where the model input is size (batch, num_features). So it's just not currently set up for your use-case, but we should be able to make it work here.
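To illustrate the core operation for the tabular case: for each feature subset, the held-out features are filled in (e.g., from background samples) and the model is evaluated on the imputed inputs. A minimal numpy sketch of marginal imputation, with an illustrative function name and a toy model (not the package's actual implementation):

```python
import numpy as np

def marginal_impute_predict(f, x, S, background):
    """Estimate E[f(x)] with held-out features marginalized out:
    tile x across background samples, overwrite the held-out
    features (S == False) with background values, and average
    the model's predictions. x: (num_features,),
    S: boolean (num_features,), background: (n_bg, num_features)."""
    n_bg = background.shape[0]
    x_tiled = np.tile(x, (n_bg, 1))
    x_tiled[:, ~S] = background[:, ~S]
    return f(x_tiled).mean()

# Toy linear model: f(x) = sum of features.
f = lambda X: X.sum(axis=1)
x = np.array([1.0, 2.0, 3.0])
background = np.zeros((4, 3))
S = np.array([True, False, True])  # hold out the middle feature
print(marginal_impute_predict(f, x, S, background))  # 1 + 0 + 3 = 4.0
```

This is why the input layout matters: the subset mask S must line up with whatever axis you consider to be "features."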

The main thing we need to figure out is the feature imputer. Since you're working with embeddings, it may be simplest to impute held-out feature values with zeros (and this seems reasonable given that you're already training with 1d dropout in the second layer). The package's way of doing this is implemented in the DefaultImputer class (here), but a couple of possible issues jump out at me. First, can I ask which dimension you want to treat as the features? I'm guessing you want the 32 dimension to be the features, because the 64 dimension looks like the embedding size - is that right? Let me know and I can help write a corrected imputer class.
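For concreteness, here is one way a custom imputer could look if the 32 token positions are the features. This is a hypothetical sketch (the class name, the zero-imputation strategy, and the toy model are all illustrative, not part of the sage package); it assumes the imputer is called with a batch of inputs x and a boolean subset mask S, and it pads out held-out token positions so the embedding layer maps them to the padding vector:

```python
import numpy as np

class TokenZeroImputer:
    """Hypothetical imputer for token-sequence inputs: held-out
    positions (S == False) are replaced with the padding index,
    so the model's embedding layer effectively zeroes them out."""

    def __init__(self, model, pad_index=0):
        self.model = model
        self.pad_index = pad_index

    def __call__(self, x, S):
        # x: (batch, seq_len) integer token ids
        # S: (batch, seq_len) boolean mask; False = feature held out
        x_imputed = np.where(S, x, self.pad_index)
        return self.model(x_imputed)

# Toy "model" that counts non-pad tokens per sample.
model = lambda x: (x != 0).sum(axis=1)
imputer = TokenZeroImputer(model)
x = np.array([[5, 7, 9]])
S = np.array([[True, False, True]])
print(imputer(x, S))  # position 1 is padded out -> [2]
```

In your case the model would be the trained Keras network and x the (batch, 32) token-id inputs, with seq_len = 32 playing the role of num_features.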

Also, can I ask what kind of data you're using with a GRU where you want to understand global rather than local feature importance?