interpretml / DiCE

Generate Diverse Counterfactual Explanations for any machine learning model.
https://interpretml.github.io/DiCE/
MIT License
1.32k stars 185 forks source link

Negative dimensions are not allowed #192

Closed Saladino93 closed 2 years ago

Saladino93 commented 2 years ago

I was trying to generate 30 CFs for the adult data set, I get this:

Screenshot 2021-07-27 at 14 21 08

Seems it is related to the random generation (even if I am using the genetic method, probably not enough samples from training)

gaugup commented 2 years ago

@Saladino93, which notebook did you use to generate this error and did you use total_CFs as 30 with genertic explainer?

Regards,

gaugup commented 2 years ago

@soundarya98 and @amit-sharma is seems like dice_genetic cannot generate counterfactuals greater than 25. Looking at this line https://github.com/interpretml/DiCE/blob/004dbd508ea2b801b5f3ca26a87a25eafc19e1f0/dice_ml/explainer_interfaces/dice_genetic.py#L462

rest_members = self.population_size - top_members

The self.population_size is defaulted to 25. Is this something by design?

Regards,

amit-sharma commented 2 years ago

Oh, that's a good point. @soundarya98 can you look into this? Can we set this as a parameter that is accessible to the user and can be changed?

soundarya98 commented 2 years ago

Yup, that can definitely be done. We could set it to 10* total_CFs by default and have the user set it if needed

gaugup commented 2 years ago

@amit-sharma, @soundarya98 I feel this can be still an internal parameter. But we should handle the negative value scenario more gracefully. What if user set a low for population_size and higher value for counterfactuals. The dice genetic code will still fail with similar stack trace.

I propose to not do the computation below line 427 if the rest_members is negative or zero. What do you think?

soundarya98 commented 2 years ago

Yup sounds good! We can have 2 options: default - 10* total_CFs or a user-defined parameter. If the user-defined parameter is smaller than total_CFs, we can raise an exception in the setup function in explainer_base perhaps

gaugup commented 2 years ago

I suppose it doesn't solve the problem entirely. Firstly, we should carefully think about exposing population_size as a reconfigurable parameter. I feel it is an algorithm constant (very specific to dice_genetic.py) so shouldn't be user configurable. Secondly, configuring a large value still doesn't garauntee safety against the crash that is reported by the user here.

What is the issue with continuing without executing the code in ine 463-468 if rest_members is zero or negative? @soundarya98.

new_generation_2  = None
if rest_members > 0:
            new_generation_2 = np.zeros((rest_members, self.data_interface.number_of_features))
            for new_gen_idx in range(rest_members):
                parent1 = random.choice(population[:int(len(population) / 2)])
                parent2 = random.choice(population[:int(len(population) / 2)])
                child = self.mate(parent1, parent2, features_to_vary, query_instance)
                new_generation_2[new_gen_idx] = child

if new_generation_2  is not None:
            if self.total_CFs > 0:
                population = np.concatenate([new_generation_1, new_generation_2])
            else:
                population = new_generation_2
else:
           population  = new_generation_1
amit-sharma commented 2 years ago

Good point about being specific to algorithm, @gaugup. Let's keep it as an internal parameter then. The crash can occur whenever top_members=self.total_CFs is higher than the population_size. So if we initialize population_size as a multiple of total_CFs, then perhaps Soundarya is indicating that this bug should not come up.

Regardless, let's also add the if-else check that you propose. My suggestion would be throw an exception at the last else, since otherwise this may be a silent error and the population variable may never update. The random.choice is needed to add some perturbations to generate new samples. @soundarya98 can you add these changes in PR #205

soundarya98 commented 2 years ago

@amit-sharma

Good point about being specific to algorithm, @gaugup. Let's keep it as an internal parameter then. The crash can occur whenever top_members=self.total_CFs is higher than the population_size. So if we initialize population_size as a multiple of total_CFs, then perhaps Soundarya is indicating that this bug should not come up.

Regardless, let's also add the if-else check that you propose. My suggestion would be throw an exception at the last else, since otherwise this may be a silent error and the population variable may never update. The random.choice is needed to add some perturbations to generate new samples. @soundarya98 can you add these changes in PR #205

@amit-sharma, I hadn't included the check because the error cannot come up unless population_size is lesser than total_CFs. But I can add the check just in case anyway in case someone changes the code in the future!

amit-sharma commented 2 years ago

@Saladino93 this has been fixed through PR #205