Closed Saladino93 closed 2 years ago
@Saladino93, which notebook did you use to generate this error and did you use total_CFs as 30 with genertic explainer?
Regards,
@soundarya98 and @amit-sharma is seems like dice_genetic cannot generate counterfactuals greater than 25. Looking at this line https://github.com/interpretml/DiCE/blob/004dbd508ea2b801b5f3ca26a87a25eafc19e1f0/dice_ml/explainer_interfaces/dice_genetic.py#L462
rest_members = self.population_size - top_members
The self.population_size is defaulted to 25. Is this something by design?
Regards,
Oh, that's a good point. @soundarya98 can you look into this? Can we set this as a parameter that is accessible to the user and can be changed?
Yup, that can definitely be done. We could set it to 10* total_CFs by default and have the user set it if needed
@amit-sharma, @soundarya98 I feel this can be still an internal parameter. But we should handle the negative value scenario more gracefully. What if user set a low for population_size and higher value for counterfactuals. The dice genetic code will still fail with similar stack trace.
I propose to not do the computation below line 427 if the rest_members is negative or zero. What do you think?
Yup sounds good! We can have 2 options: default - 10* total_CFs or a user-defined parameter. If the user-defined parameter is smaller than total_CFs, we can raise an exception in the setup function in explainer_base perhaps
I suppose it doesn't solve the problem entirely. Firstly, we should carefully think about exposing population_size as a reconfigurable parameter. I feel it is an algorithm constant (very specific to dice_genetic.py) so shouldn't be user configurable. Secondly, configuring a large value still doesn't garauntee safety against the crash that is reported by the user here.
What is the issue with continuing without executing the code in ine 463-468 if rest_members is zero or negative? @soundarya98.
new_generation_2 = None
if rest_members > 0:
new_generation_2 = np.zeros((rest_members, self.data_interface.number_of_features))
for new_gen_idx in range(rest_members):
parent1 = random.choice(population[:int(len(population) / 2)])
parent2 = random.choice(population[:int(len(population) / 2)])
child = self.mate(parent1, parent2, features_to_vary, query_instance)
new_generation_2[new_gen_idx] = child
if new_generation_2 is not None:
if self.total_CFs > 0:
population = np.concatenate([new_generation_1, new_generation_2])
else:
population = new_generation_2
else:
population = new_generation_1
Good point about being specific to algorithm, @gaugup. Let's keep it as an internal parameter then.
The crash can occur whenever top_members=self.total_CFs
is higher than the population_size
. So if we initialize population_size as a multiple of total_CFs, then perhaps Soundarya is indicating that this bug should not come up.
Regardless, let's also add the if-else check that you propose. My suggestion would be throw an exception at the last else
, since otherwise this may be a silent error and the population variable may never update. The random.choice is needed to add some perturbations to generate new samples.
@soundarya98 can you add these changes in PR #205
@amit-sharma
Good point about being specific to algorithm, @gaugup. Let's keep it as an internal parameter then. The crash can occur whenever
top_members=self.total_CFs
is higher than thepopulation_size
. So if we initialize population_size as a multiple of total_CFs, then perhaps Soundarya is indicating that this bug should not come up.Regardless, let's also add the if-else check that you propose. My suggestion would be throw an exception at the last
else
, since otherwise this may be a silent error and the population variable may never update. The random.choice is needed to add some perturbations to generate new samples. @soundarya98 can you add these changes in PR #205
@amit-sharma, I hadn't included the check because the error cannot come up unless population_size is lesser than total_CFs. But I can add the check just in case anyway in case someone changes the code in the future!
@Saladino93 this has been fixed through PR #205
I was trying to generate 30 CFs for the adult data set, I get this:
Seems it is related to the random generation (even if I am using the genetic method, probably not enough samples from training)