davidhopkinson26 opened 6 days ago
The way it is now relies on the calls to the init and fit methods of OneHotEncoder setting attributes on the OneHotEncodingTransformer. This is a very unsafe way to do things (as we have just found out!) but has probably been working more or less OK up to now. I am not quite sure why the call to init is not setting the '_drop_idx_after_grouping' attribute, but I still expect that switching to initialising and persisting a OneHotEncoder object will be a safer way for this to work.
What happened?
When trying to transform using this transformer we get an error:
AttributeError: 'OneHotEncodingTransformer' object has no attribute '_drop_idx_after_grouping'
This attribute was added in sklearn > 1.2.1 and is set in init for OneHotEncoder. This should be fine, but in OneHotEncodingTransformer it looks like we do not actually persist the initialised object, instead calling
in init
in fit
OneHotEncoder.fit(self, X=X[self.columns], y=y)
and in transform:
X_transformed = OneHotEncoder.transform(self, X[self.columns])
So the initialisation in this class's init method does not actually persist to the object used in transform. Up to now this has run, since calling the transform method without any initialisation happened to work (although it may not have been functioning as expected!). With newer versions of sklearn, however, the method looks for an attribute which does not exist, (a) because it is called within the wrong class, so self points at OneHotEncodingTransformer, and (b) because the OneHotEncoder has not been initialised.
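The failure mode can be reproduced in plain Python, independently of sklearn and tubular. The classes below are simplified stand-ins (not the real implementations): an attribute that the parent class sets in its init never exists on self when the subclass calls the parent's methods unbound and the parent's init was never run on that instance.

```python
class Encoder:
    """Stand-in for sklearn's OneHotEncoder."""

    def __init__(self):
        # attribute created only when Encoder.__init__ actually runs
        self._drop_idx_after_grouping = None

    def transform(self, X):
        # newer versions read the attribute here
        _ = self._drop_idx_after_grouping
        return X


class Transformer(Encoder):
    """Stand-in for OneHotEncodingTransformer."""

    def __init__(self, columns):
        # never chains to Encoder.__init__, so the attribute
        # set there is never created on this instance
        self.columns = columns

    def transform(self, X):
        # unbound call: self is a Transformer, not a fitted Encoder
        return Encoder.transform(self, X)


t = Transformer(columns=["a"])
try:
    t.transform([1, 2, 3])
except AttributeError as e:
    # AttributeError: 'Transformer' object has no attribute
    # '_drop_idx_after_grouping'
    print(type(e).__name__, e)
```

The same call succeeded on older sklearn versions simply because transform did not touch any attribute that only init creates.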
To fix this we should assign an instance of OneHotEncoder to an attribute of OneHotEncodingTransformer, say '_encoder', and persist it through init, fit and transform.
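A minimal sketch of the proposed composition-based fix, again with stand-in classes rather than the real sklearn/tubular code (the `_encoder` attribute name follows the suggestion above; the toy one-hot logic is illustrative only):

```python
class Encoder:
    """Stand-in for sklearn's OneHotEncoder."""

    def __init__(self):
        self._drop_idx_after_grouping = None  # set once, in init

    def fit(self, X, y=None):
        self.categories_ = sorted(set(X))
        return self

    def transform(self, X):
        _ = self._drop_idx_after_grouping  # attribute exists: init ran
        return [[1 if x == c else 0 for c in self.categories_] for x in X]


class Transformer:
    """Stand-in for the fixed OneHotEncodingTransformer."""

    def __init__(self, columns):
        self.columns = columns
        # one encoder instance, created in init and persisted
        self._encoder = Encoder()

    def fit(self, X, y=None):
        self._encoder.fit(X, y)  # bound call on the persisted instance
        return self

    def transform(self, X):
        return self._encoder.transform(X)


t = Transformer(columns=["a"]).fit(["red", "blue"])
print(t.transform(["blue", "red"]))  # [[1, 0], [0, 1]]
```

Because every call goes through the same persisted `_encoder` instance, any attribute OneHotEncoder sets in init or fit is guaranteed to be present by the time transform runs, regardless of sklearn version.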
Environment
sklearn 1.5.2
tubular 1.3.1