Closed glevv closed 2 years ago
You're right. @cmougan this was probably a copy-paste error?
Yes, it's a copy-paste issue.
It returns:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 19 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   CRIM     506 non-null    float64
 1   ZN       506 non-null    float64
 2   INDUS    506 non-null    float64
 3   CHAS     506 non-null    float64
 4   NOX      506 non-null    float64
 5   RM       506 non-null    float64
 6   AGE      506 non-null    float64
 7   DIS      506 non-null    float64
 8   RAD      506 non-null    float64
 9   TAX      506 non-null    float64
 10  PTRATIO  506 non-null    float64
 11  B        506 non-null    float64
 12  LSTAT    506 non-null    float64
 13  CHAS_25  506 non-null    float64
 14  RAD_25   506 non-null    float64
 15  CHAS_50  506 non-null    float64
 16  RAD_50   506 non-null    float64
 17  CHAS_75  506 non-null    float64
 18  RAD_75   506 non-null    float64
Currently you can't use Summary Encoder or Quantile Encoder because they have not been released yet.
Until a new version of the category_encoders package is published, you can use the implementation
that we used for the original paper, available via pip install sktools
@PaulWestenthanner maybe we could do a package release?
We definitely should release. Unfortunately I do not have the rights to do so...
@PaulWestenthanner who does?
@PaulWestenthanner you should have the rights. If you update the version in __init__.py and the changelog, then go to the releases page on GitHub and draft a new release (tagged with the release number), the GitHub Action should take care of the rest.
Ah, I didn't know that. Sorry that I postponed the release for so long. It worked like a charm, though. The new version is visible on PyPI. Thanks a lot @wdm0006!
Expected Behavior
SummaryEncoder should return N*cat_features columns, where N is the number of quantiles used to describe each category. At least, this is what section 2.1 of the original paper states.
Actual Behavior
The docs example states that SummaryEncoder returns 1*cat_features columns.
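
To make the expected column count concrete, here is a minimal pure-Python sketch of the idea. It is not the category_encoders implementation: the helper names quantile and summary_encode are hypothetical, the toy rows and targets are made up for illustration, and the smoothing toward the global quantile that the real encoder applies is left out. It only shows that two categorical features and three quantiles give 2*3 = 6 new columns, named like the ones in the output above.

```python
from collections import defaultdict


def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list (0 <= q <= 1)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def summary_encode(rows, target, cat_cols, quantiles):
    """Append one column per (quantile, categorical column) pair.

    Simplified illustration: plain per-category quantiles of the target,
    without any smoothing toward the global quantile.
    """
    # Collect the target values seen for each category of each column.
    groups = {col: defaultdict(list) for col in cat_cols}
    for row, y in zip(rows, target):
        for col in cat_cols:
            groups[col][row[col]].append(y)

    encoded = []
    for row in rows:
        new_row = dict(row)  # keep the original columns untouched
        for q in quantiles:
            for col in cat_cols:
                name = f"{col}_{int(round(q * 100))}"
                new_row[name] = quantile(groups[col][row[col]], q)
        encoded.append(new_row)
    return encoded


# Toy data (made up), with CHAS and RAD treated as categorical features.
rows = [
    {"CHAS": 0, "RAD": 1, "RM": 6.5},
    {"CHAS": 0, "RAD": 2, "RM": 6.4},
    {"CHAS": 1, "RAD": 1, "RM": 7.1},
    {"CHAS": 1, "RAD": 2, "RM": 6.9},
]
target = [24.0, 21.6, 34.7, 33.4]

encoded = summary_encode(rows, target, ["CHAS", "RAD"], [0.25, 0.5, 0.75])
new_columns = [c for c in encoded[0] if c not in rows[0]]
print(new_columns)
```

The same arithmetic matches the output quoted in this thread: 13 original columns plus 2 categorical features times 3 quantiles gives 19 columns in total.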