Open cnmoro opened 3 years ago
Yep, me too. @cnmoro did you find any workarounds for this? @youbao88 please help!
@thearcanist I remember solving it, but not how. It was still way too slow for my use, so I ended up using regular k-means instead: I applied MCA (Prince package) to the categorical variables and normalized the numerical ones with MinMaxScaler (based on the min and max values from MCA).
It has something to do with the gamma. Once you specify gamma yourself the problem is solved.
Sorry for such a late reply. The problem seems to be a conflict with numba's nopython mode. I will try to fix this in the next release. In the meantime, could you please verify whether @ori-katz100's suggestion solves it?
@thearcanist @ori-katz100 @youbao88
The fix was to calculate the gamma value instead of passing it as None.
I modified the original mean_std function and ended up with the following code:
import numpy as np

# 1 marks a categorical column, 0 a numerical one
categorical = [1 if x in categorics else 0 for x in data.columns]

def mean_std(data, types):
    # Average the standard deviation over the numerical columns only
    std = 0
    count_num_column = 0
    for col_index in range(len(types)):
        if types[col_index] == 0:
            count_num_column += 1
            std += np.std(data.iloc[:, col_index])
    return std / count_num_column

custom_gamma = mean_std(data, np.array(categorical))
Then pass "custom_gamma" as the gamma parameter. The mean_std function worked once I added ".iloc" to the np.std call; without it, it threw an error related to slicing the data.
KPrototypes_plus(n_clusters=k, n_init = 16, n_jobs = -1, gamma = custom_gamma)
It is still way too slow for me: my dataset has 547930 rows, 4 numeric columns and 7 categorical columns. It takes more than two hours to run the model with n_clusters=2, so I can't even plot the elbow curve (which would require running from k=2 to k=15) :(
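The gamma calculation above can also be written without an explicit loop. This is only a minimal sketch, assuming `data` is a pandas DataFrame; the helper name `mean_numeric_std` and the toy columns are hypothetical and not part of kpplus. It drops the categorical columns and averages the population standard deviation (ddof=0, which matches the default of np.std) over the remaining ones.

```python
import numpy as np
import pandas as pd


def mean_numeric_std(data: pd.DataFrame, categorical_cols) -> float:
    """Average population standard deviation over the non-categorical columns.

    Hypothetical helper for illustration; not part of kpplus.
    """
    numeric = data.drop(columns=list(categorical_cols))
    if numeric.shape[1] == 0:
        raise ValueError("no numerical columns to derive gamma from")
    # ddof=0 mirrors np.std's default, as used in the loop version
    return float(numeric.std(ddof=0).mean())


# Toy data, made up for this example
data = pd.DataFrame({
    "age": [20.0, 30.0, 40.0, 50.0],
    "income": [1.0, 1.0, 3.0, 3.0],
    "city": ["a", "b", "a", "c"],
})

custom_gamma = mean_numeric_std(data, ["city"])
print(custom_gamma)
```

The result can then be passed as the gamma parameter, as in the KPrototypes_plus call above.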
Thank you @cnmoro for your kind reply.
Yes, I have now noticed this issue, and it will be fixed in the next small release.
Regarding the performance issue, it will be improved in the next big release.
Thank you again for your comments.
@cnmoro can you verify whether this is fixed in the new release, v0.0.3? Thank you!
Unfortunately, I still get the error in 0.0.3:
TypingError: Failed in nopython mode pipeline (step: nopython frontend) non-precise type array(pyobject, 2d, F)
167: def mean_std(data, types): std = 0 ^
I am having the following issue (on both Python 3.6 and 3.8). Any ideas on how to fix this? Thanks!
Failed in nopython mode pipeline (step: nopython frontend)
non-precise type array(pyobject, 2d, F)
During: typing of argument at /home/cnmoro/miniconda3/envs/py36/lib/python3.6/site-packages/kpplus/kpplus.py (167)

File "../../miniconda3/envs/py36/lib/python3.6/site-packages/kpplus/kpplus.py", line 167:
def mean_std(data, types):
    std = 0
    ^
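One plausible reading of this traceback, not confirmed anywhere in this thread: "array(pyobject, 2d, F)" describes a 2D array whose elements are generic Python objects, which is what a DataFrame mixing numerical and categorical columns becomes when converted to a single NumPy array. Numba's nopython mode cannot assign a precise type to such an array. The sketch below uses made-up toy data only to show the dtype difference; it does not call kpplus or numba.

```python
import numpy as np
import pandas as pd

# Toy frame mixing numerical and categorical columns (made up for this example)
data = pd.DataFrame({
    "age": [20.0, 30.0, 40.0],
    "income": [1.5, 2.5, 3.5],
    "city": ["a", "b", "a"],
})

# Converting the whole frame yields one array of generic Python objects
mixed = data.to_numpy()
print(mixed.dtype)  # object

# Selecting only the numerical columns yields a plain float array
numeric_cols = data.select_dtypes(include="number").columns
numeric = data[numeric_cols].to_numpy(dtype=np.float64)
print(numeric.dtype)  # float64
```

If this reading is right, it would also explain why computing gamma outside the library, on the numerical columns only, avoids the error.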