Closed shi8tou closed 4 months ago
I can't recreate the issue in Colab, which is running python 3.10.12.
I got an email with a traceback, but it's not here; some wires got crossed in GitHub, or you deleted it, or...? That sounded like an issue with the parallelization and maybe not enough memory/space for it, but I'm not an expert on that.
I also couldn't reproduce it on my local Linux machine using category-encoders 2.6.0 and Python 3.10 in a fresh conda environment. As Ben pointed out, for the hashing encoder there are differences on Windows when it comes to multiprocessing. Are you using Windows or Linux/Mac?
Thanks. I am using a MacBook Air with an M2 chip.
Here is the error I got:
```
Traceback (most recent call last):
  File "
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:

    if __name__ == '__main__':
        freeze_support()
        ...

The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/test/test_dict.py", line 8, in
```
Thanks. Notice that the first traceback ends with an error from multiprocessing; the EOF is at the end of a second (identical?) traceback.
You might try a newer version of this package: #428 updated the hashing encoder significantly.
The same error shows up on Stack Overflow, but I'm not sure how much it helps: https://stackoverflow.com/q/61931669/10495893
I could get access to an old MacBook (still with an Intel chip) but also could not reproduce the issue on that machine (using a fresh conda installation). Can you try version 2.6.3, as Ben suggests, and see if that solves the issue?
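For reference, the idiom the traceback mentions can be shown with a minimal, self-contained sketch (plain multiprocessing, not the encoder's actual internals):

```python
import multiprocessing as mp

def square(x):
    return x * x

def encode():
    # Stand-in for the parallel work the hashing encoder does internally.
    with mp.Pool(2) as pool:
        return pool.map(square, [1, 2, 3, 4])

# On macOS (since Python 3.8) and Windows the default start method is
# "spawn": each worker re-imports this module, so pool-creating code must
# sit behind this guard, or every worker re-runs it and the pool fails
# (which can surface as the EOFError seen above).
if __name__ == '__main__':
    print(encode())  # [1, 4, 9, 16]
```

On Linux the default start method is "fork", which copies the parent process instead of re-importing the main module, which is why the same script can work there and fail on a Mac.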
Expected Behavior
HashingEncoder should encode the categorical columns successfully
Actual Behavior
Got an EOFError while calling HashingEncoder
Steps to Reproduce the Problem
Packages installed on my laptop: category_encoders==2.6.0 & python==3.10.0
Dataset is here: test_1.csv
Run the following code (imports added for completeness):

```
import pandas as pd
import category_encoders as ce

dataset = pd.read_csv('test_1.csv')
he = ce.HashingEncoder(cols=['purchase_address'], n_components=2)
dd = he.fit_transform(dataset)
dd.columns
```