Open GriffinRidgeback opened 5 years ago
I am running into the same issue. The edges problem can be solved by instructing pandas to drop duplicate edges (add the argument duplicates="drop" to the pd.cut call in templates/processors/numeric), but of course it probably means that the problem is in the data itself.
Not sure what the developers could do to automate this case: maybe call sklearn's Imputer or (in my case) just fill the NAs?
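For anyone landing here, a minimal sketch of that workaround, assuming the bin edges reported in the traceback and a few made-up sample values (the real call lives in the generated pipeline.py, which may differ):

```python
import numpy as np
import pandas as pd

# Bin edges as reported in the traceback; the first two edges are both -1.
edges = np.array([-1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.])

# Made-up sample values, only for illustration.
values = np.array([-1., 5., 100., 408.])

# With the default duplicates="raise", pandas rejects the repeated edge.
try:
    pd.cut(values, edges, labels=False, include_lowest=True)
    raised = False
except ValueError:
    raised = True

# duplicates="drop" collapses the repeated edge, leaving 9 bins instead of 10.
codes = pd.cut(values, edges, labels=False, include_lowest=True, duplicates="drop")
print(raised, list(codes))
```

Note that dropping an edge shifts the bin indices, so the same setting has to be used at training and prediction time.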
Thank you! I will give that a try. I did call fillna() on the dataframe before passing the csv to the tool; guess that wasn't enough.
-----Original Message----- From: Tiago Tresoldi notifications@github.com To: minimaxir/automl-gs automl-gs@noreply.github.com Cc: Griffin kevindelia@verizon.net; Author author@noreply.github.com Sent: Thu, Apr 11, 2019 10:48 am Subject: Re: [minimaxir/automl-gs] bin edges must be unique (#23)
Well that fixed it but now I get this error:
ValueError: Error when checking input: expected input_loan_type to have shape (1,) but got array with shape (2,)
When I check this attribute, I get this:
train_data.loan_type.unique()
array([3, 1, 2, 4], dtype=int64)
Should I open a separate ticket for this?
And thank you for getting me a little bit further
I'm having the same issue. Did you solve it?
I did not. I used the xgboost algorithm instead. That ran to completion, but I didn't get the output I expected. I expected 1s and 0s but got probabilities instead, which wasn't acceptable for what I had to submit for my course project.
Good luck!
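In case it helps someone else: predicted probabilities can usually be turned into 0/1 labels by thresholding. A small sketch with made-up probabilities and an assumed cutoff of 0.5 (the right cutoff depends on the problem):

```python
import numpy as np

# Made-up predicted probabilities of the positive class.
probs = np.array([0.12, 0.5, 0.87, 0.49])

# Assumed cutoff; values at or above it become class 1.
threshold = 0.5
labels = (probs >= threshold).astype(int)
print(labels.tolist())
```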
@avinregmi Sounds similar to my problem here: #25.
Possible Causes and Solutions

Duplicate Values in Data:
Cause: If the data you're binning contains duplicate values, and these duplicates coincide with the bin edges, it can cause this error.
Solution: Clean your data to remove or handle duplicates before binning. You can use pandas to drop duplicates or adjust your bin edges slightly to avoid coinciding with duplicate values.

Bin Edges Overlap or Too Close:
Cause: If your bin edges are very close to each other, floating-point precision errors might cause them to be treated as non-unique.
Solution: Increase the distance between bin edges or use a smaller number of bins.

Incorrect Bin Edge Calculation:
Cause: If you're manually calculating bin edges and there's a mistake in the logic, it can result in duplicate edges.
Solution: Double-check the logic used to generate bin edges. Use functions like numpy.linspace() to ensure evenly spaced bin edges without duplicates.

Floating-Point Precision Issues:
Cause: When bin edges are calculated using floating-point arithmetic, very small differences might not be distinguishable, leading to apparent duplicates.
Solution: Round your bin edges to a certain decimal place or use integer-based binning if applicable.
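A rough illustration of the duplicate-value cause, assuming the edges are computed as percentiles of the column (an assumption; the thread does not show how automl-gs derives them) and using a hypothetical column where 30% of the rows hold a -1 placeholder:

```python
import numpy as np

# Hypothetical column: 30 rows hold the placeholder -1 (e.g. a filled NA),
# the other 70 rows hold distinct values 1..70.
values = np.concatenate([np.full(30, -1.0), np.arange(1.0, 71.0)])

# Decile edges: the 0th, 10th and 20th percentiles all land on -1.
quantile_edges = np.percentile(values, np.arange(0, 101, 10))

# Option 1: drop the repeated edges before binning.
unique_edges = np.unique(quantile_edges)

# Option 2: evenly spaced edges, strictly increasing whenever min < max.
linear_edges = np.linspace(values.min(), values.max(), num=11)

print(quantile_edges[:3], len(unique_edges))
```

This would also explain why calling fillna() with a constant does not help by itself: a heavily repeated fill value can be what produces the repeated edges.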
Hello - I am trying to use this package to provide predictions for my Data Science Capstone project. When I run against my training data, I get the following exception/error:
Traceback (most recent call last): | 0/20 [00:00<?, ?epoch/s]
File "model.py", line 63, in <module>
model_train(df, encoders, args, model)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 903, in model_train
X, y = process_data(df, encoders)
File "C:\Users\deliak\Documents\Jupyter Notebooks\edX\DAT102x -Microsoft Professional Capstone Data Science\automl_train\pipeline.py", line 758, in process_data
df['msa_md'].values, encoders['msa_md_bins'], labels=False, include_lowest=True)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 234, in cut
duplicates=duplicates)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\tile.py", line 332, in _bins_to_cuts
"the 'duplicates' kwarg".format(bins=bins))
ValueError: Bin edges must be unique: array([ -1., -1., 18., 63., 118., 192., 247., 305., 329., 371., 408.]).
You can drop duplicate edges by setting the 'duplicates' kwarg
Traceback (most recent call last): | 0/20 [00:00<?, ?epoch/s]
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\deliak\AppData\Local\Continuum\anaconda3\Scripts\automl_gs.exe\__main__.py", line 9, in <module>
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 175, in cmd
tpu_address=args.tpu_address)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\automl_gs\automl_gs.py", line 87, in automl_grid_search
"metadata", "results.csv"))
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 440, in _read
parser = TextFileReader(filepath_or_buffer, kwds)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 787, in __init__
self._make_engine(self.engine)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1014, in _make_engine
self._engine = CParserWrapper(self.f, self.options)
File "c:\users\deliak\appdata\local\continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1708, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 384, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 695, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'automl_train\metadata\results.csv' does not exist