Open jaidevd opened 3 years ago
Assuming we stick with pandas for reading the CSV file, shouldn't supporting read_csv(..., dtype={}, ...) let users be more explicit about how types are inferred?
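For reference, a minimal sketch of what explicit dtypes look like at the pandas level. The CSV text here is hypothetical stand-in data, not the xor.csv from this issue, and how Gramex would expose the dtype mapping (YAML or request) is left open.

```python
import io

import pandas as pd

# Hypothetical stand-in data; NOT the xor.csv attached to this issue.
CSV_TEXT = "x,y,z\n0,0,0\n0,1,1\n1,0,1\n1,1,0\n"

# Default inference: columns holding only whole numbers become int64.
inferred = pd.read_csv(io.StringIO(CSV_TEXT))

# Explicit dtypes: declare the two features as floats up front,
# and leave the remaining column to be inferred as usual.
explicit = pd.read_csv(
    io.StringIO(CSV_TEXT),
    dtype={"x": "float64", "y": "float64"},
)

print(inferred.dtypes.to_dict())
print(explicit.dtypes.to_dict())
```

With the explicit mapping, x and y are stored as floats even though the file only contains whole numbers, so a later cast to the training dtype would accept values like 1.5.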
@pratapvardhan We currently don't have a way of letting Gramex users specify dtypes from the yaml or requests. We should, eventually.
Is something not working as expected? Because MLHandler accepts feature values through URLs, they have to be coerced into the correct types. This can be too restrictive, because the types are inferred from the dataframes that are cached during training. In particular, if a feature is an integer in the training dataframe, MLHandler won't accept a float value for it.
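The behaviour can be sketched in plain pandas, outside Gramex. The first part mirrors the astype call on the cached dtype that shows up in the traceback; the second part is one possible lenient alternative. The helper coerce_like is hypothetical and is not part of MLHandler, and the dataframes are stand-ins for the cached training data and the parsed URL arguments.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Stand-in for the cached training data: "x" is inferred as an integer column.
orgdata = pd.DataFrame({"x": [0, 0, 1, 1]})

# Values parsed from a URL query such as ?x=1.5 arrive as strings.
data = pd.DataFrame({"x": pd.Series(["1.5"], dtype=object)})

# Strict cast to the training dtype: the string "1.5" cannot become an int.
try:
    data["x"].astype(orgdata["x"].dtype)
    strict_error = None
except ValueError as exc:
    strict_error = exc


def coerce_like(series, target_dtype):
    """Hypothetical helper: cast to target_dtype, but widen to float64
    when an integer target would drop a fractional part."""
    if is_integer_dtype(target_dtype):
        numeric = pd.to_numeric(series)  # "2" -> 2, "1.5" -> 1.5
        if (numeric % 1 == 0).all():
            return numeric.astype(target_dtype)
        return numeric.astype("float64")
    return series.astype(target_dtype)


lenient = coerce_like(data["x"], orgdata["x"].dtype)
print(type(strict_error).__name__, lenient.dtype)
```

Whether widening is acceptable depends on the model: an estimator trained on integer features may or may not behave sensibly for fractional inputs, so this is a design choice rather than a drop-in fix.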
Steps to reproduce. Please help us reproduce the bug, by sharing:
Paste this in a file named xor.csv:
Use the following gramex config:
ERROR 26-Feb 20:11:37 web Uncaught exception GET /?x=1.5&y=1.5 (::1)
HTTPServerRequest(protocol='http', host='localhost:9988', method='GET', uri='/?x=1.5&y=1.5', version='HTTP/1.1', remote_ip='::1')
Traceback (most recent call last):
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/web.py", line 1592, in _execute
    result = yield result
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1141, in run
    yielded = self.gen.throw(exc_info)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 466, in get
    self._predict, to_predict)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/tornado/gen.py", line 1133, in run
    value = future.result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 428, in result
    return self.get_result()
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in get_result
    raise self._exception
  File "/home/jaidevd/anaconda3/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(self.args, self.kwargs)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 336, in _predict
    data = self._transform(data, deduplicate=False)
  File "/home/jaidevd/src/gramex/gramex/handlers/mlhandler.py", line 308, in _transform
    data[col] = data[col].astype(orgdata[col].dtype)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 5882, in astype
    dtype=dtype, copy=copy, errors=errors, kwargs
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 581, in astype
    return self.apply("astype", dtype=dtype, kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 438, in apply
    applied = getattr(b, f)(kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 559, in astype
    return self._astype(dtype, copy=copy, errors=errors, values=values, kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 643, in _astype
    values = astype_nansafe(vals1d, dtype, copy=True, kwargs)
  File "/home/jaidevd/anaconda3/lib/python3.7/site-packages/pandas/core/dtypes/cast.py", line 707, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 547, in pandas._libs.lib.astype_intsafe
ValueError: invalid literal for int() with base 10: '1.5'