xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: ValueError: cannot convert float NaN to integer #591

Closed qinxuye closed 1 year ago

qinxuye commented 1 year ago

Describe the bug

Encountering error: ValueError: cannot convert float NaN to integer

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version
  2. The version of Xorbits you use
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack trace of the error.
  5. Minimal code to reproduce the error.

Code

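import xorbits.numpy as np
import xorbits.pandas as pd

# NOTE: `df` is not defined in this snippet; it is assumed to be a Xorbits
# DataFrame with 'vehicleId' and 'payFee' columns, loaded earlier in the script.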

# Per license plate, count each vehicle's total number of passes and sum its total fees paid
vehicle_pass_count = df.groupby('vehicleId')['payFee'].count()
vehicle_total_fee = df.groupby('vehicleId')['payFee'].sum()

# Bin each vehicle's total fee into intervals
fee_intervals = [0, 300000, 500000, 1500000, 3000000, 5000000, 10000000, np.inf]
fee_labels = ['0-300000', '300000-500000', '500000-1500000', '1500000-3000000', '3000000-5000000', '5000000-10000000', '10000000+']
vehicle_total_fee_intervals = pd.cut(vehicle_total_fee, bins=fee_intervals, labels=fee_labels, right=False)

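# For each interval, compute the number of vehicles and the total number of passes
# Column names: '车辆总数' = vehicle count, '总通行次数' = total passes, '通行费金额小计' = fee subtotal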
summary_by_fee = pd.DataFrame({
    '车辆总数': vehicle_total_fee_intervals.value_counts(),
    '总通行次数': vehicle_pass_count.groupby(vehicle_total_fee_intervals).sum(),
    '通行费金额小计': vehicle_total_fee.groupby(vehicle_total_fee_intervals).sum()
})

Error message:

Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.11/site-packages/pandas/core/dtypes/common.py", line 139, in ensure_python_int
    new_value = int(value)
                ^^^^^^^^^^
ValueError: cannot convert float NaN to integer

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/python/study/analysis_year_xorb.py", line 116, in <module>
    summary_by_fee = pd.DataFrame({
                     ^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/pandas/core.py", line 57, in __init__
    mars_entity=MarsDataFrame(*to_mars(args), **to_mars(kwargs)),
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/dataframe/initializer.py", line 105, in __init__
    df = dataframe_from_1d_tileables(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/dataframe/datasource/from_tensor.py", line 557, in dataframe_from_1d_tileables
    return op(d, index, columns, dtypes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/core/mode.py", line 78, in _inner
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/dataframe/datasource/from_tensor.py", line 80, in __call__
    return self._call_input_1d_tileables(input_tensor, index, columns, dtypes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/dataframe/datasource/from_tensor.py", line 145, in _call_input_1d_tileables
    self.index = index = pd.RangeIndex(0, tileables[0].shape[0])
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/pandas/core/indexes/range.py", line 145, in __new__
    stop = ensure_python_int(stop)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/python3/lib/python3.11/site-packages/pandas/core/dtypes/common.py", line 142, in ensure_python_int
    raise TypeError(f"Wrong type {type(value)} for value {value}") from err
TypeError: Wrong type <class 'float'> for value nan
2023-07-11 09:26:59,121 xorbits._mars.services.cluster.uploader 8080 ERROR    Failed to upload node info
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/services/cluster/uploader.py", line 128, in upload_node_info
    await asyncio.to_thread(
  File "/usr/local/python3/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 2729, in uvloop.loop.Loop.run_in_executor
  File "/usr/local/python3/lib/python3.11/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
2023-07-11 09:26:59,124 xorbits._mars.services.cluster.uploader 8080 ERROR    Failed to upload node info: cannot schedule new futures after shutdown
2023-07-11 09:26:59,125 xorbits._mars.services.cluster.uploader 8080 ERROR    Failed to upload node info
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.11/site-packages/xorbits/_mars/services/cluster/uploader.py", line 128, in upload_node_info
    await asyncio.to_thread(
  File "/usr/local/python3/lib/python3.11/asyncio/threads.py", line 25, in to_thread
    return await loop.run_in_executor(None, func_call)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 2729, in uvloop.loop.Loop.run_in_executor
  File "/usr/local/python3/lib/python3.11/concurrent/futures/thread.py", line 167, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown
2023-07-11 09:26:59,125 xorbits._mars.services.cluster.uploader 8080 ERROR    Failed to upload node info: cannot schedule new futures after shutdown

aresnow1 commented 1 year ago

Minimal reproduction:

import xorbits.numpy as np
import xorbits.pandas as pd

df = pd.DataFrame({"vehicleId":list("abc"), "payFee":[1,2,3]})
vehicle_pass_count = df.groupby('vehicleId')['payFee'].count()
vehicle_total_fee = df.groupby('vehicleId')['payFee'].sum()
vehicle_total_fee_intervals = pd.cut(vehicle_total_fee, bins=[2, np.inf], labels=["b"], right=False)
# Column names: '车辆总数' = vehicle count, '总通行次数' = total passes, '通行费金额小计' = fee subtotal
print(pd.DataFrame({
    '车辆总数': vehicle_total_fee_intervals.value_counts(),
    '总通行次数': vehicle_pass_count.groupby(vehicle_total_fee_intervals).sum(),
    '通行费金额小计': vehicle_total_fee.groupby(vehicle_total_fee_intervals).sum()
}))
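
For readers following the traceback: the last frames show pd.RangeIndex(0, tileables[0].shape[0]) receiving nan as the stop value, presumably because the row count of the first input is not known before execution. The sketch below uses only the standard library to show how that nan turns into the two chained exceptions in the report. The ensure_python_int here is a simplified stand-in modeled on the two pandas lines visible in the traceback, not the real pandas helper, and unknown_length is a hypothetical placeholder for the unknown row count.

```python
import math


def ensure_python_int(value):
    """Simplified stand-in for the pandas helper seen in the traceback."""
    try:
        # int(float("nan")) raises ValueError: cannot convert float NaN to integer
        new_value = int(value)
    except (TypeError, ValueError) as err:
        # Re-raise as TypeError, keeping the ValueError as the direct cause
        raise TypeError(f"Wrong type {type(value)} for value {value}") from err
    return new_value


# Hypothetical placeholder for a row count that is not known yet.
unknown_length = math.nan

try:
    ensure_python_int(unknown_length)
except TypeError as exc:
    message = str(exc)
    cause = exc.__cause__

print(message)      # Wrong type <class 'float'> for value nan
print(repr(cause))  # ValueError('cannot convert float NaN to integer')
```

This only illustrates the error chain; it does not say where the fix belongs in Xorbits.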