vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

groupby changes the column dtype and distorts the values #2136

Closed JovanVeljanoski closed 2 years ago

JovanVeljanoski commented 2 years ago

Discussed in https://github.com/vaexio/vaex/discussions/2131

Originally posted by **tommyhj217** July 26, 2022

Hi,

The DataFrame column `delay_time` has values from -276 to 111 and dtype=int64.

![image](https://user-images.githubusercontent.com/86287388/180895338-8fc2bcb1-2405-43b1-8496-5a42386a0332.png)

The number of values is 206.

![image](https://user-images.githubusercontent.com/86287388/180896352-874b91d6-e21e-49d9-8f84-eb97f40666d4.png)

`df_delaytime_data = vdf_2.groupby(['delay_time'], agg={'cnt':'count'})`

![image](https://user-images.githubusercontent.com/86287388/180895537-c803424e-3c16-42af-88cc-ad5db21a7087.png)

However, after the groupby, the dtype of the column is changed to int8, and values of -128 and below are turned into positive numbers. Is this a problem with groupby? How can we solve it?

Thanks,
Hyunjun
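
The symptom described above looks like what happens when wide integers are squeezed into a signed 8-bit type, which can only hold -128 to 127. The following is a minimal sketch in plain Python of that wraparound. It assumes the distortion comes from a narrowing cast to int8, which the report does not confirm. The helper `wrap_to_int8` and the sample values are hypothetical and are not part of vaex.

```python
def wrap_to_int8(value: int) -> int:
    """Reinterpret an integer as a signed 8-bit value (two's-complement wraparound).

    Hypothetical helper for illustration only; it is not a vaex function.
    """
    return ((value + 128) % 256) - 128


# Sample values chosen from the reported range of delay_time (-276 to 111).
samples = [-276, -200, -129, -128, 0, 111]

# Values inside [-128, 127] survive; values outside that range are distorted.
wrapped = [wrap_to_int8(v) for v in samples]

for original, narrowed in zip(samples, wrapped):
    marker = "" if original == narrowed else "  <- distorted"
    print(f"{original:5d} -> {narrowed:5d}{marker}")
```

Under this assumption, -200 and -129 come out as positive numbers (56 and 127), which matches the kind of sign flip shown in the screenshots, while values that already fit in int8 are unchanged.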