ENH: DataFrameNunique has performance issue


I am testing on a DataFrame with 3 columns and approximately 400 million rows. The first column contains 85,642,283 distinct values. On this data, xorbits is significantly slower than pandas.

On an AWS EC2 instance with 256 GB of memory, pandas took over 8 minutes to complete the calculation, including reading the CSV data, while xorbits took over 10 minutes.

We should introduce a shuffle stage in the nunique op for this high-cardinality case.
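To illustrate the idea, here is a minimal pure-Python sketch of a shuffle-based distinct count. It is not the xorbits implementation: `shuffle_nunique`, `chunks`, and `n_reducers` are hypothetical names, and plain lists stand in for DataFrame chunks. Each chunk deduplicates locally, then routes every value to a reducer chosen by its hash, so a given value always lands on the same reducer and the per-reducer counts can simply be summed. No single reducer has to hold all distinct values.

```python
from typing import Hashable, Iterable, List, Set


def shuffle_nunique(chunks: Iterable[Iterable[Hashable]], n_reducers: int = 4) -> int:
    """Count distinct values across chunks using a hash-partitioned shuffle.

    Hypothetical helper for illustration only; not part of xorbits.
    """
    # One bucket per reducer. In a distributed setting each bucket would
    # live on a different worker.
    buckets: List[Set[Hashable]] = [set() for _ in range(n_reducers)]

    # Map + shuffle: deduplicate within each chunk, then send each value
    # to the reducer selected by its hash. Equal values always map to the
    # same reducer, so buckets never overlap.
    for chunk in chunks:
        for value in set(chunk):
            buckets[hash(value) % n_reducers].add(value)

    # Reduce: buckets are disjoint, so the total is the sum of their sizes.
    return sum(len(bucket) for bucket in buckets)


if __name__ == "__main__":
    data = [[1, 2, 3], [3, 4], [4, 5, 1]]
    print(shuffle_nunique(data, n_reducers=2))
```

The trade-off is that a shuffle moves data between workers, so it is likely to pay off mainly when the number of distinct values is large, as in the 85-million-value column described above.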

xorbitsai / xorbits

ENH: DataFrameNunique has performance issue #536