mars-project / mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
https://mars-project.readthedocs.io
Apache License 2.0
2.7k stars 325 forks source link

Optimize concat #3286

Closed fyrestone closed 1 year ago

fyrestone commented 1 year ago

What do these changes do?

This PR reduces the load of storage service (the object store in Ray backend). For example, with this PR, put data size of TPCH 1GB q05 can be reduced from 3282932389 to 2403252033, about 27%. Rechunk and auto merging are optimized most.

Related issue number

Fixes #xxxx

Check code requirements