jorgecarleitao / arrow2

Transmute-free Rust library to work with the Arrow format
Apache License 2.0
1.07k stars 223 forks source link

add computer-reorder feature to allow in-place sorting #1483

Open bguo068 opened 1 year ago

bguo068 commented 1 year ago

If I understand it correctly, multiple-column sorting in arrow2 is done in two steps:

While the method works quite well, it sometimes causes unnecessary memory allocation in step 2. For instance, when references to the columns are unique (not shared), we can do in-place sorting without allocation.

To avoid allocation, maybe we can have a reorder function (within a feature like compute-reorder) to do it. The idea has been implemented for a single slice here (source:

fn sort_by_indices<T>(data: &mut [T], mut indices: Vec<usize>) {
    for idx in 0..data.len() {
        if indices[idx] != idx {
            let mut current_idx = idx;
            loop {
                let target_idx = indices[current_idx];
                indices[current_idx] = current_idx;
                if indices[target_idx] == target_idx {
                    break;
                }
                data.swap(current_idx, target_idx);
                current_idx = target_idx;
            }
        }
    }
}

This idea can be easily expanded to multiple columns or chunks.

Hope this makes sense. If not please comment on how in-place sorting can be done with the current version of arrow2. Thanks!!