rust-itertools / itertools

Extra iterator adaptors, iterator methods, free functions, and macros.
https://docs.rs/itertools/
Apache License 2.0
2.64k stars 299 forks source link

Composability of the type returned from the `chunks` method #896

Closed freak82 closed 4 months ago

freak82 commented 4 months ago

Hi there,

First of all, feel free to close this issue if you think it is not appropriate. I mean, that I have no real problem with the library I'm just struggling with a case where I want to translate given C++ ranges code to Rust using iterators and adapters. Additional "requirement" is the code to do as few allocations as possible.

So basically, I've a string which represents big hex number and it should be split by comma at every 8 character from right to left. For example, this fff1fffffff2fffffffe should become this fff1,fffffff2,fffffffe (this format is required by /sys/class/net/<interface>/queues/rx-<num>/rps_cpus).

The C++ code using ranges looks like this and do single allocation only for the final string because it can calculate it's size from the passed iterators:

auto fin_flags = irq_flags                                            
                        | stdv::reverse                                      
                        | stdv::chunk(8)                                     
                        | stdv::join_with(',')                               
                        | stdr::to<std::string>();                           
    stdr::reverse(fin_flags); 

I'm trying to achieve similarly looking and performing code using Rust iterators and adapters. However, my experience with Rust is pretty basic at this point and I'm not satisfied with the result of my "translation". The Rust code looks a bit "clumsy" due to the "manual" extension and join operations in the fold closure.

let mut fmask = mask                                                           
        .into_iter()                                                               
        .rev()                                                                  
        .chunks(8)                                                              
        .into_iter()                                                                                                                            
        .fold(Vec::<char>::new(), |mut acc, add| {                              
            acc.extend(add.into_iter());                                        
            acc.push(',');                                                      
            acc                                                                 
        });                                                                     
    fmask.pop(); // The last comma                                              
    fmask.reverse();

I've troubles with the type returned from the chuks function. I couldn't make it work with intersperse which I was trying to use to add the ,. I've tried flattening it but this created other Vec objects i.e. it led to more memory allocations. Also note that the C++ code uses std::string while the Rust code uses Vec<char just because I couldn't find a way to reverse in place String otherwise the final reverse would require additional memory allocation. AFAIK, the Rust String is utf8 and that's why there is no way to reverse it without additional memory allocation.

So, is there a way to make the Rust code look better and at the same time to decrease the number of allocations (currently it does 16-32-64-128 byte allocations, I guess when enlarging the final Vec. I guess, I could use some other adapters from the itertools but I couldn't figure out which.

Both C++ and Rust code is available in this godbolt link.

Thanks, Pavel.

Philippe-Cholet commented 4 months ago

I think you need slice.rchunks. Slices know the entire data and length of it while our iterator adaptors don't know much about the underlying data.

use itertools::Itertools; // for intersperse

fn f(data: &str) -> String {
    let b = data.as_bytes();
    let it = b.rchunks(8).rev();
    let nb_commas = it.len().saturating_sub(1);
    let total_length = b.len() + nb_commas;
    let mut result = String::with_capacity(total_length); // Single heap allocation.
    result.extend(
        it.intersperse(",".as_bytes() /* or just: &[b','] */)
            .flatten()
            .map(|x| *x as char),
    );
    result
    // or
    // let mut vec = Vec::with_capacity(total_length); // Single heap allocation.
    // vec.extend(it.intersperse(",".as_bytes()).flatten().copied()); // copied: &u8 -> u8
    // String::from_utf8(vec).unwrap() // Explicitely panics if there is an utf8 issue.
}

fn main() {
    let s = f("fff1fffffff2fffffffe");
    println!("{:?}", (s.len(), s.capacity(), s));
}
freak82 commented 4 months ago

Aha, I see. Thank you for the response.