quinnwencn / blog

Apache License 2.0
0 stars 0 forks source link

[RustLearning] iterator的使用 #35

Open quinnwencn opened 1 month ago

quinnwencn commented 1 month ago

Rust中的迭代器功能非常強大,但是掌握並熟悉使用迭代器是一個漫長的過程,尤其是對熟悉了C++再學習的人,會經常以C++的方式寫一些循環代碼。這裏我藉著Exercism上的一道題,來熟悉迭代器的使用,後續如果有新的迭代器不熟悉的,也在這個issue上補充。迭代器的官方文檔也是非常有用的手冊! 這次的Exorcism的題目是Luhn, 判斷一串數字是否滿足Luhn算法的要求,銀行卡、社保號碼等都滿足Luhn算法的要求。 Luhn算法可以參考維基百科的介紹,要求簡單概括為以下:

  1. 只包含數字和空格字符;
  2. 數字字符總署要大於1;
  3. 逆序遍歷數字字符,偶數位的數字字符double,如果double後結果大於9,則要減去9;
  4. 將逆序遍歷後的數字求和,和的結果如果是10的倍數則合法,否則不合法。
quinnwencn commented 1 month ago

我的第一版解法是:

/// Check a Luhn checksum.
pub fn is_valid(code: &str) -> bool {
    if !code.chars().all(|c| c.is_ascii_digit() || c.is_whitespace()) {
        return false;
    }

    let mut count = 0;
    let mut sum: u32 = 0;
    code.chars()
        .rev()
        .filter(|x| x.is_ascii_digit())
        .for_each(|x| {
            count += 1;
            if count % 2 != 0 {
                sum += x.to_digit(10).unwrap();
            } else {
                let val = x.to_digit(10).unwrap() * 2;
                match val.cmp(&10) {
                    std::cmp::Ordering::Less => sum += val,
                    _ => sum += val - 9,
                };
            }
        });
    count > 1 && sum % 10 == 0
}

這雖然能解決問題,但是存在兩次迭代;此外,還額外控制了count和sum這兩個變量的更新,其實這都可以通過迭代器來完成,我們看看社區內其他人的優秀解法:

/// Check a Luhn checksum.
pub fn is_valid(code: &str) -> bool {
    code.chars()
        .rev()
        .filter(|c| !c.is_whitespace())
        .try_fold((0, 0), |(sum, count), val| {
            val.to_digit(10)
                .map(|num| if count % 2 == 1 { num * 2 } else { num })
                .map(|num| if num > 9 { num - 9 } else { num })
                .map(|num| (num + sum, count + 1))
        }).map_or(false, |(sum, count)| sum % 10 == 0 && count > 1)
}

這個解法來自[JaneL's solution,這裏主要用了try_fold和map_or來解決我的解法中兩次迭代的問題。try_fold實際上是fold的變種,fold始終成功,而try_fold則可能會失敗,因此要配合map_or來使用。 我們先來看fold的定義:

fn fold<B, F>(self, init: B, f: F) -> B where Self: Sized, F: FnMut(B, Self::Item) -> B, Folds every element into an accumulator by applying an operation, returning the final result. fold() takes two arguments: an initial value, and a closure with two arguments: an ‘accumulator’, and an element. The closure returns the value that the accumulator should have for the next iteration. The initial value is the value the accumulator will have on the first call. After applying this closure to every element of the iterator, fold() returns the accumulator.

fold的輸入是兩個參數:

  1. 初始值
  2. 閉包,閉包是兩個參數,第一個是累加的值,第二個是處理的入參 比如:
    
    let a = [1, 2, 3];

// the sum of all of the elements of the array let sum = a.iter().fold(0, |acc, x| acc + x);

assert_eq!(sum, 6);

try_fold就是比fold多了一個失敗機制:
> fn [try_fold](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_fold)<B, F, R>(&mut self, init: B, f: F) -> R
where
    Self: [Sized](https://doc.rust-lang.org/std/marker/trait.Sized.html),
    F: [FnMut](https://doc.rust-lang.org/std/ops/trait.FnMut.html)(B, Self::[Item](https://doc.rust-lang.org/std/iter/trait.Iterator.html#associatedtype.Item)) -> R,
    R: [Try](https://doc.rust-lang.org/std/ops/trait.Try.html)<Output = B>,
An iterator method that applies a function as long as it returns successfully, producing a single, final value.
try_fold() takes two arguments: an initial value, and a closure with two arguments: an ‘accumulator’, and an element. The closure either returns successfully, with the value that the accumulator should have for the next iteration, or it returns failure, with an error value that is propagated back to the caller immediately (short-circuiting).

因為try_fold返回的是一個Option,因此如果沒有迭代器處理的話,就需要用Some來取值:

let a = [1, 2, 3];

// the checked sum of all of the elements of the array let sum = a.iter().try_fold(0i8, |acc, &x| acc.checked_add(x));

assert_eq!(sum, Some(6));


如果用迭代器,則需要用帶有or的迭代器,表示可能會失敗。題解中的map_or就是這麼用的:
> pub fn [map_or](https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or)<U, F>(self, default: U, f: F) -> U
where
    F: [FnOnce](https://doc.rust-lang.org/std/ops/trait.FnOnce.html)(T) -> U,
Returns the provided default result (if none), or applies a function to the contained value (if any).
Arguments passed to map_or are eagerly evaluated; if you are passing the result of a function call, it is recommended to use [map_or_else](https://doc.rust-lang.org/std/option/enum.Option.html#method.map_or_else), which is lazily evaluated.

第一個參數是失敗時返回的值,第二個閉包則是成功時需要進行的處理。
題解中,如果失敗,代表有字符無法轉換成digit,也就是非數字字符,因此直接返回false,如果是數字字符,則根據Luhn的算法進行處理。
不得不說,這個解法真的很棒!
quinnwencn commented 1 month ago

在這裡輸入要轉換的內容迭代器的參數雖然是閉包,但是也可以傳遞函數,比如Exercism上的這道Beer Song。求解時我的解法是使用for循環:

pub fn verse(n: u32) -> String {
    match n {
        0 => "No more bottles of beer on the wall, no more bottles of beer.\nGo to the store and buy some more, 99 bottles of beer on the wall.\n".to_string(),
        1 => "1 bottle of beer on the wall, 1 bottle of beer.\nTake it down and pass it around, no more bottles of beer on the wall.\n".to_string(),
        2 => "2 bottles of beer on the wall, 2 bottles of beer.\nTake one down and pass it around, 1 bottle of beer on the wall.\n".to_string(),
        _ => format!("{} bottles of beer on the wall, {} bottles of beer.\nTake one down and pass it around, {} bottles of beer on the wall.\n", n, n, n-1),
    }
}

pub fn sing(start: u32, end: u32) -> String {
    let mut song = String::new();
    for i in (end..=start).rev() {
        let lyric = verse(i);
        song.push_str(&lyric);
        if i != end {
            song.push(''\n'');
        }
    }

    song
}

但是,也可以使用迭代器來求解。我的解法中,for循環用了rev來倒序,因爲for循環不能像C++一樣倒序遍曆。使用迭代器的話,就更簡潔了:

pub fn verse(n: u32) -> String {
    match n {
        0 => "No more bottles of beer on the wall, no more bottles of beer.\nGo to the store and buy some more, 99 bottles of beer on the wall.\n".to_string(),
        1 => "1 bottle of beer on the wall, 1 bottle of beer.\nTake it down and pass it around, no more bottles of beer on the wall.\n".to_string(),
        2 => "2 bottles of beer on the wall, 2 bottles of beer.\nTake one down and pass it around, 1 bottle of beer on the wall.\n".to_string(),
        _ => format!("{} bottles of beer on the wall, {} bottles of beer.\nTake one down and pass it around, {} bottles of beer on the wall.\n", n, n, n-1),
    }
}

pub fn sing(start: u32, end: u32) -> String {
    (end..=start).rev()
        .map(verse)
        .collect::<Vec<_>>()
        .join("\n")
}

這裡值得一提的是map我們沒有傳遞閉包,而是傳遞一個函數,這也是可行的;另一個是String可以由Vec通過join來生成。

quinnwencn commented 3 weeks ago

有两个迭代器,需要注意下:flat_mapmap,他们的作用是一致的,都是对值进行处理,然后返回一个迭代器。但是略有区别,我们看Rust的定义:

fn flat_map<U, F>(self, f: F) -> FlatMap<Self, U, F> where Self: Sized, U: IntoIterator, F: FnMut(Self::Item) -> U, Creates an iterator that works like map, but flattens nested structure.

The map adapter is very useful, but only when the closure argument produces values. If it produces an iterator instead, there’s an extra layer of indirection. flat_map() will remove this extra layer on its own. You can think of flat_map(f) as the semantic equivalent of mapping, and then flattening as in map(f).flatten(). Another way of thinking about flat_map(): map’s closure returns one item for each element, and flat_map()’s closure returns an iterator for each element.

map的输入是一个值,经过闭包处理后,会返回一个Map结构,其实也就是新的迭代器; flat_map也是接收一个值,经过闭包处理后,返回一个FlatMap结果迭代器,也就是将结果摊平,即如果结构是一个迭代器,会先取出值,再构成一个迭代器,以确保后续迭代器可以使用。 空讲理论未免过于枯燥,我们来看下Exercism的Kindergarten Garden的另一种解法:

pub fn plants(diagram: &str, student: &str) -> Vec<&'static str> {
    let mut index = student.as_bytes()[0];
    if index < 'A' as u8 || index > 'L' as u8 {
        panic!("Unknown student name");
    }

    index -= 'A' as u8;
    let ind = index as usize * 2;
    diagram.lines()
        .flat_map(|line| {
            line[ind..= ind + 1].chars()
                .map(|c| match c {
                    'G' => "grass",
                    'C' => "clover",
                    'R' => "radishes",
                    'V' => "violets",
                    _ => panic!("Invalid plants"),
                })
        })
        .collect()
}

[package]
name = "kindergarten_garden"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]

可以看到,迭代器处理里,先将diagram分成多行,然后每行都应用于一个flat_map迭代器,闭包是一个map操作,map里面将字符对应的str包成一个迭代器返回,然后flat_map将其摊平,再以迭代器方式返回,以便collect收集成Vec<&str>. 如果我们不使用flat_map,而是使用map,我们看下报错:

.collect()
   |          ^^^^^^^ value of type `Vec<&str>` cannot be built from `std::iter::Iterator<Item=Map<Chars<'_>, {closure@src/lib.rs:12:22: 12:25}>>`
   |
   = help: the trait `FromIterator<Map<Chars<'_>, {closure@src/lib.rs:12:22: 12:25}>>` is not implemented for `Vec<&str>`
   = help: the trait `FromIterator<&str>` is implemented for `Vec<&str>`
   = help: for that trait implementation, expected `&str`, found `Map<Chars<'_>, {closure@src/lib.rs:12:22: 12:25}>`
note: the method call chain might not have had the expected associated types

由于map会将结果包成一个迭代器,也就是说外层map将底层返回的Map继续包成一个迭代器,导致后续无法继续使用;与之相反,flat_map会将里层返回的结果Map摊平成char,再用迭代器包裹形成Iterator,因此可以继续使用迭代器操作。