rust-bakery / nom

Rust parser combinator framework
MIT License

take_while_m_n splits at wrong index #1642

Closed. ackxolotl closed this issue 1 year ago

ackxolotl commented 1 year ago

take_while_m_n splits a string slice containing multibyte characters (e.g., Arabic letters) at the wrong position. input.position(condition) returns the byte offset to split at (6 in the test case below), but input.slice_index(idx) then treats that byte offset as a char index and translates it back into a byte offset (9 in the test case), so the split happens at byte 9 instead of byte 6.
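The index mix-up can be reproduced without nom. The following is a minimal sketch using only the standard library. The helpers `first_mismatch_byte_offset` and `char_index_to_byte_offset` are hypothetical stand-ins for the roles that `position` and `slice_index` play in this report; they are not nom's actual implementation.

```rust
/// Same predicate as in the test case of this report.
fn is_arab_letter(c: char) -> bool {
    ('\u{061c}'..='\u{064a}').contains(&c)
}

/// Hypothetical stand-in for a `position`-style search:
/// byte offset of the first char that fails `pred`.
fn first_mismatch_byte_offset(s: &str, pred: fn(char) -> bool) -> Option<usize> {
    s.char_indices().find(|&(_, c)| !pred(c)).map(|(i, _)| i)
}

/// Hypothetical stand-in for a `slice_index`-style lookup:
/// byte offset of the char at position `char_idx`.
fn char_index_to_byte_offset(s: &str, char_idx: usize) -> Option<usize> {
    s.char_indices().nth(char_idx).map(|(i, _)| i)
}

fn main() {
    let input = "باب latin";

    // Each of the three Arabic letters is 2 bytes in UTF-8,
    // so the first non-matching char (the space) sits at byte 6.
    let byte_idx = first_mismatch_byte_offset(input, is_arab_letter).unwrap();
    assert_eq!(byte_idx, 6);

    // Feeding that byte offset in as a char index lands on the 7th char ('t'),
    // which starts at byte 9.
    let reinterpreted = char_index_to_byte_offset(input, byte_idx).unwrap();
    assert_eq!(reinterpreted, 9);

    // Splitting at the byte offset gives the expected result ...
    assert_eq!(input.split_at(byte_idx), ("باب", " latin"));
    // ... while splitting at the reinterpreted offset reproduces the reported output.
    assert_eq!(input.split_at(reinterpreted), ("باب la", "tin"));
}
```

For ASCII-only input the two helpers agree, because every char is one byte, which is presumably why the problem only shows up with multibyte text.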

Test case

Example test case:

use nom::IResult;
use nom::bytes::complete::take_while_m_n;

fn is_arab_letter(c: char) -> bool {
    match c {
        '\u{061c}'..='\u{064a}' => true,
        _ => false
    }
}

fn arab_string(s: &str) -> IResult<&str, &str> {
    take_while_m_n(0, 64, is_arab_letter)(s)
}

fn main() {
    assert_eq!(arab_string("باب latin"), Ok((&" latin"[..], &"باب"[..])));
}

Output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `Ok(("tin", "باب la"))`,
 right: `Ok((" latin", "باب"))`', src/main.rs:16:5

ackxolotl commented 1 year ago

Oops, duplicate: https://github.com/rust-bakery/nom/issues/1630