stevedonovan / lua-patterns

Exposing Lua string patterns to Rust
MIT License
28 stars 5 forks source link

Possible to cause a panic with gsub #1

Open ghost opened 3 years ago

ghost commented 3 years ago
use lua_patterns::LuaPattern;

fn main() {
    let mut m = LuaPattern::from_bytes(&[160]);
    let s = "⚠";
    s.bytes().for_each(|b| println!("{}" , b));

    m.gsub(s , "");
}

The ⚠ is made up of the bytes [226 , 154 , 160] removing the 160 causes the string to become invalid unicode

stevedonovan commented 3 years ago

Thanks for the report - yes, this kind of thing is always possible with arbitrary byte patterns. Not sure what the policy here should be.

I think we need gsub_bytes - there's already gsub_bytes_with.

stevedonovan commented 3 years ago

Although what the format of the second replacement argument is - not yet sure!

ghost commented 3 years ago

I think that any function that handles strings should just be a wrapper around the one which handles bytes but then calls vec.into(). such as

gsub(&mut self , text: &str , repl: &str) -> String{
    self.gsub_bytes(text.as_bytes() , repl.as_bytes).into()
}

this is a zero copy operation and only needs to scan for invalid bytes

ghost commented 3 years ago

also does the functions passed to gsub_*_with really need to return an owned type could they not return a &[u8] or &str

stevedonovan commented 3 years ago

also does the functions passed to gsub_*_with really need to return an owned type could they not return a &[u8] or &str

We can do what the regex crate does, and return an appropriate Cow to optimize the case where no substitution took place.

stevedonovan commented 3 years ago

I think that any function that handles strings should just be a wrapper around the one which handles bytes but then calls vec.into(). such as

This is pretty much how gsub is implemented now - only really have to do the bytes case!

There is also gsub_checked which returns a Result, to avoid the panicking problem.