staktrace / cafebabe

A java class file parser
38 stars 12 forks source link

Parsing is locked to be single-threaded #42

Closed C0D3-M4513R closed 4 weeks ago

C0D3-M4513R commented 1 month ago

You can't do multi-threaded parsing, because this crate only uses Rc amongst other things.

    let mut bin_classes = Vec::<Box<[u8]>>::new();

    //... read some classes into `bin_classes` ...

    //parse bin_classes
    let bin_classes = {
        bin_classes.shrink_to_fit();
        let mut js = JoinSet::new();
        for data in &bin_classes {
            js.spawn(tokio::task::spawn_blocking(||cafebabe::parse_class_with_options(&data, ParseOptions::default().parse_bytecode(true))?));
        }
        let mut parsed = Vec::with_capacity(bin_classes.len());
        while let Some(task) = js.join_next().await {
            parsed.push(task??);
        }
        parsed
    };
staktrace commented 1 month ago

I think the bigger problem is the lifetime of your binary data, no? This crate intentionally expects the binary class data's lifetime to outlive the ClassFile structure, because that means we can avoid most/all memory allocation during parsing and just keep refs into the binary class data. But that also means you can't do things like borrow data in &bin_classes and expect the resulting ClassFile (in parsed) to stick around afterwards. Rust won't allow it.

Even if you changed Rc to a threadsafe thing you'd still run into this problem I think.

C0D3-M4513R commented 1 month ago

That's true. I came to the point where I had almost everything working, but because of tokio rust expected the data to live for 'static. And unfortunately I know of no way to make that happen.

I'd be curious how much space is actually being reused though? As far as I can see only strings are ever referenced and bytecode data for some reason. And you came avoid having duplicate strings by just having a Arc.

But that also means you can't do things like borrow data in &bin_classes and expect the resulting ClassFile (in parsed) to stick around afterwards

Actually you can. I did this, which does compile:

    let bin_classes = {
        let mut parsed = Vec::with_capacity(bin_classes.len());
        for data in &bin_classes {
            parsed.push(cafebabe::parse_class_with_options(&data, ParseOptions::default().parse_bytecode(true))?);
        }
        parsed
    };
staktrace commented 1 month ago

Please note that when you edit GitHub comments I don't get email notifications for that. So if you have something to add to what you said before it's better to put a new comment than to edit what you wrote before.

I'll play around a bit with this and see if there's a reasonable change to make to cafebabe here.

staktrace commented 4 weeks ago

I've added a threadsafe feature that swaps out Rc for Arc and RefCell for Mutex and makes cafebabe threadsafe. There's an examples/threadsafe.rs example that demonstrates usage. I don't know if this helps you with tokio and async but at least it does solve some use cases.