mdsteele / rust-cfb

Rust library for reading/writing Compound File Binary (structured storage) files
MIT License

Mmap #5

Closed by ikrivosheev 3 years ago

ikrivosheev commented 3 years ago

Hello, thanks for creating this library!

I have one question: can I create an mmap object for a file inside a CFB? I am writing an application that unpacks files from a CFB and sends them to the yara engine for scanning. Alternatively, is there an API to get the start position and length of the file?

mdsteele commented 3 years ago

Glad it's useful!

The way that the CFB format is defined, a Stream within a CFB file is not necessarily stored contiguously; it might be broken up into chunks and stored out of order within the CFB file (or interleaved with another stream). So I don't think an API to get the byte offset of the start of the stream data would help you.
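To illustrate why a single start offset is not enough, here is a minimal sketch of how a sector chain maps to byte offsets. It is not part of the cfb crate's API: the function name `stream_sector_offsets` and its parameters are hypothetical. It assumes the usual CFB layout, where each FAT entry names the next sector of a chain and sector `n` starts at byte `(n + 1) * sector_size`, and it ignores small streams kept in the mini stream.

```rust
/// FAT marker for the last sector of a chain in the CFB format.
const END_OF_CHAIN: u32 = 0xFFFF_FFFE;

/// Hypothetical helper: follows a sector chain through the FAT and returns
/// the byte offset of each sector, in stream order.
fn stream_sector_offsets(fat: &[u32], start: u32, sector_size: u64) -> Vec<u64> {
    let mut offsets = Vec::new();
    let mut sector = start;
    // The length check guards against cycles in a corrupt FAT.
    while sector != END_OF_CHAIN && offsets.len() < fat.len() {
        let next = match fat.get(sector as usize) {
            Some(&n) => n,
            None => break, // chain points outside the FAT
        };
        // Sector 0 begins right after the header, which occupies one sector.
        offsets.push((u64::from(sector) + 1) * sector_size);
        sector = next;
    }
    offsets
}
```

For a FAT of `[3, END_OF_CHAIN, 1, 2]` and a stream starting at sector 0, the chain visits sectors 0, 3, 2, 1, so the stream's bytes are neither contiguous nor in ascending order within the file.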

I'm not familiar with yara, but after a quick look, it looks like it can only scan contiguous memory or an fd, rather than an arbitrary Rust Read object? So unfortunately, you will probably need to first std::io::copy the CFB Stream into a buffer or tmpfile.
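A minimal sketch of the buffer approach, assuming only that the stream implements `std::io::Read` (the helper name `read_to_buffer` is made up for illustration). It is written against any reader, so it does not depend on the cfb crate; handing the buffer to yara is left out.

```rust
use std::io::{self, Read};

/// Hypothetical helper: drains a reader into one contiguous in-memory buffer.
fn read_to_buffer<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // `Vec<u8>` implements `Write`, so `io::copy` appends every byte to it.
    io::copy(reader, &mut buf)?;
    Ok(buf)
}
```

With a stream opened via `comp.open_stream(...)`, the call would look like `read_to_buffer(&mut stream)?`. For very large streams, copying into a temporary file instead of a `Vec` keeps memory use bounded.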

ikrivosheev commented 3 years ago

Thank you for the answer!

I have another question: can I get an iterator over (EntryName, Stream)? When I write:

let mut comp = cfb::open("path/to/cfb/file").unwrap();
for entry in comp.read_storage(...).unwrap() {
    let stream = comp.open_stream(entry.path()).unwrap();
    ....
}

I get an error because open_stream needs &mut self while the iterator returned by read_storage still borrows &self... Is it possible to solve this problem?

I can collect all Entries into a Vec, but archives can contain a large number of files.

let mut comp = cfb::open("path/to/cfb/file").unwrap();
let entries = comp.read_storage(...).unwrap().collect::<Vec<_>>();
for entry in entries {
    let stream = comp.open_stream(entry.path()).unwrap();
    ....
}

mdsteele commented 3 years ago

I made an attempt today to add a method to get an iterator over (EntryName, Stream), but unfortunately the object lifetimes don't seem to work out. (Maybe there's some clever way to make it work, but I couldn't find it. Feel free to send a PR if you can make it work!)

For now, collecting all the Entry paths into a Vec is probably the best workaround.
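A sketch of that workaround, using a hypothetical stand-in type so the example is self-contained. `Archive`, `entry_paths`, and `read_all_streams` are invented for illustration and only mirror the borrowing shape described in this thread: listing takes `&self`, opening takes `&mut self`. With the real crate, the corresponding calls would be `read_storage` and `open_stream`.

```rust
use std::collections::BTreeMap;
use std::io::{self, Cursor, Read};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a compound file (not the cfb crate's type).
struct Archive {
    streams: BTreeMap<PathBuf, Vec<u8>>,
}

impl Archive {
    /// Lists entry paths; the iterator keeps `self` borrowed while it lives.
    fn entry_paths(&self) -> impl Iterator<Item = &Path> + '_ {
        self.streams.keys().map(|p| p.as_path())
    }

    /// Opens one stream for reading; requires exclusive access to `self`.
    fn open_stream(&mut self, path: &Path) -> io::Result<Cursor<&[u8]>> {
        self.streams
            .get(path)
            .map(|data| Cursor::new(data.as_slice()))
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "no such stream"))
    }
}

fn read_all_streams(archive: &mut Archive) -> io::Result<Vec<(PathBuf, Vec<u8>)>> {
    // Copy the paths into owned `PathBuf`s so the shared borrow ends here.
    let paths: Vec<PathBuf> = archive.entry_paths().map(Path::to_path_buf).collect();
    let mut out = Vec::with_capacity(paths.len());
    for path in paths {
        // No iterator is alive any more, so the mutable borrow is allowed.
        let mut stream = archive.open_stream(&path)?;
        let mut data = Vec::new();
        stream.read_to_end(&mut data)?;
        out.push((path, data));
    }
    Ok(out)
}
```

Only the paths are held in memory up front, not the stream contents, so the cost of the `Vec` grows with the number of entries rather than with the size of the archive.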