mdsteele / rust-cfb

Rust library for reading/writing Compound File Binary (structured storage) files
MIT License
44 stars 20 forks source link

`open_stream` as `mut` and `read_stream` as non-`mut` #38

Open smndtrl opened 1 year ago

smndtrl commented 1 year ago

Hi,

while trying to iterate over read_root_storage, I encountered that open_stream borrows &mut self and cannot be easily used with the immutable borrow of read_root_storage.

Have you considered a read_stream that borrows immutable self? Is there a scenario where that wouldn't work?

mdsteele commented 1 year ago

This is a bit subtle. I agree that it's a little annoying that one currently can't read a stream while iterating over storage entries.

The main issue here is that (assuming the underlying file type F supports writing) holding a Stream<F> object allows you to modify that stream's contents, and hence the contents of the CFB file, so it's a little odd to be able to open one from a non-mut CompoundFile reference. Modifying the stream contents mid-iteration also potentially invalidates the metadata in the Entry objects being iterated over (e.g. by changing the length of the stream).

On the other hand, Stream<F> doesn't actually borrow the CompoundFile mutably—it used to, but this was changed specifically so that it would be possible to have multiple streams open at once—and this is what makes the implementation in this PR possible. (In fact, now that I think about it, I don't think there's anything stopping you right now from opening a stream, then starting iteration, then modifying the still-open stream in the middle of iteration, so I guess the cat's already out of the bag on Entry metadata invalidation.)

One solution might be to provide a new CompoundFile::read_stream(&self, path) method that returns a different type (e.g. "ReadOnlyStream") that doesn't allow modifying the stream contents (i.e. by not implementing the Write trait), though that complicates the API. Another solution might be to simply let the existing open_stream be a &self method, and reserve &mut self for methods that alter the CFB storage tree structure (such as create_new_stream()), rather than just stream contents. At the moment I think I'm leaning towards that second one, but I'm not entirely sure what the right thing is here.

tgross35 commented 1 year ago

Just speaking for my use, I would be happy having separate Stream and WriteStream/StreamMut objects or something like that (I think it makes sense for the default "Stream" name to be read-only, though I understand this isn't the most compatible).

Maybe an easy way to track their usage would be by keeping a RefCell<()> within the CompoundFile and storing the Ref<()> in Stream and RefMut<()> in WriteStream? Could be an simple way to track these accesses if you can't pass &F around.

One solution might be to provide a new CompoundFile::read_stream(&self, path) method that returns a different type (e.g. "ReadOnlyStream") that doesn't allow modifying the stream contents (i.e. by not implementing the Write trait), though that complicates the API. Another solution might be to simply let the existing open_stream be a &self method, and reserve &mut self for methods that alter the CFB storage tree structure (such as create_new_stream()), rather than just stream contents. At the moment I think I'm leaning towards that second one, but I'm not entirely sure what the right thing is here.

imho: I would be +1 on having separate types for reading and writing in some way - whether that's via a Stream and StreamMut or things that return &Stream vs. &mut Stream. I think it's a pattern that is pretty common in Rust since it handles the multi-reader xor single-writer pattern well. And it's somewhat more understandable without IDE support - I immediately know that I can read but not write a Stream and r/w a StreamMut, it's less ambiguous than relying on what traits F implements.

jtran commented 1 year ago

In fact, now that I think about it, I don't think there's anything stopping you right now from opening a stream, then starting iteration, then modifying the still-open stream in the middle of iteration, so I guess the cat's already out of the bag on Entry metadata invalidation.

As a user of the library, I think I'd prefer to have a breaking change to the API that makes it safe and explicit about whether I can write, rather than having to just know not to call certain functions. I'm agreeing with @tgross35.