Open ismell opened 1 month ago
Hi, thanks for your work on this!
If I understand these changes, this patch looks to make TakeSeek
into a stream slice/view API, which certainly makes sense as a thing that should exist, but probably not as TakeSeek
.
In particular, TakeSeek
exists only because std::io::Take
does not implement Seek
, but binrw needs that trait. As such, it is intended as a drop-in replacement without any other change. Adjusting the signature of limit
indicates it’s being changed to do something it’s not intended to do, and changing the origin of the stream is also something that Take
doesn’t do, so this probably shouldn’t too.
In any case, the solution for this one problem with the patch is pretty straightforward since Take
is just a stream view with origin 0 and a limited size, that TakeSeek
would just become a wrapper for this code, and then this code just needs to use a different name to make it clear that it’s creating a slice starting at the current position of the stream.
My experience with sub-streams where the origin matters has always been in the context of reading something like an offset table, so I would probably recommend adding a second convenience function that takes a SeekFrom
in addition to the size limit so users don’t have to seek the stream before they get the sub-stream.
Let me know your thoughts. Thanks!
In any case, the solution for this one problem with the patch is pretty straightforward since Take is just a stream view with origin 0 and a limited size, that TakeSeek would just become a wrapper for this code, and then this code just needs to use a different name to make it clear that it’s creating a slice starting at the current position of the stream.
Take
on its own has a very clear API. You can read from X..X+len
, then read()
always returns 0. Seek
also has a very clear API where you can position the cursor from an offset. When you combine the two though it is a slice/view API. If the seek
method doesn't properly take into account the bounds, then the API gets very confusing.
Take the following example:
#[test]
fn test_stream_len() {
let mut buf = [0; 8];
let mut data = Cursor::new("\x00\x01\x02\x03\x04\x05\x06\x07\x08");
data.seek(SeekFrom::Start(1)).unwrap();
let mut section = data.take_seek(4);
assert_eq!(section.stream_len().unwrap(), 4); // <--- Performs X = Current(0), End(0), Start(X).
assert_eq!(inner_section.read(&mut buf).unwrap(), 4);
assert_eq!(&buf, b"\x01\x02\x03\x04\x00\x00\x00\x00");
}
Before the fix in 38b35989ffa27e8bb18cd91f69c6bcc66d2ca03e, the test would fail because the returned stream_len()
would be 9. The test still fails (without this PR) because the Start(0)
call isn't handled in seek()
, so the read()
call starts reading from the beginning of data
.
I think if you want to combine Take + Seek, then we need to handle the different SeekFrom
parameters so that the API can't be misused.
As such, it is intended as a drop-in replacement without any other change. Adjusting the signature of limit indicates it’s being changed to do something it’s not intended to do, and changing the origin of the stream is also something that Take doesn’t do, so this probably shouldn’t too.
Unfortunately it stems from https://github.com/rust-lang/rust/issues/59359. stream_position
should have probably been defined as fn stream_position(&self) -> Result<u64>
and used the ftello(3)
syscall instead of lseek(Current(0))
.
My experience with sub-streams where the origin matters has always been in the context of reading something like an offset table
In my case, I'm reading a GPT partition table, and I want to parse the contents in each partition. My partition parsers need to seek within the partition using Start
and End
offsets.
, so I would probably recommend adding a second convenience function that takes a SeekFrom in addition to the size limit so users don’t have to seek the stream before they get the sub-stream.
I think that's a good idea.
Maybe fn take_seek_from(mut self, pos: SeekFrom, limit: u64) -> TakeSeek<Self>
?
Thanks for the review!
Hi,
Take
on its own has a very clear API. You can read fromX..X+len
, thenread()
always returns 0.
I’m trying not to be pedantically irritating here, so apologies in advance if it comes across that way, it is not my intent. A user can call set_limit
to reset the end at any time. If TakeSeek
were implemented in the way proposed in this patch, that would mean that set_limit
would also need to change the origin to whatever is the current position of the stream, because the Rust documentation says calling set_limit
“is the same as constructing a new Take
instance”. This seems undesirable for hopefully obvious reasons.
After writing this, I realise that the fix that I committed is wrong, since the end of TakeSeek
might be beyond the actual end of the parent stream, and so SeekFrom::End
needs to use the minimum of the Take
end and the actual stream end. :face_exhaling:
Seek
also has a very clear API where you can position the cursor from an offset. When you combine the two though it is a slice/view API. If theseek
method doesn't properly take into account the bounds, then the API gets very confusing.
In the same way that offsets recorded within a file may be thought of as absolute or relative, I’m not sure there is any definitively correct non-confusing choice. It would also be confusing to see some error about failing to parse at byte 64, when binrw is actually trying to parse byte 60064 of the original stream, just because the user was trying to limit how many bytes to read.
data.seek(SeekFrom::Start(1)).unwrap(); let mut section = data.take_seek(4); assert_eq!(section.stream_len().unwrap(), 4); // <--- Performs X = Current(0), End(0), Start(X).
At the risk of going on ad nauseam, the design intent of TakeSeek
is just to make it possible for a truncating read to occur so someone can e.g. use binrw::helpers::until_eof
to read a limited list of objects from a stream. In this example, stream_len
should return 5, because the only thing TakeSeek
does is truncate the stream.
Given the problems with conflicts in the std::io::Take
limit
function definition, set_limit
semantics, and potential confusion caused by adjusting the stream origin, I would prefer not to have TakeSeek
do this, and instead give sub-stream functionality a different name such that the semantics can be fully defined by binrw and add whatever additional traits or helper methods would be useful for such functionality. The documentation for TakeSeek
can certainly also be improved to clarify that it is a truncating reader, not a thing that creates sub-streams with their own local origins, to reduce confusion in this regard.
Hopefully this makes sense, but please let me know if there is some other thing I am not thinking about.
Maybe
fn take_seek_from(mut self, pos: SeekFrom, limit: u64) -> TakeSeek<Self>
?
Sure, something with a _from
suffix seems reasonable to me for this.
Thanks,
This change adds support for the missing SeekFrom operations. It refactors the internals a bit. We now keep a range of start..end values that are accessible to the TakeSeek. I dropped manually computed stream position and instead call the inner's
stream_position()
. This does mean thatlimit
needs to take in a&mut
. I was also going to change the return type toResult<u64>
, but that would force all callers to handle theResult
, which I felt like it was a bigger API change.I fixed the test added in 38b35989ffa27e8bb18cd91f69c6bcc66d2ca03e because
StartFrom::Start(0)
used to reset the cursor to the start of the underlying buffer.Fixes jam1garner/binrw#291.