Raise consistent Exception when trying to read more than the available data

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
>>> k = BitString(bytes="\x00\x00\x01")
>>> k.read("uint:32")
...
struct.error: unpack requires a string argument of length 4
>>> k.read("bits:32")
...
ValueError: Not enough data present. Need 32 bits, have 24.
>>> k.read("bytes:4")
'\x00\x00\x01'

What is the expected output? What do you see instead?
I want a ValueError (or maybe IndexError) every time I try read more than there 
is to read. The 
last example above should raise an Exception!

What version of the product are you using? And which Python version?
Python 2.6.1 (r261:67515)
BitString 1.1.1

Please provide any additional information below.
I might also have seen an IndexError. Consistency for these errors makes them 
obviously easier 
to catch.

Original issue reported on code.google.com by stefanwe...@gmail.com on 18 Dec 2009 at 4:41

GoogleCodeExporter commented 9 years ago

Thanks a lot for that. That's actually a symptom of quite a nasty bug so I'm 
very
glad that one of the first people to download it caught it and told me! I've 
removed
the 1.1.1 download altogether and will put it back when the problem is fixed. 
(Next
few hours).

No idea how that got past the unit tests!

Original comment by python.bitstring@googlemail.com on 18 Dec 2009 at 5:27

Added labels: Priority-Critical
Removed labels: Priority-Medium

GoogleCodeExporter commented 9 years ago

The unexpected exception has been fixed in revisions 583 and 585 and is 
available in
the 1.1.2 release.

Thanks again for the bug report, but the fix doesn't quite address your 
problem. The
intended behaviour is for a read off the end of the BitString to return the 
rest of
the BitString and not to raise an exception.

There are pros and cons to this approach, but one of the main things in its 
favour is
that is mirrors the ordinary file read function (which returns *at most* size 
bytes
and doesn't raise an exception if it goes off the end of the stream.)

If you'd like I'll enter raising an exception for such reads as an enhancement
request for version 2, but I'm not presently convinced it's a good idea.

Cheers.

Original comment by python.bitstring@googlemail.com on 18 Dec 2009 at 6:33

Changed state: Verified
Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Hello,

I'm using BitString to unmarshal binary data from a TCP stream. For 
unmarshaling a partial read that does not contain the whole message, I really 
need 
to be notified whether I read the full 32 Bit of an uint:32 or not (if I don't, 
I roll back all reads for that message, wait for more data and try again). Now 
I 
will have to check before every read if I have enough data. That is 
inconvenient.

I can understand your reasoning to mirror the behavior of file-like objects, 
but since BitString doesn't support truly file-like objects (relying on 
os.path.getsize in FileArray), this reasoning is not totally stable. IMHO 
BitString should not be a file-like object, but, like the name states, a string 
of 
bits that has awesomely convenient methods. The read method of a file-like 
object reads from a source with possibly unknown size, BitString can rely 
on the size of its data, so it is actually more like a list-like object.

IMHO the current behavior masks errors that cannot be detected without prior 
position+size>length checking. Consider the first example from my 
original post. Now the result is 1 even though the method call looks like it 
guarantees to give me an integer from 32 bits of binary data.

You should consider the actual use cases of BitString (it's awesome btw). Is 
there a case where a program says read("uint:32") and doesn't actually care 
how much it can read in and can accept a possibly somewhat arbitrary result?
As I see it, BitString is a great library for reading binary data based on 
fixed structures. If the data doesn't fit the structure, because it is 
incomplete/corrupt, the program should really be able notice. 
Put another way, is there another, more significant use case that would suffer 
from raising exceptions when trying to read too much?

In any case, you should document this behavior.

All the best.

Original comment by stefanwe...@gmail.com on 18 Dec 2009 at 7:44

Added labels: ****
Removed labels: ****

GoogleCodeExporter commented 9 years ago

Hi, I think you make some good points. I've added Issue 67 to cover the 
potential
change. As it would change the API it can't happen before version 2 so that's
probably at least 3 months away and perhaps a fair bit more.

Note that the slicing behaviour is similar in some ways. So if you asked for
`k[0:32].uint` on your 24 bit BitString you would get the same thing as
`k.read('uint:32')`. I'm not planning on changing this slicing behaviour as that
really is standard practice for container-like objects.

Incidentally until version 0.4.0 reading off the end did raise an exception. I
changed it to be closer to the file interface!

You're right though that the current behaviour does need better documentation, 
so I
can at least do that!

Original comment by python.bitstring@googlemail.com on 18 Dec 2009 at 11:25

Added labels: ****
Removed labels: ****

kutsurak / python-bitstring

Raise consistent Exception when trying to read more than the available data #66