python / cpython

The Python programming language
https://www.python.org
Other
63.51k stars 30.42k forks source link

sndhdr erroneously detects files as vox #53489

Closed 955b8acf-25ae-499f-be8e-a029f22a3534 closed 14 years ago

955b8acf-25ae-499f-be8e-a029f22a3534 commented 14 years ago
BPO 9243
Nosy @vstinner, @bitdancer
Files
  • sndhdr-fix.patch: Patch which fixes the issue described.
  • test_sndhdr.tar.gz: Unittest for sndhdr format detection.
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/bitdancer' closed_at = created_at = labels = ['type-bug', 'library'] title = 'sndhdr erroneously detects files as vox' updated_at = user = 'https://bugs.python.org/jbit' ``` bugs.python.org fields: ```python activity = actor = 'vstinner' assignee = 'r.david.murray' closed = True closed_date = closer = 'vstinner' components = ['Library (Lib)'] creation = creator = 'jbit' dependencies = [] files = ['17981', '17986'] hgrepos = [] issue_num = 9243 keywords = ['patch'] message_count = 5.0 messages = ['110172', '110198', '110199', '110206', '110243'] nosy_count = 3.0 nosy_names = ['vstinner', 'r.david.murray', 'jbit'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'test needed' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue9243' versions = ['Python 3.1', 'Python 3.2'] ```

    955b8acf-25ae-499f-be8e-a029f22a3534 commented 14 years ago

    http://svn.python.org/view/python/branches/py3k/Lib/sndhdr.py?r1=56957&r2=56987 seems to have broken sndhdr as it incorrectly detects files as voc (all that time ago and nobody has noticed...):

    [jbit@miku]~$ xxd test.wav | head -n 1
    0000000: 5249 4646 2478 e001 5741 5645 666d 7420  RIFF$x..WAVEfmt 
    [jbit@miku]~$ python3.1
    >>> import sndhdr
    >>> sndhdr.what("test.wav")
    ('voc', 0, 1, -1, 8)

    I'm a little unsure why it got changed from indexing to startswith() at all, but changes like the following are just totally incorrect.

    Attached is a patch against the py3k branch which fixes the issue as well as some other things obviously broken by the same changelist, although it might be a good idea to review the changelist again to make sure there's nothing else obviously broken :)
    >>> sndhdr.what("test.wav")
    ('wav', 48000, 2, -1, 16)
    bitdancer commented 14 years ago

    Since email uses this and it doesn't look like anyone else is likely to have much knowledge of this module, I'm going to assign it to myself so I don't forget about it.

    This module has no unit tests. It would be really great if you could create some to at least test for this regression. I'll be glad to provide guidance on the test suite if you need it, but I don't know all that much about audio files so I'm not in a position to write the tests myself at the moment.

    The first changeset is just the translation to bytes, I don't see any issues there. I'm adding Victor as nosy since he wrote the second changeset, but the fix looks straightforwardly correct to me.

    By the way, if you had written r56957 and r56987, roundup would have turned them into links, instead of quoting the link you did supply so that it wasn't usable :)

    bitdancer commented 14 years ago

    Oh, and the reason it got changed to startswith is probably a micro-optimization: startswith is more efficient than doing a slice-and-compare.

    955b8acf-25ae-499f-be8e-a029f22a3534 commented 14 years ago

    Thanks for the update... The link was actually just a "diff to previous" of the changelist that caused the problem (r56987), sorry for the confusion. :)

    Attached is a quick and dirty unittest complete with some test files. It only tests the format of the file though, not any of the additional information returned.

    With an untouched sndhdr.py it finds the issue reported, and with sndhdr-fix.patch applied it reports only one issue:

    Traceback (most recent call last):
      File "test_sndhdr.py", line 22, in test_aiff
        self.assertEqual(ret[0], 'aiff')
    AssertionError: b'aiff' != 'aiff'

    I'm thinking that on line 65 of sndhdr.py the b'aiff' should be just 'aiff' (like everything else). At least I can't see a reason for it to be different. (Changing this fixes the unittest issue)

    Cheers

    vstinner commented 14 years ago

    Oh, and the reason it got changed to startswith is probably a micro-optimization: startswith is more efficient than doing a slice-and-compare.

    Not really. I changed it because it's more readable (it's easier to understand that the code).

    --

    James: I commited all your fixes + your test suite to 3.1 (r82859 + r82860) and 3.2 (r82856 + r82857) and added your name to Misc/ACKS. Thanks for your contribution.