python / cpython

select.select() corner cases: duplicate fds, out-of-range fds #51923

Open b2b6035c-8867-46ed-8d31-88e3f978dde5 opened 14 years ago

b2b6035c-8867-46ed-8d31-88e3f978dde5 commented 14 years ago
BPO 7674
Nosy @birkenfeld, @berkerpeksag

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['extension-modules', 'type-bug']
title = 'select.select() corner cases: duplicate fds, out-of-range fds'
updated_at =
user = 'https://bugs.python.org/cdleary'
```

bugs.python.org fields:

```python
activity =
actor = 'berker.peksag'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Extension Modules']
creation =
creator = 'cdleary'
dependencies = []
files = []
hgrepos = []
issue_num = 7674
keywords = []
message_count = 2.0
messages = ['97578', '109992']
nosy_count = 6.0
nosy_names = ['georg.brandl', 'exarkun', 'cdleary', 'docs@python', 'tshepang', 'berker.peksag']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue7674'
versions = ['Python 3.2']
```

b2b6035c-8867-46ed-8d31-88e3f978dde5 commented 14 years ago

I was just reading through this ACM article that enumerates some of the issues with the select function in .NET: http://cacm.acm.org/magazines/2009/5/24646-api-design-matters/fulltext

select.select() currently suffers from the same documentation problem: the behavior with duplicate and/or out-of-range file descriptors in one of the sequences (e.g. rlist) is not described.

Given the current implementation of seq2set in trunk, it appears that:

  1. A ValueError is raised when a given file descriptor is out of range. (This is typically the result of the programmer passing a non-fd value, since FD_SETSIZE is "normally at least equal to the maximum number of descriptors supported by the system.")

  2. Duplicate file descriptor numbers collapse into a single entry in the fd_set, so duplicates are idempotent at the system API level.
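Both points can be modelled in pure Python. This is a minimal sketch, not CPython's C code: the hypothetical `model_seq2set` stands in for seq2set, a Python int bitmask stands in for the fd_set, and `ASSUMED_FD_SETSIZE = 1024` is an assumption (a common glibc value; the real limit is platform-dependent and is not exposed by the select module).

```python
ASSUMED_FD_SETSIZE = 1024  # assumption: typical glibc value, platform-dependent


def as_fd(obj):
    """Rough analogue of resolving an entry to an fd: an int, or obj.fileno()."""
    fd = obj if isinstance(obj, int) else obj.fileno()
    if fd < 0:
        raise ValueError(f"file descriptor cannot be a negative integer ({fd})")
    return fd


def model_seq2set(seq):
    """Hypothetical model of seq2set; returns an int bitmask standing in for fd_set."""
    fdset = 0
    for obj in seq:
        fd = as_fd(obj)
        if fd >= ASSUMED_FD_SETSIZE:
            # Point 1: descriptors beyond the fd_set capacity are rejected.
            raise ValueError("filedescriptor out of range in select()")
        # Point 2: setting an already-set bit changes nothing, so duplicates collapse.
        fdset |= 1 << fd
    return fdset
```

For example, `model_seq2set([3, 3, 3]) == model_seq2set([3])` is true, while `model_seq2set([5000])` raises ValueError under the assumed limit.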

However, the language-level support code generally assumes no duplication, as there is a fixed-size array of (FD_SETSIZE + 1) pylist entries (one extra for a sentinel value). Although there is a TODO to size that array dynamically according to the largest targeted file descriptor number, that would still assume one PyObject per file descriptor in the input sequences.

The set2list function, which produces the return value, will however return duplicates: for each entry in the input list whose fd is set, that PyObject is appended to the result list.
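The duplicate output can be observed directly. A minimal sketch, assuming a POSIX system (on Windows, select.select() accepts only sockets, so os.pipe() descriptors won't work there): a pipe's read end is made readable and then listed twice in rlist.

```python
import os
import select

# POSIX-only sketch: make a pipe's read end readable, then list it twice.
r, w = os.pipe()
try:
    os.write(w, b"x")  # one pending byte, so r is readable
    readable, writable, exceptional = select.select([r, r], [], [], 0)
    # The fd occupies a single bit in the fd_set, but set2list walks the
    # input entries, so both copies come back in the result.
    print(readable)
finally:
    os.close(r)
    os.close(w)
```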

Proposed Changes
----------------

At a glance, the Right Thing would seem to be collapsing duplicates in the input, as if we had computed set(AsFileDescriptor(o) for o in input_list), so that no duplicates are returned in the result. However, you *can* have a heterogeneous input list containing a plain fileno such as 5 and a file-like object whose fileno() resolves to 5, in which case we don't want to arbitrarily return only one of those PyObjects. Therefore, I think it's best to leave the behavior as-is and document it.
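The heterogeneous case can be sketched as follows, again assuming a POSIX system. `FileLike` is a hypothetical wrapper whose fileno() returns an existing pipe descriptor; the rlist holds both the raw int and the wrapper.

```python
import os
import select


class FileLike:
    """Hypothetical wrapper whose fileno() resolves to an existing descriptor."""

    def __init__(self, fd):
        self._fd = fd

    def fileno(self):
        return self._fd


r, w = os.pipe()
try:
    os.write(w, b"x")  # make r readable
    wrapper = FileLike(r)
    # Heterogeneous rlist: a raw int and an object that both resolve to fd r.
    readable, _, _ = select.select([r, wrapper], [], [], 0)
    # Collapsing the input to a set of fds leaves a single fd, so select
    # could no longer return both of the caller's objects.
    distinct_fds = {o if isinstance(o, int) else o.fileno() for o in [r, wrapper]}
finally:
    os.close(r)
    os.close(w)
```

Here `readable` contains both `r` and `wrapper`, in input order, while `distinct_fds` has only one element.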

In any case, if we want to explicitly allow duplicates in the input list, we should probably, for correctness, replace the fixed-size pylist arrays with dynamically sized structures matching the lengths of the corresponding input lists.
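A pure-Python model of that layout, as a sketch only: instead of a fixed table of FD_SETSIZE + 1 entries, the (fd, object) pairs live in a list exactly as long as the input sequence, so duplicates can never exhaust a fixed capacity. `build_fd2obj` and `model_set2list` are hypothetical names, and the ready descriptors are passed as a Python set rather than a real fd_set.

```python
def as_fd(obj):
    """Accept an int or any object with a fileno() method."""
    return obj if isinstance(obj, int) else obj.fileno()


def build_fd2obj(seq):
    """Hypothetical: one (fd, obj) pair per input entry, sized to len(seq)."""
    return [(as_fd(obj), obj) for obj in seq]


def model_set2list(ready_fds, fd2obj):
    """Hypothetical model of set2list: return every entry whose fd is ready."""
    return [obj for fd, obj in fd2obj if fd in ready_fds]
```

For example, `model_set2list({4}, build_fd2obj([4, 4, 7]))` returns `[4, 4]`, and `build_fd2obj([4] * 3000)` simply has 3000 entries, regardless of any FD_SETSIZE value.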

If this all makes sense I'll be happy to come up with a module/documentation/unit test patch.

83d2e70e-e599-4a04-b820-3814bbdb9bef commented 14 years ago

Chris, to me it's as clear as mud but please produce a doc patch anyway. :)