pexpect / pexpect

A Python module for controlling interactive programs in a pseudo-terminal
http://pexpect.readthedocs.io/
Other
2.63k stars 478 forks source link

select.select() cannot handle more than 1024 open files #47

Closed vbz1111 closed 6 years ago

vbz1111 commented 10 years ago

I have been using pexpect for many months with no problem. However, I recently this line:

1680 return select.select(iwtd, owtd, ewtd, timeout)

Has been creating a few problems. The select system call cannot handle more than 1024 file descriptors. On the other hand, the select.poll() system call does not have such limitations.

I understand that not all platforms support poll() but it would be nice to incorporate this system call as an option.

vbz1111 commented 10 years ago

I have a patch for this in my workspace. If you are interested in looking at it let me know.

takluyver commented 10 years ago

I'm not sure how you could ever hit that - we only ever call it with one or two file descriptors. Or is it unable to handle file descriptors above 1024? But if it is a problem, by all means post the patch.

vbz1111 commented 10 years ago

Lets say I open 1024 files , hence the process as 1024 file descriptors to poll from, and try to spawn a pexpect session. Now the child_fd will be above 1024. At this point I get: ValueError: filedescriptor out of range in select()

I might need to create a test case for you and I will post my patch sometime when I get some free time. Just wanted to see if anyone else has though about this.

Thanks

takluyver commented 10 years ago

OK, so the issue is that it can't handle fds beyond 1024. It makes sense to have a way to work round that, then. If you're familiar with Github, feel free to make a pull request.

lu-zero commented 10 years ago

just use poll() and be happy.

jquast commented 10 years ago

poll is not available on bsd.

lu-zero commented 10 years ago

http://www.unix.com/man-page/freebsd/2/poll/ claims it is supported so does https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/poll.2.html and http://nixdoc.net/man-pages/OpenBSD/poll.2.html

If you mean that certain old versions of macosx had a broken poll implementation and their stock python doesn't provide it nowadays it is true.

The long solution is supporting select.kqueue where it is avalable and use select.poll where is available probably.

On Mon, Jun 2, 2014 at 5:53 AM, Jeff Quast notifications@github.com wrote:

poll is not available on bsd.

— Reply to this email directly or view it on GitHub https://github.com/pexpect/pexpect/issues/47#issuecomment-44800191.

takluyver commented 10 years ago

This page lists where the OS itself does not support poll(). This page mentions that it was broken in OS X "at least up to 10.4" (10.5 released 2007).

As @lu-zero comments, system Python /usr/bin/python on modern Macs does not have poll(), and even on a Python version which does have it, it doesn't seem to work correctly for a pty fd (in my attempts to use it, it just gives me the POLLNVAL (invalid fd) event under all conditions.

Pexpect also aims to support less familiar platforms, like Solaris (there's code in there for IRIX, but hopefully no-one is actually relying on that). I'm planning to set up an OpenIndiana (Solaris variant) VM, but I can't test on obscure/proprietary platforms. So I see two ways forwards:

  1. Pexpect implements some robust, conservative logic for deciding when one of the better select-like interfaces is usable, falling back to select() when they're not.
  2. Pexpect keeps select() as the default, but makes it easier to switch in one of the alternatives.

Another concern is that Pexpect has some crusty EOF-handling code for different platforms that relies on select() handling

gpshead commented 10 years ago

BSD (and OS X) should be able to use kqueue(). https://docs.python.org/2/library/select.html#select.kqueue

(Yes this is all a mess of platform specific crap. Python 3.4 makes it a lot better with https://docs.python.org/3/library/selectors.html)

takluyver commented 10 years ago

For the record, in my new OpenIndiana VM, in Python 2.6, poll indicates POLLIN on EOF, and reading from it gives an empty string once. After that, read() will hang, and poll won't indicate any events for that fd.

This site has a table of variances for poll() on EOF, though it doesn't include ptys, and the behaviour it indicates for pipes on SunOS is not the same as that for ptys on OpenIndiana.

takluyver commented 10 years ago

127 asked for a way to select the select-ish implementation used.

I'm actually leaning the other way, towards automatically picking different defaults based on the platform. We need to be a bit more conservative than the stdlib selectors module, though, because there are platforms where e.g. poll exists but is not usable with pty fds. So I would whitelist 'known good' implementations for specific platforms.

sagarinpursue commented 9 years ago

even i am facing the same issue with file descriptor, i do not want to use anything other than select as there are lot of dependency. if there are any work around for the same please help.

shoyer commented 7 years ago

This bit me today.

I'm happy to give this a shot, but I don't have idea what sort of conservative platform checks would be appropriate. Can anyone suggest some minimal checks?

Presumably we could get this info from the platform module: https://docs.python.org/2/library/platform.html

cooperlees commented 6 years ago

@shoyer + others - Did anyone get a patch somewhere for this? Otherwise I want to look at implementing a refactor to allow Linux to use select.poll() and fallback to select.select() for other platforms to maintain compatibility.

Can I also assume (if I do the PR) that we will be >= Python 2.7? TravisCI only does this. So I assume so.

takluyver commented 6 years ago

We're definitely not trying to support anything older than Python 2.7. But I'm now quite wary of making non-trivial changes to Pexpect. There's so much old code out there using it, and I've already accidentally broken things with changes I thought would be fine.

I think it would be better for someone to make a new library that can do similar tasks to Pexpect but isn't bound to follow its API. In particular, if I was building something like this today, I would try to use the Sans-I/O paradigm, to separate the pattern searching that's the core of what Pexpect does from dealing with I/O.

shoyer commented 6 years ago

I ended up writing a patch which worked for my case, but I didn't test it extensively or add any platform detection logic. That said, in case it's helpful please see https://github.com/pexpect/pexpect/pull/470. I don't plan to finish this, but it would be great if someone else took it up.

cooperlees commented 6 years ago

The main problem I have and want to solve here is when other third party modules want pexpect. My example is when something we've written uses a fork such as pexpect-u, that drops itself in as pexpect to be an easy drop in replacement. ipython is one module that wants pexpect and we offer building ipython debug modules for people and thus hit a name collision here if they also include the fork.

I think very carefully inserting a config driven selection or platform dependent selection maintaining the behavior we have today as default is very doable. This should ensure very little breakage. That said, which way would people prefer and more likely accept?

Other ideas?

takluyver commented 6 years ago

I'd probably favour having to explicitly enable the new behaviour, because I suspect it might be hard to capture all the ways it could fail on different platforms (e.g. maybe it accepts the fd but just never says it's ready).

I'm also an IPython maintainer - for IPython we'd probably be willing to take slightly more risks with the default. I tend to keep a very conservative policy with Pexpect.

cooperlees commented 6 years ago

Have put up https://github.com/pexpect/pexpect/pull/474 to address this issue.

cooperlees commented 6 years ago

What's the plan on releasing 4.5.0 now we're all merged?

takluyver commented 6 years ago

If you could tackle the issue with interact() and poll (see my comment on #474), I think everything else is ready.

takluyver commented 6 years ago

4.5.0 is out now :-)

takluyver commented 6 years ago

To summarise the resolution: as of Pexpect 4.5, you can pass spawn(..., use_poll=True) if you need it to work with file descriptors above 1024.

I know it may be annoying that it's not the default, but I'm (now) very wary of compatibility for Pexpect. It's been used on some less popular platforms which I'm not familiar with, and it works with ptys, which are fairly odd objects in themselves. So the default is to stick to what it's been doing for a long time.