sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.33k stars 453 forks source link

Upgrade and optimize pexpect #10295

Closed simon-king-jena closed 8 years ago

simon-king-jena commented 13 years ago

We use pexpect version 2.0. Shouldn't we upgrade to the current version 4.0.1?

See also #10294.

New upstream tarballs:

Patches included and submitted upstream:

Note that this branch includes #19616.


Throughput timings (best out of 5):

pexpect 2.0 upstream:

sage: %time _ = str(gp("2^2^22"))
CPU times: user 184 ms, sys: 4 ms, total: 188 ms
Wall time: 384 ms

pexpect 4.0.1 upstream:

sage: %time _ = str(gp("2^2^22"))
CPU times: user 208 ms, sys: 4 ms, total: 212 ms
Wall time: 405 ms

pexpect 4.0.1 + Sage patches:

sage: %time _ = str(gp("2^2^22"))
CPU times: user 9 ms, sys: 7 ms, total: 16 ms
Wall time: 209 ms

Latency timings (best out of 5):

pexpect 2.0 upstream:

sage: gp(1);
sage: %time _ = [gp(i) for i in range(10^4)]
CPU times: user 2.96 s, sys: 855 ms, total: 3.82 s
Wall time: 4.52 s

pexpect 4.0.1 upstream:

sage: gp(1);
sage: %time _ = [gp(i) for i in range(10^4)]
CPU times: user 2.61 s, sys: 758 ms, total: 3.37 s
Wall time: 3.48 s

pexpect 4.0.1 + Sage patches:

sage: gp(1);
sage: %time _ = [gp(i) for i in range(10^4)]
CPU times: user 2.3 s, sys: 999 ms, total: 3.3 s
Wall time: 3.4 s

Depends on #19671

Upstream: Reported upstream. Developers acknowledge bug.

CC: @sagetrac-drkirkby @nexttime @sagetrac-Vincent-Neri

Component: interfaces

Keywords: pexpect upgrade

Author: François Bissey, Bill Page, Jeroen Demeyer

Branch/Commit: 9351ccb

Reviewer: Jeroen Demeyer, Bill Page

Issue created by migration from https://trac.sagemath.org/ticket/10295

kiwifb commented 13 years ago
comment:1

Try version 2.4 even. This has been discussed to death several times on sage-devel. Version 2.1 and over are slower than what we currently have. Plus from experience in sage-on-gentoo where at some stage we experimented with pexpect 2.4, some plotting broke down in the notebook.

kiwifb commented 13 years ago
comment:2

When I say version 2.4 you have to go there: http://pypi.python.org/pypi/pexpect rather than here: http://pypi.python.org/pypi/pexpect

simon-king-jena commented 13 years ago
comment:3

Replying to @kiwifb:

When I say version 2.4 you have to go there: http://pypi.python.org/pypi/pexpect rather than here: http://pypi.python.org/pypi/pexpect

Aren't the two links the same?

So, you say that there might be a performance problem with more recent versions of pexpect. That's bad, because the idea was to get rid of a performance problem by upgrading...

kiwifb commented 13 years ago
comment:4

cut and paste didn't work as expected the second one was meant to be: http://pexpect.sourceforge.net/

search the mailing list for pexpect, it is a proverbial can of worms.

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:5

2.4 is not released yet. Here's a package for 2.3 that I created.

http://boxen.math.washington.edu/home/kirkby/patches/pexpect-2.3.spkg

but on my OpenSolaris machine, this 2.3 package results in one doctest failure, which is:

sage -t  -long -force_lib devel/sage/sage/interfaces/expect.py

The interface must be semi-working, as the interface to R works - or at lease the R doctest does not fail. Here's the error message of the failed doctest.

sage -t  -long -force_lib devel/sage/sage/interfaces/sage0.py
         [14.8 s]
sage -t  -long -force_lib devel/sage/sage/interfaces/expect.py
Exception pexpect.ExceptionPexpect: ExceptionPexpect() in <bound method spawn.__del__ of <pexpect.spawn object at 0xc5988ac>> ignored

<snip out load more similar errors>

<pexpect.spawn object at 0xc386d6c>> ignored
Exception pexpect.ExceptionPexpect: ExceptionPexpect() in <bound method spawn.__del__ of <pexpect.spawn object at 0xc386d6c>> ignored
**********************************************************************
File "/export/home/drkirkby/sage-4.7.alpha3/devel/sage-main/sage/interfaces/expect.py", line 867:
    sage: r._expect.before
Expected:
    'abc;\r\n[1] '
Got:
    'abc <- 10 +15;\r\n__SAGE__R__PROMPT__> abc;\r\n[1] '
**********************************************************************
1 items had failures:
   1 of  11 in __main__.example_16
***Test Failed*** 1 failures.
For whitespace errors, see the file /export/home/drkirkby/.sage//tmp/.doctest_expect.py
         [18.4 s]
sage -t  -long -force_lib devel/sage/sage/interfaces/gap.py
         [20.4 s]

I think in some cases where we call external programs there are probably better ways of doing this. For example

Dave

bac7d3ea-3f1b-4826-8464-f0b53d5e12d2 commented 13 years ago
comment:6

Oops, I see this doctest is related to R. But I'm sure there are other doctests which make use of pexpect, which are passing

Dave

kiwifb commented 13 years ago
comment:7

While 2.4 is not on sourceforge but on pypi.python.org http://pypi.python.org/pypi/pexpect I think we can call it released unless you have other infos (from author/mailing list). As far as I remember, last time we tried pexpect-2.4 on sage-on-gentoo it broke plotting in the notebook. Can you try to do some plotting in the notebook with this version of pexpect? Digged my email archive, more precisely:

g=sin(x); plot(g,(x,-pi,3*pi/2))

in the notebook. It produced the following for us (at the time, notice the python time stamp):

import os;os.chdir("/tmp/tmpo2KomY");
execfile("_sage_input_1.py")Python 2.6.4 (r264:75706, Mar  4 2010,
21:15:13)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import os;os.chdir("/tmp/tmpo2KomY");
>>> execfile("_sage_input_1.py")
START1

import os;os.chdir("/tmp/tmpY61yjM");
execfile("_sage_input_2.py")

import os;os.chdir("/tmp/tmpXjmsQK");
execfile("_sage_input_3.py")
__SAGE__
__SAGE__import os;os.chdir("/tmp/tmpY61yjM");
__SAGE__execfile("_sage_input_2.py")
START2
__SAGE__
__SAGE__import os;os.chdir("/tmp/tmpXjmsQK");
kiwifb commented 13 years ago
comment:8

Dave, have you tried plotting in the notebook?

kiwifb commented 10 years ago
comment:12

Bump this. There is a pexpect 3.1 now at http://pexpect.readthedocs.org/en/latest/ (note: the sourceforge address will redirect you there). For a long time I thought pexpect was done and there wouldn't be anymore release.

I think if the project is alive they may take request from us.

rwst commented 10 years ago

Description changed:

--- 
+++ 
@@ -1,3 +1,5 @@
-We use pexpect version 2.0. Shouldn't we upgrade to the current version 2.3?
+We use pexpect version 2.0. Shouldn't we upgrade to the current version 3.3?

 This may be related with the performance problem reported at #10294.
+
+See also #502.
kiwifb commented 9 years ago

Description changed:

--- 
+++ 
@@ -3,3 +3,6 @@
 This may be related with the performance problem reported at #10294.

 See also #502.
+
+New upstream tarball:
+[https://pypi.python.org/packages/source/p/pexpect/pexpect-3.3.tar.gz](https://pypi.python.org/packages/source/p/pexpect/pexpect-3.3.tar.gz)
kiwifb commented 9 years ago

Commit: 9b1e098

kiwifb commented 9 years ago

Branch: u/fbissey/pexpect3.3

kiwifb commented 9 years ago
comment:17

Hum wrong stuff in that branch sorry will update shortly.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

679e33aUpdate pexpect to 3.3
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 9b1e098 to 679e33a

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

a24eab3Fix broken doctest. _expect.before returns more stuff in pexpect 3.3
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 9 years ago

Changed commit from 679e33a to a24eab3

kiwifb commented 9 years ago
comment:20

The bad behavior in the notebook originates in sagenb/notebook/worksheet.py, this is potentially broken but the question is why don't we go there we the old pexpect?

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:21

Be aware that #17686 cleans up expect.py.

kiwifb commented 9 years ago
comment:22

Seems to be orthogonal to the only change I made to that file. It'll just need rebasing. On the other hand there seem to be several load functions defined in sage. And it is not obvious why we should get that one in the notebook.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:23

https://github.com/billpage/sagenb/commit/811a2e762bb041dd34d32d96b003ab46eaab22e9

Here is a patch to sagenb/interfaces/expect.py that corrects the bad behavior in the notebook following upgrade to pexpect 3.3. The problem is due to a difference between pexpect 2.0 and newer versions of pexpect in what happens following a TIMEOUT of an expect; call. TIMEOUT is the intended result from a call to '_read'. In newer versions '_expect.before' is not modified by TIMEOUT so the output continues to accumulate. There is no need to do this ourselves in 'output_status'.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:24

See also http://pexpect.readthedocs.org/en/latest/history.html?highlight=timeout#version-2-3 for perhaps a better explanation.

kiwifb commented 9 years ago
comment:25

Cool, I'll try that tomorrow morning. The next step will be performance measurements. One of the reason we hadn't moved to at least pexpect 2.4 was poor performance (sloooow). I would be willing to trade performance for something properly maintained but we should know what we are getting into on that front. Then again that TIMEOUT thing may have been the cause.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:26

Replying to @kiwifb:

Then again that TIMEOUT thing may have been the cause.

The mentioned change was made in 2.3, but already 2.1 was said to be slooooooooow.

But we'll (or you'll) see...

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:27

I also doubt that this particular case of the use of TIMEOUT has anything to do with version 2.1 being slower. The problem here is how the notebook interacts with Sage in an asynchronous manner. TIMEOUT is the expected behavior when checking on the status of a computation. This allows notebook to display partial results, i.e. the output in 'self._so_far'. The notebook code treats TIMEOUT as a Python exception rather than waiting for it with a 'expect' pattern. In the old version of pexpect this exception did not update the '.before' attribute. On the other hand if the code was changed to explicitly look for 'pexpect.TIMEOUT' then we would have had this problem even with the old version.

What is the recommended way to measure Sage performance?

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:28

Replying to @sagetrac-bpage:

What is the recommended way to measure Sage performance?

Good question, I can only talk about the command line interface:

Of course there's time ./sage -t --long src/sage/interfaces/ (probably with -p; one can also set SAGE_TEST_ITER and SAGE_TEST_GLOBAL_ITER). Only testing the interfaces that actually use pexpect (and where the external program is really installed) makes more sense.

Tests that make heavy use of (stand-alone) Maxima/ecl, Singular, gap and gp are meaningful as well. I cannot really list them right now, but src/sage/schemes/elliptic_curves/ and especially src/sage/sandpiles/sandpile.py are known to fall into that category IIRC.

Before testing, it IMHO makes sense to remove or rename $DOT_SAGE/timings*.json, otherwise doctests will get reordered with undesirable impact on the resource usage, influencing the measured timings.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:29

On my system with pexpect 3.x from github I get:

  wspage@suse:~/sage> inxi -SCI
  System:    Host: suse Kernel: 3.0.101-0.46-default x86_64 (64 bit) 
             Desktop Gnome 2.28.2 Distro: SUSE Linux Enterprise Server 11 (x86_64) VERSION = 11 PATCHLEVEL = 3
  CPU:       Quad core Intel Core i5-2500K CPU (-MCP-) cache: 6144 KB flags: (lm nx sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx) 
             Clock Speeds: 1: 1600.00 MHz 2: 1600.00 MHz 3: 3301.00 MHz 4: 1600.00 MHz
  Info:      Processes: 245 Uptime: 67 days  5:09 Memory: 11461.1/15917.0MB Client: Shell inxi: 1.7.24 

  wspage@suse:~/sage> time ./sage -t --long src/sage/interfaces/
  ...
  ----------------------------------------------------------------------
  sage -t --long src/sage/interfaces/expect.py  # 1 doctest failed
  ----------------------------------------------------------------------
  Total time for all tests: 123.2 seconds
      cpu time: 14.3 seconds
      cumulative wall time: 120.5 seconds

  real  2m18.358s
  user  0m56.396s
  sys   0m4.856s
williamstein commented 9 years ago
comment:30

You should do some things like

    %timeit gp.eval('2+3')
    %timeit gp('2') + gp('3')
    %timeit axiom('2+3')

etc. The doctesting won't testing latency much, and latency is the main issue with performance with the pexpect interfaces. The other is size, e.g.,

    %timeit s=str(gp.eval('2^10000'))
kiwifb commented 9 years ago
comment:31

I will move to do the latency test shortly but for reference when I do the doctesting of sage/interface on my machine. pexpect-2.0

Total time for all tests: 118.3 seconds
    cpu time: 20.6 seconds
    cumulative wall time: 115.9 seconds

real    1m59.638s
user    1m21.060s
sys     0m3.290s

pexpect-3.3

Total time for all tests: 128.3 seconds
    cpu time: 21.3 seconds
    cumulative wall time: 125.9 seconds

real    2m9.655s
user    1m27.270s
sys     0m3.200s

So it's a little bit slower already. I don't know if anything skew that a bit. Moving to the latency test which may be a better test anyway.

kiwifb commented 9 years ago
comment:32

pexpect-2.0

┌────────────────────────────────────────────────────────────────────┐
│ SageMath Version 6.7.beta4, Release Date: 2015-05-05               │
│ Type "notebook()" for the browser-based notebook interface.        │
│ Type "help()" for help.                                            │
└────────────────────────────────────────────────────────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Warning: this is a prerelease version, and it may be unstable.     ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
sage: %timeit gp.eval('2+3')
The slowest run took 269.56 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 94.4 us per loop
sage: %timeit gp('2') + gp('3')
The slowest run took 8.78 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 556 us per loop
sage: %timeit s=str(gp.eval('2^10000'))
1000 loops, best of 3: 250 us per loop

One time only so we don't use the cache

sage: %time gp.eval('2+3')
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 728 us
'5'
sage: %time gp('2') + gp('3')
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 1.51 ms
5
sage: %time s=str(gp.eval('2^10000'))
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 667 us

pexpect-3.3

sage: %timeit gp.eval('2+3')
The slowest run took 80.90 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 309 us per loop
sage: %timeit gp('2') + gp('3')
1000 loops, best of 3: 1.54 ms per loop
sage: %timeit s=str(gp.eval('2^10000'))
The slowest run took 6.02 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 513 us per loop

One time only

sage: %time gp.eval('2+3')
CPU times: user 10 ms, sys: 0 ns, total: 10 ms
Wall time: 30.5 ms
'5'
sage: %time gp('2') + gp('3')
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 4.58 ms
5
sage: %time s=str(gp.eval('2^10000'))
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 644 us

So it is a little bit all over the place, but generally 3 to 4 times slower with pexpect 3.3. I don't know if it is acceptable. Since upstream is now alive and well we could take it as an issue with them?

williamstein commented 9 years ago
comment:33

So it is a little bit all over the place, but generally 3 to 4 times slower with pexpect 3.3. I don't know if it is acceptable.

No, this is definitely not acceptable. For real work it could easily mean people's programs are 4 times slower due to this upgrade. It's important to understand that the use case of pexpect by Sage is massively different than the use case of pexpect for absolutely everybody else -- for them some latency issues are no big deal, but for us they are.

Incidentally, I think there is no caching going on despite what %timeit says...

kiwifb commented 9 years ago
comment:34

Right, I am hearing you. This is a definite problem. So is keeping dead software. If we are keeping the pexpect 2.0 branch in for performance issue it becomes the responsibility of sage developers to maintain it and add new features like the one Bill wants.

Effectively we are maintaining a fork and if we are going that way I would like it to be formalized with a different name and possibly integrated in the sage code itself. This is so people don't get upgraded to a different pexpect through pip if they are using it. Also while our core functionality uses this pexpect, we don't want to block other packages that would need a newer pexpect.

That is my opinion on the matter. The other way is to work, possibly with upstream, to solve that latency problem in newer versions.

williamstein commented 9 years ago
comment:35

fbissey -- definitely -- I completely agree with you, as long as "we" = not me :-)

kiwifb commented 9 years ago
comment:36

@bill do you think it would be easy to bake unicode support in pexpect 2.0? Or would it be easier to work out the latency issue? You seem to have a good handle on what's in pexpect.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:37

Apparently the performance problem originates with this old commit:

https://github.com/pexpect/pexpect/commit/600463686284ea102548f5e0bf51582db051dc78#diff-dfb232b43530f0961894e09a0d862aa4R1048

after commenting out

https://github.com/pexpect/pexpect/blob/3.x/pexpect/__init__.py#L1540

                #time.sleep(0.0001)

performance changes from

sage: sage: %timeit gp.eval('2+3')
1000 loops, best of 3: 427 µs per loop
sage: sage: %timeit gp.eval('2+3')
1000 loops, best of 3: 449 µs per loop

to this

sage: %timeit gp.eval('2+3')
The slowest run took 175.25 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 76.3 µs per loop
sage: %timeit gp.eval('2+3')
The slowest run took 7.85 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 76.4 µs per loop
kiwifb commented 9 years ago
comment:38

Excellent job! Do we have contact upstream to see if this is really needed or could be made optional?

jdemeyer commented 9 years ago
comment:39

Alternative to patching pexpect, we can just replace pieces of pexpect that we don't like with custom Python or Cython code (I do this in #17686 for exiting programs).

simon-king-jena commented 9 years ago
comment:40

Replying to @williamstein:

No, this is definitely not acceptable. For real work it could easily mean people's programs are 4 times slower due to this upgrade.

Indeed. I am even using a patched version of my department's linux distribution, since otherwise the latency (of pseudo-TTY, which slows down pexpect) makes my group cohomology spkg practically unusable.

kiwifb commented 9 years ago
comment:41

Replying to @jdemeyer:

Alternative to patching pexpect, we can just replace pieces of pexpect that we don't like with custom Python or Cython code (I do this in #17686 for exiting programs).

That's interesting if we can do it in a systematic way. I wouldn't mind getting rid of most pexpect interface with the exception of the notebook (which may be harder).

jdemeyer commented 9 years ago
comment:42

Replying to @kiwifb:

That's interesting if we can do it in a systematic way. I wouldn't mind getting rid of most pexpect interface with the exception of the notebook (which may be harder).

I don't mean to stop using pexpect interfaces. I mean replacing parts of pexpect itself: the main pexpect code is in one Python class spawn and we can monkey-patch that by subclassing that and changing methods as we please.

Since pexpect uses some low-level OS stuff, implementing the critical parts in Cython might also give significant speed gain (I haven't tested this yet, but surely calling os.foo() in Python and playing with strings will be slower than directly calling the foo() system call from Cython).

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:43

Replying to @sagetrac-bpage:

Apparently the performance problem originates with this old commit:

https://github.com/pexpect/pexpect/commit/600463686284ea102548f5e0bf51582db051dc78#diff-dfb232b43530f0961894e09a0d862aa4R1048

after commenting out

https://github.com/pexpect/pexpect/blob/3.x/pexpect/__init__.py#L1540

                #time.sleep(0.0001)

Hahaha, I was going to grab 2.1 and investigate the changes compared to 2.0. (Didn't realize the full history was kept.)

Of course continually polling without delay isn't nice either; it will significantly increase the CPU load.

kiwifb commented 9 years ago
comment:44

Replying to @nexttime:

Replying to @sagetrac-bpage:

Apparently the performance problem originates with this old commit:

https://github.com/pexpect/pexpect/commit/600463686284ea102548f5e0bf51582db051dc78#diff-dfb232b43530f0961894e09a0d862aa4R1048

after commenting out

https://github.com/pexpect/pexpect/blob/3.x/pexpect/__init__.py#L1540

                #time.sleep(0.0001)

Hahaha, I was going to grab 2.1 and investigate the changes compared to 2.0. (Didn't realize the full history was kept.)

Of course continually polling without delay isn't nice either; it will significantly increase the CPU load.

Will it? After all unless there are quite a number of other changes, it is the current situation with 2.0, isn't it? Upgrading without patching would decrease the CPU load if I am not mistaken.

Anyway, I like Jeroen's idea but we will need to upgrade pexpect in any case. I am all in favour of upgrading now with a patch and working on some cythonized elements for pexpect in a follow up ticket. We could remove the patch when we have something satisfactory in place.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:45

Replying to @kiwifb:

Replying to @nexttime:

Of course continually polling without delay isn't nice either; it will significantly increase the CPU load.

Will it?

http://en.wikipedia.org/wiki/Busy_waiting.

I'll have to look closer at the code. Haven't tested yet, but I guess it'll get worse the longer the net computation in the subprocess takes; the examples above only do short toy computations and don't take the overall CPU time into account. (It may lead to effectively doubling SAGE_NUM_THREADS in ptestlong.)

After all unless there are quite a number of other changes, it is the current situation with 2.0, isn't it? Upgrading without patching would decrease the CPU load if I am not mistaken.

Probably. In fact, for me ptestlong with 3.3 (unpatched) went faster than with 2.0, although only slightly, under the same conditions (constant low load). (But in both cases I had some process apparently still waiting for Singular after all other tests had already finished, so the wall time got messed up a bit.)

But I'm ok with first upgrading here and removing the sleep, then checking what else we could tweak on another ticket.

Anyway, I like Jeroen's idea but we will need to upgrade pexpect in any case. I am all in favour of upgrading now with a patch and working on some cythonized elements for pexpect in a follow up ticket. We could remove the patch when we have something satisfactory in place.

Subclassing, probably with individual changes for different interfaces or dynamically changing parameters is the way to go I think.

While we could perhaps do some parts in Cython, leaving Python of course puts more burden of portablility on us, depending on what exactly we do.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:46

Perhaps proposing an upstream patch to provide an optional parameter such as

expect(prompt,yield=false)

to optionally avoid the call to sleep would be a good idea? Apparently there are situations such as the way that Sage uses gap where busy waiting is a good thing. Meanwhile in Python 3.x there is asyncio and pexpect 3.x already supports that.

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:47

Based on http://stackoverflow.com/questions/7273474/behavior-of-pythons-time-sleep0-under-linux-does-it-cause-a-context-switch

I tried

                time.sleep(0)

in pexpect/init.py and the result on my linux system remains fast (78.9 µs per loop). In principle the use of sleep(0) should allow scheduling on all systems but it might be a good idea for someone with a MAC to try this on OSX.

The pexpect people might accept a patch to change this to 0.

jdemeyer commented 9 years ago
comment:48

Why do you think it's busy waiting? It looks like it's reading with a timeout, which is not busy waiting.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:49

Replying to @sagetrac-bpage:

The pexpect people might accept a patch to change this to 0.

Well, rather the delay should be configurable.

But the whole expect_loop() and read_nonblocking() look pretty messy. We should perhaps simply overwrite one or both with a version that fits our needs. The whole class certainly isn't written for performance.

Note that the behaviour also depends on the writer (i.e., the subprocess); ECL for example is extremely poor in that it doesn't write strings, but really character by character, such that in the worst case we here would end up with at least len(s) * 0.1 ms (and at least len(s) context switches, not including the system calls themselves, where pexpect doesn't only call select() and probably read(), but also waitpid() because of the isalive() checks).

6d95573c-b4ed-47d1-8c51-cb633c034739 commented 9 years ago
comment:50

@jdemeyer Yes I see, so I wonder why they need time.sleep at all? I am back to asking them to just remove it. But if they prefer a patch with a configurable sleep time then why not?

@nexttime Re: rewriting the loop. For sure if you have the time but I think the priority should be to get pexpect updated soonest.

83660e46-0051-498b-a8c1-f7a7bd232b5a commented 9 years ago
comment:51

Replying to @jdemeyer:

Why do you think it's busy waiting? It looks like it's reading with a timeout, which is not busy waiting.

Unless the timeout is None, it is busy waiting.

With a larger timeout, it gets less worse, but is still busy waiting.