Integration tests for pty.spawn on Linux and all other platforms

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

BPO	29070
Nosy	@xdegaye, @vadmium, @moreati, @diekmann, @8vasu
PRs	python/cpython#2932
Files	PtyLinuxIntegrtionTest.patch: PtyLinuxIntegrtionTest patch (version1: Linux-specific) pty_tests.patch: Extended, updated, posix-compatible pty test suite pty_tests.patch: Extended, updated, posix-compatible pty test suite (v2) pty.patch: test suite v3 including patch for issue26228 pty.patch: test suite v4 including patch for issue26228 pty.patch: test suite v5 including patch for issue26228

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['3.7', 'type-feature', 'tests'] title = 'Integration tests for pty.spawn on Linux and all other platforms' updated_at = user = 'https://github.com/diekmann' ``` bugs.python.org fields: ```python activity = actor = 'soumendra' assignee = 'none' closed = False closed_date = None closer = None components = ['Tests'] creation = creator = 'Cornelius Diekmann' dependencies = [] files = ['46035', '46115', '46177', '46226', '46613', '46804'] hgrepos = [] issue_num = 29070 keywords = ['patch'] message_count = 12.0 messages = ['284003', '284492', '284669', '284670', '284842', '284942', '285038', '286471', '286683', '287450', '291713', '375085'] nosy_count = 6.0 nosy_names = ['xdegaye', 'martin.panter', 'Alex.Willmer', 'chris.torek', 'Cornelius Diekmann', 'soumendra'] pr_nums = ['2932'] priority = 'normal' resolution = None stage = 'patch review' status = 'open' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue29070' versions = ['Python 2.7', 'Python 3.5', 'Python 3.6', 'Python 3.7'] ```

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

As Martin Panter noted, "it doesn’t look like pty.spawn() is tested at all" [bpo-26228]. So I wrote some very basic integration tests for pty.spawn.

They work perfectly on my Linux machine. Line 4 of the library module under test (pty.py) states: "Only tested on Linux." I don't like this comment ;-)

There are two possibilities how to proceed from here: 1) The tests work perfectly on all other platforms: then we can remove this comment from pty.py. 2) The tests fail on other platforms: then we can finally document where this module is expected to work and where not.

In either case, I need some help by non-Linux users to run theses tests and document their success/failure and whether we have a bug on the specific platform (i.e. pty.spawn should work but does not) or whether the tests should be skipped (i.e. platform does not support ptys at all).

It would be really awesome if some *BSD and Windows users would join the nosy list :-)

Happy Holidays!

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

I did a larger update to my proposed patch. I read the pty man pages and posix standard and designed a test suite which documents the expected behavior of the pty module. The test suite is biased towards my Linux system, but I respect the posix standard. Tested on Linux, OS X, and FreeBSD.

The patch introduces no behavioral changes to the actual pty.py module! It should be easy to merge.

Changes to pty.py: I made _select and _execlp local definitions of the pty module to ease mocking them in the test suite. A comment finally explains why pty.py defines STDIN_FILENO, STDOUT_FILENO, ... itself.

All tests pass on Linux. Tests fail on macOS 10.6.8 and FreeBSD 11. The tests just hang. The patch suggested in bpo-26228 solves these issues. With this patch (or even without patch on Linux), the test suite needs about 1 to 2 seconds). I did not include the patch of bpo-26228 into the patch proposed here since I promised no change in behavior to the pty module. We now have a lot of test cases for the pty module and we are now finally aware that something is broken in the module. It's better to have failing tests than to have no tests at all :-)

I included the change to the documentation I proposed in bpo-29054.

The test suite does several things:

SmallPtyTests mainly tests the _copy loop and documents current behavior. As issue bpo-26228 has shown, this behavior is not what one would expect. When adding the patch for the mentioned issue, one also needs to adapt test__copy_eof_on_all and test__copy_eof_on_master. These tests will make it easier to resolve bpo-26228 since they provide a clear documentation of the current behavior and contribute the missing test cases.
PtyTest is left unchanged.
One has probably noticed that the current test_pty.py suite randomly fails. I introduced _os_read_exactly and _os_read_only to make all new tests deterministic and easy to debug.
PtyPosixIntegrationTest does a very simple integration test by spawning the programs true' andfalse'. Currently, these tests hang on the platforms where pty.spawn() is broken and testing cannot continue. As stated above, with the patch, everything is fine also on FreeBSD and OS X.
PtyMockingExecTestBase is a base class where I prepare to replace the actual exec-call made by pty.spawn() to only run local python code. This makes it possible to test pty.spawn() in a portable, platform-independent manner (where platform-independent means posix-compatible).
PtyWhiteBoxIntegrationPingPong1000Test exchanges a thousand Ping Pong Messages between master and the pty.spawn()ed slave.
PtyWhiteBoxIntegrationReadSlaveTest tests that we read all data from the slave. This test is inspired by msg283912 and it shows that my naive patch in the beginning of bpo-29054 would lose data.
PtyWhiteBoxIntegrationTermiosTest tests "advanced" features of terminals. The pty.spawn() function is the ultimate integration test setup for this.

Waiting for a review :-)

vadmium commented 7 years ago

Hi Cornelius and thanks for the work. Since the patch adds a significant amount of code, I think you might have to sign the contributor agreement: \http://www.python.org/psf/contrib/contrib-form/\. You can do it in a web browser.

I would like to review your tests, but because there is a lot of code to understand and I don’t have much time, it might take me a while. If you can find any way to simplify it, that would be great.

I have some comments on the code review, and will probably post more as I begin to understand what you are proposing.

It looks like you depend on fixing bpo-26228, but the patch there will conflict with your changes. Maybe merge with the other patch, or propose an alternative fix.

The documentation currently mentions the code is only tested on Linux, so it would be nice to update that.

The patch does introduce behavioural changes, if you consider abuse like monkey-patching os.exec() after importing the pty module. It is best not to make unnecessary changes in a bug fix.

Why does the patch slow the tests down so much? Ideally, it is nice to keep the tests as fast as possible.

What is the problem with using a genuine exec() call? Why do we need PtyMockingExecTestBase?

vadmium commented 7 years ago

Ignore my comment about contrib agreement, that must have come through recently :)

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

Thank you Martin very much for this very helpful review. I updated and simplified the tests and implemented your suggestions.

There are three open issues left.

1)

It looks like you depend on fixing bpo-26228, but the patch there will conflict with your changes. Maybe merge with the other patch, or propose an alternative fix.

I reverted my changes to pty.py, so the patches no longer conflict for this file. Currently, my test suite reveals the pty bug on FreeBSD but does not fix it. Since I'm not a FreeBSD expert and I don't want to steal Chris' patch [bpo-26228], I'd like to keep those issues separate. In other words, I'm proposing only a testsuite which introduces no behavioral changes and uncovers a bug.

This also means that this test suite is currently hanging on FreeBSD and OS X. The good news is, the patch in bpo-26228 fixes the problem.

The documentation currently mentions the code is only tested on Linux, so it would be nice to update that. Is it okay to keep this hanging around until the pty module is no longer broken on FreeBSD?

2)

Why does the patch slow the tests down so much? Ideally, it is nice to keep the tests as fast as possible.

The test suite does some heavy integration testing. The test in the class PtyWhiteBoxIntegrationReadSlaveTest are the slow ones. The test suite reads a large amount of data from the spawned child process. The amount of data was chosen to be a few kB in size to make sure that we won't lose data of the child and that --in case of a bug in the code-- we don't accidentally succeed due to race conditions.

Is 1-2 seconds really too slow? We can reduce the amount of data, but increase the risk that a future code change introduces a race condition which is not covered by the test suite.

3) About the system bell pretty printing test:

This seems platform-dependent. Do you really need to test echoing of control characters? We only need to test Python, not the operating system.

Including these integration tests has advantages and disadvantages.

Advantages:

Integration tests which deliberately depend on the careful interplay of many modules make sure that the overall system is healthy.
Modules such as termios or terminal-specific parts of os seem not to be tested currently. This test suite depends on their correct functionality and is thus a high-level test for them. In addition, pty.spawn() is the perfect place to test them.
The tests test actual terminal features. Without, we could not uncover if e.g., a pty master/slave is replaced by a simple pipe.
Tests are set up by increasing order of complexity w.r.t. the underlying modules they depend on. If something breaks in the future, these tests are designed to ease debugging and directly point to the error.
They provide some documentation about the magic of terminals and examples.
Those tests are fast.

Disadvantages:

Control-Character pretty printing only works on Linux.
Relies on the (POSIX-compliant) internals of the operating system.

I wanted to include some tests which depend on some terminal-specific features to make sure that the complete setup works and is operational. For example, the test suite now documents the difference between ptys and pipes and would complain if any code change breaks terminal-specific features. I chose control character pretty printing in some tests because this is one of the simplest features of terminals to test, debug, and verify. In contrast, for example, it would be extremely fragile and probably complete overkill to test things such as terminal delays.

My personal feeling says that the advantages outweigh the disadvantages. If you disagree, we can remove them :-)

vadmium commented 7 years ago

I would prefer to commit Chris’s fix for BSDs (bpo-26228) with a regression test. I can explain in the commit message who contributed to which part, or do two separate commits if you prefer. But the point is to add the test with the fix. I’m not going to commit the test on its own, because we know it will fail.

Even in Lib/test/test_pty.py, it looks like the two patches will collide. I was hoping you could combine the patches, or supply a set of patches that are tested and compatible. I don’t have OS X or BSD so I have to rely on you and the buildbots to confirm that a patch doesn’t break any tests.

I haven’t looked too closely regarding the slow tests, but another option is use the “cpu” resource to mark the test as being slow.

I realized that PtyWhiteBoxIntegrationTermiosTest is a reasonable test for the termios module, so it is beneficial. Though it may have been simpler to write it using pty.openpty(), avoiding an extra thread that just copies data between other threads.

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

Dear Martin, now I understand your intention. I merged my test suite with Chris's fix and documented our insights. SmallPtyTests contains regression tests for this issue.

While testing, I found a subtle change in behavior introduced by Chris's patch: It is no longer possible to get a broken pipe easily. I made a small adjustment to preserve this existing behavior: In the _copy loop, I changed the order in which master_fd and STDIN_FILENO are copied to preserve the BrokenPipeError.

I tuned the slow tests, the complete test suite now needs less than 1 second on my system. I updated the documentation to note that the module is no longer supposed to be Linux-only.

I realized that PtyWhiteBoxIntegrationTermiosTest is a reasonable test for the termios module, so it is beneficial. Though it may have been simpler to write it using pty.openpty(), avoiding an extra thread that just copies data between other threads.

I agree completely. However, pty.openpty() does not depend on the _copy() loop which is modified by Chris's patch. To add higher test coverage for the changes we introduce by merging Chris's fix, I decided to stick to pty.spawn().

I tested on Linux 4.4, Linux 3.13, FreeBSD 11, MacOS 10.11.6 and all tests are green. If anything goes wrong on the testbot army, here is the fallback strategy: [I don't expect anything to go wrong, but I'm rather prepared than sorry] If the test_pty does not finish within one minute on a system, please kill it. It would be nice to know on which platform it failed, so we can whitelist the module in the tests and the documentation for only the supported platforms.

vadmium commented 7 years ago

Can you explain your broken pipe situation? Are you talking about a real-world EPIPE operating on a pseudoterminal, or just a result of using a Unix socket to emulate a PTY in the tests? Usually a broken pipe is an asynchronous condition. You cannot predict exactly when it will happen without knowing the state of the other end, OS implementation, buffering, etc. It does not seem appropriate to change the _copy() loop around unless there is an real bug.

Regarding the buildbots, if I get this patch into a state that I am comfortable committing, the buildbots will are generally set to timeout in 15 or 20 minutes. See the “make buildbottest” commands in Makefile.pre.in, test.regrtest --timeout option, etc. You can also see which platforms have buildbots, and the state of them etc: \https://www.python.org/dev/buildbot/\.

I left a bunch of comments in the code review. There is a lot of useful code in there, but also a lot of stuff that is hard to follow or that I want to clean up.

490c593f-f636-409f-bb35-6abeb38a4595 commented 7 years ago

I have left some comments on Rietveld after a partial code review.

IMO for reasons of repeatability and easier maintenance [1], all the tests should be run with the pty set in a known mode, whether canonical or raw, and a test involving pty.spawn() should ensure that the chosen mode will be set by this function, or should skip it otherwise.

[1] For example when investigating the random failures in PtyTest reported by Cornelius in the patch comments, one has to wonder: "Oh, maybe it fails on this foreign system because the pty is not in raw mode there".

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

Uploaded a new version of the patch.

Changelog of this patch (compared to v3):

Fixed reliability issue of existing pty tests.
pty.fork() should now also work on systems without os.forkpty(). Added code to test this backup path of pty.fork().
Reverted my flawed changes to pty.py (broken pipe).
All new tests which produce output in the pty.spawn()ed child have the terminal in a well-defined state w.r.t. termios.
Subtle renaming: PtyTest is now PtyBasicTest. The reason is to execute it before the integration tests. All tests are independent and the order is irrelevant, but if something fails on a platform, it is easier to debug if we run the unit tests before we go to the integration tests.
Improved, cleaned, documented, and restructured test code. Thank you Xavier and Martin for your very helpful feedback :-)

I also added some small comments in the code review tool.

0d013f0e-0271-4f01-beb4-e3054bfc3188 commented 7 years ago

Thank you Martin for your comments in the code review tool. I prepared a new patch for the code review tool.

The github changelog from patch v4 (Feb 2017) to my HEAD (currently patch v5, Apr 2017) is at:

https://github.com/python/cpython/compare/master...diekmann:wip-issue-29070?expand=1

I hope that having several micro commits instead of one huge patch is helpful. In the end, it will be squashed anyway. Do you prefer to continue with bpo patches or have a github pull request?

d1deab11-b024-4eb0-9cd5-bf238273a617 commented 4 years ago

Hi! Can anyone please take a look at

https://bugs.python.org/issue41494 [ https://github.com/python/cpython/pull/21752 ]?

I think these are related. I wrote a new function called wspawn, which is like spawn+the following differences.

It sets window size at the beginning.
It registers a SIGWINCH handler.
It does not depend on OSError to return. It does a clean return instead.

python / cpython

Integration tests for pty.spawn on Linux and all other platforms #73256