web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Some wpt integration tests being skipped because port in use #9546

Open gsnedders opened 6 years ago

gsnedders commented 6 years ago

From https://github.com/w3c/web-platform-tests/pull/9480#issuecomment-365787562.

The master branch, in commit 040654c78f3f021512a1f22b9b3b99925bd52d37 (from #8614), has ./wpt run chrome totally broken (it always throws KeyError). This shouldn't have managed to land, because we have a test, test_run_chrome in tools/wpt/tests/test_wpt.py that should've caught this.

This issue has two sides to it:

vinicius0197 commented 6 years ago

I've tried running test_run_chrome with pytest -k chrome wpt/tests/test_wpt.py, first with the commit 040654c78f3f021512a1f22b9b3b99925bd52d37 checked out. That gave me the following error:

================================================================================= FAILURES =================================================================================
_____________________________________________________________________________ test_run_chrome ______________________________________________________________________________

manifest_dir = '/tmp/tmpC547g1'

    @pytest.mark.slow
    @pytest.mark.xfail(sys.platform == "win32",
                       reason="Tests currently don't work on Windows for path reasons")
    def test_run_chrome(manifest_dir):
        if is_port_8000_in_use():
            pytest.skip("port 8000 already in use")

        with pytest.raises(SystemExit) as excinfo:
            wpt.main(argv=["run", "--yes", "--no-pause", "--binary-arg", "headless",
                           "--metadata", manifest_dir,
>                          "chrome", "/dom/nodes/Element-tagName.html"])

wpt/tests/test_wpt.py:104: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
wpt/wpt.py:132: in main
    rv = script(*args, **kwargs)
wpt/run.py:419: in run
    rv = run_single(venv, **kwargs) > 0
wpt/run.py:426: in run_single
    return wptrunner.start(**kwargs)
wptrunner/wptrunner/wptrunner.py:295: in start
    return not run_tests(**kwargs)
wptrunner/wptrunner/wptrunner.py:185: in run_tests
    env_extras) as test_environment:
wptrunner/wptrunner/environment.py:96: in __enter__
    self.config = self.load_config()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <wptrunner.environment.TestEnvironment object at 0x7f5a19d9fad0>

    def load_config(self):
        default_config_path = os.path.join(serve_path(self.test_paths), "config.default.json")
        local_config_path = os.path.join(here, "config.json")

        with open(default_config_path) as f:
            default_config = json.load(f)

        with open(local_config_path) as f:
            data = f.read()
>           local_config = json.loads(data % self.options)
E           KeyError: 'host'

wptrunner/wptrunner/environment.py:132: KeyError
--------------------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------------------
Downloading chromedriver
 0:03.14 INFO Updating test manifest /tmp/tmpC547g1/MANIFEST.json
 0:03.15 INFO STDOUT: INFO:manifest:Skipping manifest download because existing file is recent
 0:07.14 INFO STDOUT: INFO:manifest:Updating manifest
 0:14.27 INFO STDOUT: DEBUG:manifest:Opening manifest at /tmp/tmpC547g1/MANIFEST.json
 0:18.69 INFO Using 1 client processes
 0:18.74 INFO Closing logging queue
 0:18.74 INFO queue closed
============================================================================= warnings summary =============================================================================
None
  pytest-catchlog plugin has been merged into the core, please remove it from your requirements.

-- Docs: http://doc.pytest.org/en/latest/warnings.html
============================================================================ 9 tests deselected ============================================================================
====================================================== 1 failed, 1 passed, 9 deselected, 1 warnings in 32.21 seconds =======================================================
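For reference, the KeyError at the bottom of that traceback comes from the %-style substitution in load_config(): config.json is a template whose placeholders are filled from self.options, and after #8614 that mapping evidently no longer carries a host key. A minimal sketch of the mechanism, with a made-up template string:

    # Hypothetical illustration of the failure mode in load_config() above;
    # only the mechanism matches the traceback, the template text is invented.
    import json

    options = {"ports": '{"http": [8000]}'}  # assume the "host" key is missing
    template = '{"browser_host": "%(host)s", "ports": %(ports)s}'

    try:
        config = json.loads(template % options)  # %-substitution runs before json.loads
    except KeyError as e:
        print("KeyError:", e)  # KeyError: 'host'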

Tried again with another commit 28ceecaaa6f0ff2cae0c586c54a923a1246393c3 checked out and received no error messages.

Running the test with pytest wpt/tests/test_wpt.py I also received no error messages, but test_run_chrome was skipped with the message: port 8000 already in use. So the problem is probably with the way tests get skipped when there's a server already running on that port.

I will try to work on a solution for this bug. @jgraham suggested adding a loop in the check with a sleep between checks.
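For context, the skip decision hinges on the is_port_8000_in_use() helper in test_wpt.py; a rough sketch of that kind of check, plus the retry-with-sleep variant suggested above, might look like the following (the names, timings, and bind-based check are illustrative, not the actual implementation):

    import socket
    import time

    import pytest

    def is_port_8000_in_use():
        # Try to bind the wptserve port; if the bind fails, something is already
        # listening there. (This mirrors the kind of check test_wpt.py does, not
        # necessarily its exact code.)
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.bind(("127.0.0.1", 8000))
        except socket.error:
            return True
        finally:
            s.close()
        return False

    def skip_if_port_busy(retries=5, delay=2):
        # The suggested retry loop: give a previously started server a few
        # seconds to release the port before deciding to skip the test.
        for _ in range(retries):
            if not is_port_8000_in_use():
                return
            time.sleep(delay)
        pytest.skip("port 8000 already in use")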

vinicius0197 commented 6 years ago

Adding a loop with a sleep between checks didn't work; test_run_chrome keeps being skipped. I moved the test into a separate file inside tools/wpt/tests, and that way it did catch the Chrome breakage. Maybe the other tests inside tools/wpt/tests/test_wpt.py are interfering with it?

gsnedders commented 6 years ago

If it's getting skipped because port 8000 is already in use then presumably something has gone wrong elsewhere, given it should be free. Maybe we should also add a check that if the TRAVIS environment variable is set then we should never skip…

vinicius0197 commented 6 years ago

@gsnedders hm, so the test will only be skipped if both the TRAVIS environment variable is not set and port 8000 is in use?

     if os.environ.get('TRAVIS') == "false" and is_port_8000_in_use == "true":
         pytest.skip("TRAVIS environ variable not set")

Something like this, maybe?

I'm not sure yet about what is causing the problem on port 8000, though.

gsnedders commented 6 years ago

@vinicius0197 yeah, something like that. (well, maybe more like os.environ.get("TRAVIS", "false") == "false" so it actually works when it's not set. :))
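Putting those two pieces together, the check being discussed might end up looking roughly like this (a sketch only: is_port_8000_in_use() is the existing helper in test_wpt.py and returns a bool, and the wrapper name here is made up):

    import os

    import pytest

    def skip_unless_port_free():
        # Hypothetical wrapper; is_port_8000_in_use() is the existing helper
        # in test_wpt.py. Skip only when we are NOT on Travis (TRAVIS defaults
        # to "false" when unset) AND port 8000 really is occupied; on Travis a
        # busy port is treated as a real failure rather than a reason to skip.
        if os.environ.get("TRAVIS", "false") == "false" and is_port_8000_in_use():
            pytest.skip("port 8000 already in use")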

vinicius0197 commented 6 years ago

@gsnedders Ok. I've noticed that test_serve is also being skipped with the same message (port 8000 already in use). So we should probably fix that as well with the same check, I guess?

As for why port 8000 is causing problems, should a new issue be opened for that (since it looks like it's affecting more than just test_run_chrome)? If so, I can submit a PR to this one with the changes I've made.

gsnedders commented 6 years ago

@vinicius0197 We should generalise this issue. :)

gsnedders commented 6 years ago

@vinicius0197 did you find anything about the cause of port 8000 being in use?

vinicius0197 commented 6 years ago

@gsnedders sorry for the late reply! I didn't have time last week to look into this, but I'll be doing so tomorrow and will update this thread.

vinicius0197 commented 6 years ago

@gsnedders hm, for some reason the tests aren't being skipped in pytest anymore.

That's odd. You've mentioned that this may be a problem with the test being skipped on Travis (since the test itself, as it's written, fails). I will take a look into that.

vinicius0197 commented 6 years ago

Two tests were skipped in the Travis log for commit 040654c78f3f021512a1f22b9b3b99925bd52d37. I suspect one of those was test_run_chrome.

Is it possible to make Travis output a more precise log of which tests actually ran? I think that would help pin down where the problem really is.

jgraham commented 6 years ago

You can edit the pytest options in the tox.ini file. If you pass in --verbose and -s you should get a lot more helpful output.
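For instance, something like this (a sketch only: the real tools/tox.ini section names and the options already there may differ; the --cov flag is assumed from the pytest invocation tox uses):

    ; Sketch of adding --verbose and -s to the pytest command tox runs; the
    ; section name and the pre-existing --cov flag are assumptions here.
    [testenv]
    commands = pytest --verbose -s --cov {posargs}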

vinicius0197 commented 6 years ago

Running pytest locally with --verbose and -s I get the following output: https://pastebin.mozilla.org/9079374. It's easy to see that test_run_chrome was skipped (line 132). But when I run without --verbose and -s, the test runs normally and I get an error (as expected). This is really weird.

To test Travis, I've created a new branch on my fork of web-platform-tests with commit 040654c78f3f021512a1f22b9b3b99925bd52d37 checked out. But when the tests run I get the following output: The command "./tools/ci/run.sh" exited with 0. The Travis log is here. I don't know what could be causing this issue (I think I've configured everything correctly, at least). Am I doing something wrong here?

I want to set up this Travis build correctly and try to isolate the root of this problem. Right now it looks like something is odd with pytest.

gsnedders commented 6 years ago

@vinicius0197 add RUN_JOB=1 to the env section of the job in .travis.yml; by default only jobs with changes get run.
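Something along these lines, say (a sketch only: the surrounding keys of the real job entry in .travis.yml are assumptions here; the change that matters is RUN_JOB=1 in the job's env):

    # Sketch -- the other keys on this job entry are assumed, not copied from
    # the real .travis.yml; the relevant addition is RUN_JOB=1 in env.
    matrix:
      include:
        - env: TOXENV=py27 RUN_JOB=1
          script: ./tools/ci/run.sh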

vinicius0197 commented 6 years ago

@gsnedders ok. I will try again later today

vinicius0197 commented 6 years ago

@gsnedders hm, still getting the same message: https://travis-ci.org/vinicius0197/web-platform-tests/jobs/352635350

I'm pretty new to Travis, so I'm running into some issues there.

jgraham commented 6 years ago

It looks like something is failing in the tools/ci/before_install.sh script; perhaps the git submodule update because origin/master isn't a valid ref. I think the only thing in that script that might be significant in your case is starting xvfb so you could replace the body of that script with

    export DISPLAY=:99.0
    sh -e /etc/init.d/xvfb start 1>&2

and see if that improves things?

vinicius0197 commented 6 years ago

@jgraham Yes, it worked! At least it builds now.

I'm receiving the following output:

============================= test session starts ==============================
platform linux2 -- Python 2.7.14, pytest-3.4.2, py-1.5.2, pluggy-0.6.0
rootdir: /home/travis/build/vinicius0197/web-platform-tests/tools, inifile: pytest.ini
plugins: cov-2.5.1, mozlog-3.7, hypothesis-3.49.1
collected 12 items                                                             
tests/test_stability.py .                                                [  8%]
tests/test_wpt.py ..
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
Check the details on how to adjust your build configuration on: https://docs.travis-ci.com/user/common-build-problems/#Build-times-out-because-no-output-was-received
The build has been terminated

I've added --verbose to the tox.ini file according to this page. I will investigate the cause of this issue tomorrow.

gsnedders commented 6 years ago

@vinicius0197 I feel like this is ever less deserving of the "good first issue" label, but also yay! :)

vinicius0197 commented 6 years ago

@gsnedders it's a bit hard to debug what's going on with Travis. I tried adding a new environment variable for pytest to show debug info, but the same thing as above happens: https://travis-ci.org/vinicius0197/web-platform-tests/jobs/354897859

So I tried following the hint it gives and added a new line to .travis.yml: install: travis_wait mvn install

So it would wait a little longer before Travis times out the build. But for some reason this command makes mvn exit with 1, as in: https://travis-ci.org/vinicius0197/web-platform-tests/jobs/354900881

Any ideas on what might be causing this problem with mvn?

gsnedders commented 6 years ago

That hint is Java-specific; I've seen us randomly time out there before in the previous build, so maybe just try retrying it. We should probably just let test_wpt hit the remote server and deal with the potential flakes (given we're already reliant on GitHub to download the repo in the first place).

vinicius0197 commented 6 years ago

@gsnedders hm, it keeps giving the same error (after retrying).

Could you clarify a little bit on what test_wpt should do (about hitting the remote server and dealing with flakes)? Not sure if I really understood that.

gsnedders commented 6 years ago

@vinicius0197

Could you clarify a little bit on what test_wpt should do (about hitting the remote server and dealing with flakes)? Not sure if I really understood that.

Remove --no-download from test_wpt.py.

vinicius0197 commented 6 years ago

The build is now producing a lot more helpful output. Travis is catching two errors (in test_run_firefox and test_run_chrome), but it's a different error from the original KeyError.

https://travis-ci.org/vinicius0197/web-platform-tests/jobs/357075576

It's getting the following error:

ERROR: InvocationError: '/home/travis/build/vinicius0197/web-platform-tests/tools/wpt/.tox/py27/bin/pytest --cov'

vinicius0197 commented 6 years ago

I've tested running Travis with other commits checked out and it was fine. But it gives problems when I'm using:

[pytest]
addopts = -v -s 

in the pytest.ini file, as in this build (with a recent commit checked out): https://travis-ci.org/vinicius0197/web-platform-tests/jobs/357204501

Without the verbose option, Travis takes too long running test_wpt (without showing output) and times out.

vinicius0197 commented 6 years ago

Ok, seems like Travis is catching the error now: https://travis-ci.org/vinicius0197/web-platform-tests/jobs/357629969#L4550

vinicius0197 commented 6 years ago

@gsnedders Sorry for not updating this these last few days (I've been busy). It seems like, apart from adding a simple export RUN_JOB1=1 and travis_wait, there aren't really any other differences between the original Travis log and my test log.

Do you think the issue could be related to any of those commands?

sahil107 commented 6 years ago

#8614

foolip commented 6 years ago

This reminds me of https://github.com/web-platform-tests/wpt/issues/7928, where port 8000 being in use also resulted in misery. @gsnedders, do these two bottom out in the same code so that it's a dupe, you think?

gsnedders commented 6 years ago

No, because here we have in principle shut the previous server down.

foolip commented 6 years ago

Included in Ecosystem Infra 2018 Q4 OKRs

mdittmer commented 5 years ago

Ping from your friendly neighbourhood ecosystem infra rotation

Any updates on this, @gsnedders?

gsnedders commented 5 years ago

Not been worked on.

foolip commented 5 years ago

@gsnedders any chance of getting this fixed before the end of the quarter?

gsnedders commented 5 years ago

Given I've not even started looking into this again, no.

gsnedders commented 4 years ago

We still get more failures with something else running on port 8000, FWIW.