sixty-north / cosmic-ray

Mutation testing for Python
MIT License
556 stars 54 forks source link

Intermittent Errors When Terminating Workers #521

Closed FloatingSunfish closed 3 years ago

FloatingSunfish commented 3 years ago

After mutation testing finished without any issues and stopping all 8 workers, intermittent errors are raised during their termination. Note that not all errors get raised at once.

The workers were started via cr-http-workers mutation_test_config.toml . and (I assume) cloned a repository in a temp directory somewhere.

TypeError:

Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\admin\appdata\local\programs\python\python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Admin\AppData\Local\Programs\Python\Python39\Scripts\cr-http-workers.exe\__main__.py", line 7, in <module>
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\cosmic_ray\tools\http_workers.py", line 59, in main
    asyncio.get_event_loop().run_until_complete(run(config_file, repo_url, location))
  File "c:\users\admin\appdata\local\programs\python\python39\lib\asyncio\base_events.py", line 642, in run_until_complete
    return future.result()
  File "c:\users\admin\appdata\local\programs\python\python39\lib\site-packages\cosmic_ray\tools\http_workers.py", line 50, in run
    await asyncio.gather([await proc.communicate() for proc in procs])
  File "c:\users\admin\appdata\local\programs\python\python39\lib\asyncio\tasks.py", line 824, in gather
    if arg not in arg_to_fut:
TypeError: unhashable type: 'list'

PermissionError:

Aborted!
Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 620, in _rmtree_unsafe
    os.rmdir(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\Admin\\AppData\\Local\\Temp\\tmpm27sinwm\\0'

RecursionError:

Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\weakref.py", line 656, in _exitfunc
    f()
  File "c:\users\admin\appdata\local\programs\python\python39\lib\weakref.py", line 580, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "c:\users\admin\appdata\local\programs\python\python39\lib\tempfile.py", line 816, in _cleanup
    cls._rmtree(name)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\tempfile.py", line 812, in _rmtree
    _shutil.rmtree(name, onerror=onerror)

  ...

  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 622, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\admin\appdata\local\programs\python\python39\lib\tempfile.py", line 804, in onerror
    cls._rmtree(path)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\tempfile.py", line 812, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 733, in rmtree
    if _rmtree_islink(path):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 577, in _rmtree_islink
    st = os.lstat(path)
RecursionError: maximum recursion depth exceeded while calling a Python object

Here, lines 740, 622, 804, and 812 (in order) are the ones that get repeated the most in the very long stack trace.

The good news is that mutation testing itself does not seem to be impacted by this issue. However, I assume this issue affects worker cleanup.

@abingham Also, is C:\Users\YOUR_USERNAME\AppData\Local\Temp the directory where the workers temporarily clone the code they're testing?

abingham commented 3 years ago

That first one looks like it's because I'm using asyncio.gather incorrectly. I should have that patched up soon.

I'm not sure about the others. My hope is that they're a side-effect of the first and that fixing the first will fix them.

abingham commented 3 years ago

Version 8.3.4 should fix the first issue, and perhaps the rest. If things seem to work for you, go ahead and close this out. Thanks for reporting it!

FloatingSunfish commented 3 years ago

@abingham Will do! I'll keep this issue open for a while longer, but rest assured that I'll close it eventually if I don't encounter any further errors. I'll also let you know if I encounter any more.

Thanks! 🙂

FloatingSunfish commented 3 years ago

@abingham TypeError seems to have been resolved.

Unfortunately, PermissionError and RecursionError are still present.

Also, there is a new error:

ValueError:

Aborted!
Exception ignored in: <function _ProactorBasePipeTransport.__del__ at 0x00000157F529B0D0>
Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\asyncio\proactor_events.py", line 115, in __del__
    _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
  File "c:\users\admin\appdata\local\programs\python\python39\lib\asyncio\proactor_events.py", line 79, in __repr__
    info.append(f'fd={self._sock.fileno()}')
  File "c:\users\admin\appdata\local\programs\python\python39\lib\asyncio\windows_utils.py", line 102, in fileno
    raise ValueError("I/O operation on closed pipe")
ValueError: I/O operation on closed pipe
abingham commented 3 years ago

The PermissionError may have something to do with git files. The workaround in that link looks reasonable.

Similarly, the RecursionError may be related to being in/under the temp directory when we try to delete it.

I think both of these problems are specific to windows. I've been able to replicate the recursion problem on windows, so hopefully I'll be able to fix it.

abingham commented 3 years ago

I think that 8.3.4 (building now) will fix some or all of the PermissionError and RecursionError problems. As far as I can tell, windows occasionally has problems deleting git repositories. The "fix" is to ignore these issue when they arise. Let me know if this seems to fix things for you.

FloatingSunfish commented 3 years ago

Sure thing, @abingham. I'll be sure to try things out as I go. Although I have to work on other items so I'll be putting mutation testing to the side for the time being. I'll update you if I encounter any more issues though. Thanks!

FloatingSunfish commented 3 years ago

@abingham On my latest run, I only got PermissionError and RecursionError.

PermissionError (WinError 5 and 32):

Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\tempfile.py", line 801, in onerror
    _os.unlink(path)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Admin\\AppData\\Local\\Temp\\tmp7fqb5ybw\\0'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\admin\appdata\local\programs\python\python39\lib\shutil.py", line 620, in _rmtree_unsafe
    os.rmdir(path)
PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\Admin\\AppData\\Local\\Temp\\tmp7fqb5ybw\\0'

The RecursionError pretty much stayed the same.

I'll report back in case any more errors pop up. Thanks!

abingham commented 3 years ago

Is that the complete trace you’re getting? If not, I’m curious to know the most recent line from cosmic-ray’s source that’s in it. I don’t see how you can get to the code you’re getting to in that trace, so I’m baffled right now.

Just to double-check: you’re using version 8.3.4 (or the head of master), right?

FloatingSunfish commented 3 years ago

@abingham >Is that the complete trace you’re getting? Oh, no. That's just a sample of the many repeated PermissionErrors that I got. I'll be sure to send a complete stack trace to you soon. I'll just run everything again.

>you’re using version 8.3.4 (or the head of master), right? I simply uninstalled cosmic-ray and then reinstalled it to get what I assume is the latest version, which is 8.3.4 from what I saw.

FloatingSunfish commented 3 years ago

@abingham Attached is the full stack trace after pressing Ctrl+C once on the Worker window.

I assume it's supposed to say "Aborted!" once you start terminating workers.

Also, it looks like only PermissionError and RecursionError remain and the other errors have been resolved.

stack trace.txt

abingham commented 3 years ago

Sorry, it was my mistake; 8.3.4 didn't include the a change I thought it did. Try using 8.3.5.

The crux of the problem is that tempfile.TemporaryDirectory uses a weakref to try to ensure cleanup, and there doesn't seem to be a way to let it fail gracefully. So I've had to replace it with mkdtemp for which I can ignore the cleanup errors you're seeing. 8.3.5 seems to work correctly on my Windows system, so hopefully it does for you, too.

FloatingSunfish commented 3 years ago

@abingham Alright, I'll check out 8.3.5 and let you know how it goes. Thanks! 🙂

FloatingSunfish commented 3 years ago

Looks like all errors have been resolved in the 8.3.5 update.

Thanks! 😄