labgrid-project / labgrid

Embedded systems control library for development, testing and installation
https://labgrid.readthedocs.io/

Labgrid exporter getting stuck because udev socket is filling up #423

Closed Emantor closed 5 years ago

Emantor commented 5 years ago

The udev socket can fill up and raise a polling error if events are not handled fast enough:

May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]: Traceback (most recent call last):                                                         
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/labgrid/remote/exporter.py", line 405, in _poll_step
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     changed = resource.poll()
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/labgrid/remote/exporter.py", line 88, in poll
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     self.local.poll()
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/labgrid/resource/common.py", line 114, in poll
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     self.manager.poll()
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/labgrid/resource/udev.py", line 33, in poll
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     for device in iter(partial(self._monitor.poll, 0), None):
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/pyudev/monitor.py", line 357, in poll
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     if eintr_retry_call(poll.Poll.for_events((self, 'r')).poll, timeout):
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/pyudev/_util.py", line 163, in eintr_retry_call
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     return func(*args, **kwargs) 
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/pyudev/_os/poll.py", line 97, in poll
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     return list(self._parse_events(eintr_retry_call(self._notifier.poll, timeout)))
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:   File "/usr/ptx-venvs/labgrid-staging/lib/python3.5/site-packages/pyudev/_os/poll.py", line 112, in _parse_events
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]:     raise IOError('Error while polling fd: {0!r}'.format(fd))
May 06 09:29:35 rl10-srv labgrid-staging-exporter[1056]: OSError: Error while polling fd: 7                                                         

See pyudev/pyudev#194.
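
For readers unfamiliar with the loop at udev.py line 33 in the traceback: it uses the two-argument form of `iter()`, which keeps calling `self._monitor.poll(0)` until it returns `None`, so each poll step consumes every event that is already pending. Below is a minimal sketch of that idiom. `FakeMonitor` and `drain` are hypothetical stand-ins for illustration, not pyudev or labgrid code, and the sketch does not reproduce the socket overflow itself.

```python
from collections import deque
from functools import partial


class FakeMonitor:
    """Hypothetical stand-in for a udev monitor: poll() returns the next
    queued event, or None when nothing is pending."""

    def __init__(self, events):
        self._queue = deque(events)

    def poll(self, timeout=None):
        # The timeout is ignored here; a real monitor would wait up to
        # `timeout` seconds for an event to arrive.
        return self._queue.popleft() if self._queue else None


def drain(monitor):
    """Consume every pending event in a single poll step."""
    # iter(callable, sentinel) calls monitor.poll(0) until it returns None.
    return list(iter(partial(monitor.poll, 0), None))


monitor = FakeMonitor(["add ttyUSB0", "remove ttyUSB0", "add ttyUSB1"])
events = drain(monitor)
```

Draining only helps if the poll step runs often enough. If events arrive faster than the step is called, the socket buffer can presumably still fill up between calls, which matches the symptom described above.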

Proposed fix:

Bastian-Krause commented 5 years ago

Is there any reason to catch exceptions raised during polling? If not, we should drop the try..except in ExporterSession._poll_step(). From what I've seen, catching them always leaves the exported resources in an undefined state.

It is very hard for a labgrid user to discover the cause without looking into the exporter logs, because the effects can look like a target behaving in strange ways.
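
To illustrate the trade-off, here is a rough sketch contrasting a poll step that swallows exceptions with one that lets them propagate. All names (`BrokenResource`, `poll_step_swallowing`, `poll_step_propagating`) are hypothetical. This is not the actual `ExporterSession._poll_step()` code, only an approximation of the pattern under discussion.

```python
import logging

log = logging.getLogger("exporter-sketch")


class BrokenResource:
    """Hypothetical resource whose poll() fails like the udev monitor above."""

    def poll(self):
        raise OSError("Error while polling fd: 7")


def poll_step_swallowing(resources):
    """Log polling errors and carry on; the failure shows up only in the log."""
    for resource in resources:
        try:
            resource.poll()
        except Exception:
            log.exception("polling failed")
    return "still running"


def poll_step_propagating(resources):
    """Let polling errors escape so the caller can stop or restart."""
    for resource in resources:
        resource.poll()
    return "still running"


resources = [BrokenResource()]

# The swallowing variant reports success although the resource is broken.
swallowed = poll_step_swallowing(resources)

# The propagating variant surfaces the error to the caller.
try:
    poll_step_propagating(resources)
    propagated = None
except OSError as exc:
    propagated = str(exc)
```

In the first variant the process keeps running and keeps exporting resources whose state is no longer tracked. In the second, the error becomes visible immediately, for example to a service manager that can restart the exporter.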

jluebbe commented 5 years ago

This should be fixed with the merge of #419.