ucphhpc / migrid-sync

MiGrid workspace where master branch is kept strictly in sync with SF upstream svn repo. Any development or experiments should use a branch. You probably want to fork your own clone or work e.g. on the edge branch if you wish to contribute.
GNU General Public License v2.0
4 stars 4 forks source link

Apache reload/restart user process threads, which dumps entire file contents into apache error.log. #50

Open Bjarke42 opened 6 months ago

Bjarke42 commented 6 months ago

We are experiencing problems with apache reload or restarts user process threads with the result that it dumps the entire file contents into apache error.log. T

This is when a user is uploading via Web interface. In the mig.log i only see lots of activity on /wsgi-bin/uploadchunked.py with INFO type messages. I do not see any mentioning of the file being dumped into error.log in mig.log, nor do i see any mentioning of restart or alike in mig.log

The docker log does not contain any information about restarting anything.

We discovered this because of very high IO load on your migrid frontend server on its own disks. Everything related to migrid is normally accessed via network attached storage not locally.

The following is an example of error.log:

NameError: name 'AttributeError' is not defined Exception ignored in: <bound method FieldStorage.__del__ of FieldStorage(None, None, [MiniFieldStorage('path', '/'), MiniFieldStorage('output_format', 'json'), MiniFieldStorage('flags', 'fa'), MiniFieldStorage('_', '1714470641092')])> Traceback (most recent call last):
  File "/usr/lib64/python3.6/cgi.py", line 574, in __del__
NameError: name 'AttributeError' is not defined Exception ignored in: <bound method FieldStorage.__del__ of FieldStorage(None, None, [MiniFieldStorage('output_format', 'json'), MiniFieldStorage('action', 'put'), MiniFieldStorage('share_id', 'undefined'), MiniFieldStorage('_csrf', '80fa3717782c3f1ae37df3c44a04108b4ce45a002525d0be17f146111488e879'), FieldStorage('current_dir', None, './<directory>/'), FieldStorage('files[]', '<filename>', b'*?\t]F\xae\xa0

<very long binary data output into file>

Traceback (most recent call last):
  File "/usr/lib64/python3.6/cgi.py", line 574, in __del__
NameError: name 'AttributeError' is not defined Exception ignored in: <bound method FieldStorage.__del__ of FieldStorage(None, None, [MiniFieldStorage('path', '<directory>/'), MiniFieldStorage('output_format', 'json'), MiniFieldStorage('flags', 'fa'), MiniFieldStorage('_', '1714470641097')])> Traceback (most recent call last):
  File "/usr/lib64/python3.6/cgi.py", line 574, in __del__
NameError: name 'AttributeError' is not defined

@jonasbardino found the following apache mod-wsgi could be related problems: Python issue, django and mod_wsgi

jonasbardino commented 6 months ago

We have not been able to reproduce the problem so far, but from the mod_wsgi project owner comments in the last link I understand that it is most likely the python processes in mod_wsgi hitting an interpreter shutdown/restart while doing their final (lazy) session clean up after finishing a client request. I.e. that the error happens in native python interpreter tear-down and is in itself harmless - but of course it's still annoying with the log explosions and site load side effects.

He mentions that it happens e.g. when Apache is restarted or when the wsgi daemon process reaches the configured maximum requests and points to some configuration recommendations. AFAICT none of them are immediately applicable in our setup due to isolation and resource depletion considerations, but that may be something to investigate further.

jonasbardino commented 6 months ago

Just for the record one may refer to the Exception ignored in messages article for more background on those low-level python logs bubbling up from the standard library in this case.

I've added an explicit clean up at the end of the client operations in the wsgi application to try to prevent any delayed lazy clean up errors. Plus worked on redirecting+compacting any low-level wsgi errors into our own mig logs. Where that is not supported we simply close the wsgi log handle to keep the Apache error log brief. https://github.com/ucphhpc/migrid-sync/commit/27678d27dcaeb4a0fb7e7d9551e93c6a462796b1 https://github.com/ucphhpc/migrid-sync/commit/f1c3d82b69e80100fd24a192857dae4762597b38

According to my simple tests with injected exceptions it removes all actual python exception traceback contents in Apache error log, and it logs a brief version of the error in mig.log on python3 sites.

Please report back whether that solves the log explosion issues.