ratt-ru / CubiCal

A fast radio interferometric calibration suite.
GNU General Public License v2.0

memory logging takes forever #396

Closed o-smirnov closed 4 years ago

o-smirnov commented 4 years ago

Rerunning a test for https://github.com/ratt-ru/CubiCal/issues/388... with a time chunk of 2 and a freq chunk of 7500. Everything started off briskly, but slowed down to molasses towards the end.

INFO      20:04:15 - ms_tile            [io] [11.7/100.4 11.1/95.7 7.9Gb]   updated FLAG_ROW column (3.39% rows flagged)
INFO      20:04:15 - gain_machine       [x03] [6.8/98.6 6.2/89.3 7.9Gb] D0T1316F1: interpolating G:gain from zoom-g-phase-gains-1-1588316462_sdp_l0-NGC1365-corr_avg.parmdb
INFO      20:04:16 - main               [io] [9.7/98.4 9.1/93.7 3.8Gb] loading tile 439/540
INFO      20:04:18 - ms_tile            [io] [9.7/98.4 9.1/93.7 3.8Gb] tile 439/540: reading MS rows 4487953~4498218
INFO      20:04:19 - data_handler       [io] [9.7/98.6 9.1/94.0 3.8Gb] reading DATA
INFO      20:04:24 - data_handler       [io] [9.7/98.5 9.1/93.8 4.5Gb] reading BITFLAG
INFO      20:04:32 - ms_tile            [io] [9.7/98.5 9.1/93.8 4.5Gb]   3.39% input visibilities flagged and/or deselected
INFO      20:04:34 - main               [io] [9.7/98.5 9.1/93.8 7.0Gb] I/O job(s) complete

CPUs are all red in top. Things move along, but at a snail's pace. gdb shows the damn thing is mostly stuck in...

(gdb) py-bt
Traceback (most recent call first):
  <built-in method readall of _io.FileIO object at remote 0x7f3e9ab43678>
  <built-in method read of _io.BufferedReader object at remote 0x7f3e9f915eb8>
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/_pslinux.py", line 1597, in _read_smaps_file
    return f.read().strip()
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/_common.py", line 452, in wrapper
    return fun(self)
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/_pslinux.py", line 1516, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/_pslinux.py", line 1779, in memory_full_info
    smaps_data = self._read_smaps_file()
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/_pslinux.py", line 1516, in wrapper
    return fun(self, *args, **kwargs)
  File "/home/oms/.venv/cc/lib/python3.6/site-packages/psutil/__init__.py", line 1090, in memory_full_info
    return self._proc.memory_full_info()
  File "/home/oms/projects/CubiCal/cubical/tools/logger.py", line 241, in filter
    mis.append(p.memory_full_info())
  File "/usr/lib/python3.6/logging/__init__.py", line 720, in filter
    result = f.filter(record)
  File "/usr/lib/python3.6/logging/__init__.py", line 1453, in handle

Yes, psutil's memory_full_info()!!! Well, the docs do warn it's slower than memory_info()... who knew we were talking about hours and hours slower!
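The backtrace shows why: on Linux, psutil's memory_full_info() is implemented by reading and parsing /proc/&lt;pid&gt;/smaps (see _read_smaps_file above), which contains a multi-line entry per memory mapping and grows with the process's address space, whereas memory_info() reads the tiny fixed-size /proc/&lt;pid&gt;/statm. A minimal stdlib-only sketch of the cost difference (Linux only; absolute timings will vary with the number of mappings):

```python
import time

def time_read(path, repeats=5):
    """Return the average wall-clock time to slurp a /proc file."""
    total = 0.0
    for _ in range(repeats):
        t0 = time.perf_counter()
        with open(path, "rb") as f:
            f.read()
        total += time.perf_counter() - t0
    return total / repeats

# statm backs memory_info(); smaps backs memory_full_info()
statm = time_read("/proc/self/statm")
smaps = time_read("/proc/self/smaps")
print(f"statm: {statm * 1e6:.1f} us, smaps: {smaps * 1e6:.1f} us")
```

For a process with many mappings (large arrays, many shared-memory segments, as in a CubiCal run), the smaps read can be orders of magnitude slower than the statm read.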

o-smirnov commented 4 years ago

OK, timing it, I'm seeing regimes where memory_full_info() takes up to a second. No idea what causes this, but clearly some unexpected behaviour is afoot. And because this is in the log filter, not the handler, it happens all the time (even for disabled log messages).

Will tweak as follows:
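The thread ends before the snippet, but the direction is clear from the comment above: stop paying for an expensive memory poll on every log record. A minimal sketch (not the actual CubiCal change) of a rate-limited logging filter, where poll_memory is a hypothetical stand-in for whatever per-process query is used (e.g. the cheaper memory_info() instead of memory_full_info()):

```python
import logging
import time

class MemoryUsageFilter(logging.Filter):
    """Annotate records with memory usage, rate-limiting the expensive poll.

    The polled value is cached and reused for `interval` seconds, so the
    poll no longer runs once per log record.
    """

    def __init__(self, poll_memory, interval=2.0):
        super().__init__()
        self._poll = poll_memory
        self._interval = interval
        self._last_time = 0.0
        self._cached = None

    def filter(self, record):
        now = time.monotonic()
        if self._cached is None or now - self._last_time > self._interval:
            # Only hit the (potentially slow) poll when the cache is stale.
            self._cached = self._poll()
            self._last_time = now
        record.memory = self._cached
        return True
```

With this shape, even a one-second memory_full_info() call would be amortised over all records logged in the interval window, rather than blocking every single one.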