scipion-em / scipion-pyworkflow

Underlying pyworkflow module for the Scipion framework
GNU General Public License v3.0
5 stars 5 forks source link

memory leak? #116

Closed azazellochg closed 3 years ago

azazellochg commented 3 years ago

I'm running a relion 3d refinement:

00025:   mpiexec -mca orte_forward_job_control 1 --oversubscribe -n 3  `which relion_refine_mpi` --i Runs/014001_ProtRelionRefine3D/input_particles.star --particle_diameter 200 --ctf  --ctf_corrected_ref  --zero_mask  --ini_high 30.0 --sym D2 --ref_angpix 1.24453125 --ref Runs/014001_ProtRelionRefine3D/tmp/relion_class001.00.mrc --flatten_solvent  --norm  --scale  --o Runs/014001_ProtRelionRefine3D/extra/relion --oversampling 1 --pad 2 --auto_local_healpix_order 4 --healpix_order 2 --offset_range 5.0 --offset_step 2.0 --auto_refine  --split_random_halves  --low_resol_join_halves 40 --solvent_mask Runs/014001_ProtRelionRefine3D/tmp/mask.mrc --solvent_correct_fsc  --dont_combine_weights_via_disc  --pool 30 --gpu 0:1  --j 8

It's the only running protocol in the project. I switch on debug mode and see:

------------- refreshing ---------- 
  open files:  6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/project.sqlite, 5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/settings.sqlite, 6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 8
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 13
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 14
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 15
  memory percent:  0.01586849244847583
------------- refreshing ---------- 
  open files:  4
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/project.sqlite, 5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/settings.sqlite, 6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 8
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 13
  memory percent:  0.01603428605785593
------------- refreshing ---------- 
  open files:  5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/project.sqlite, 5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/settings.sqlite, 6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 8
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 13
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 14
  memory percent:  0.01603428605785593
------------- refreshing ---------- 
  open files:  4
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/project.sqlite, 5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/settings.sqlite, 6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 8
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 13
  memory percent:  0.01603428605785593
------------- refreshing ---------- 
  open files:  5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/project.sqlite, 5
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/settings.sqlite, 6
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 8
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 13
    - /cephfs/gsharov/ScipionUserData/projects/relion31_tutorial_scipion/Runs/012980_ProtRelionClassify3D/volumes.sqlite, 14
  memory percent:  0.01603428605785593

Classify protocol is not used in any way by refinement, it's finished weeks ago. Why the files are opening and closing?

image

pconesa commented 3 years ago

wow! not idea...was it running and "active" when the auto refine war launched?

azazellochg commented 3 years ago

No, it was finished many weeks ago. I'll check again later this week.

azazellochg commented 3 years ago

I tried to move/rename/delete the sqlite file but as soon as I open scipion project it's immediately is opened. No clue why. Since this is not my machine can't do much until it's rebooted next time.

azazellochg commented 3 years ago

I'm reproducing it on my laptop: 1) I have run a test eman project 2) I open a project which has no protocols running 3) switch on debug. 4) hit Refresh the protocol tree

Besides project.sqlite and settings.sqlite I see an open sqlite for subtomograms.sqlite. I do not have that protocol box even selected so it's not updating summary or smth... very strange.

azazellochg commented 3 years ago

@delarosatrevin is this a normal behavior or I'm missing something?

delarosatrevin commented 3 years ago

It. is very strange! It is not. normal...in the past, we had this when summary opens the set and did not. close, so it keeps the file handler open.

pconesa commented 3 years ago

I'll find time to look at this

El lun., 3 ago. 2020 22:16, Jose Miguel de la Rosa Trevin < notifications@github.com> escribió:

It. is very strange! It is not. normal...in the past, we had this when summary opens the set and did not. close, so it keeps the file handler open.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/scipion-em/scipion-pyworkflow/issues/116#issuecomment-668220774, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAF7ZYLTKSQSOVO45XKTYGDR64LLHANCNFSM4PHT55DQ .

pconesa commented 3 years ago

I'm looking at this....and so far I've found issues only when stoping a protocol and during the same GUI session. All is gone after a GUI restart.

azazellochg commented 3 years ago

here is the project that I didn't touch for months: image Init model protocol is not even selected, so it's not a summary, right?

pconesa commented 3 years ago

I'll try a test with that protocol

pconesa commented 3 years ago

I think I'm getting closer... It could be related to inputs that are volumes, that point to items in setOfVolumes.

pconesa commented 3 years ago

I'm getting this:

  open files:  14
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/project.sqlite, 4
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/settings.sqlite, 5
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/volumes.sqlite, 10
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_01.hdf, 11
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_02.hdf, 12
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_03.hdf, 13
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_04.hdf, 14
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_05.hdf, 15
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_06.hdf, 16
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_07.hdf, 17
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_08.hdf, 18
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_09.hdf, 19
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_10.hdf, 20
    - /home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/volumes.sqlite, 21
  memory percent:  1.6814296058294014

Just by having this:

image

azazellochg commented 3 years ago

Interesting, indeed it seems to be always volumes..

pconesa commented 3 years ago

Ok, on one side, I think the binding does not close the image file when getDimensions is used, at least with hdf5 files:

scipion3 python
from pwem.emlib.image import ImageHandler
ih = ImageHandler()
ih.getDimensions("/home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_10.hdf")
proc = psutil.Process(os.getpid())
proc.proc.open_files()

returns:

[popenfile(path='/home/pablo/ScipionUserData/projects/TestEmanInitialModelGroel/Runs/000074_EmanProtInitModel/extra/initial_models/model_00_10.hdf', fd=3, position=11200, mode='r', flags=32768)]

At least with .mrc, .map files this does not happens.

This happens when browsing for input and expanding a setOfVolumes. Probably hd5 files need special handling not properly done.

pconesa commented 3 years ago

A different case is with the sqlites (sets), I'm printing the stack trace when making the connection and I'm seeing this pattern several times:

COMMAND:  SELECT * FROM Classes; Runs/000074_EmanProtInitModel/volumes.sqlite
ARGUMENTS:  ()
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pablo/desarrollo/scipion-app/scipion/__main__.py", line 442, in <module>
    main()
  File "/home/pablo/desarrollo/scipion-app/scipion/__main__.py", line 256, in main
    openProject('last')
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/apps/pw_project.py", line 78, in openProject
    projWindow = ProjectWindow(projPath)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/project/project.py", line 131, in __init__
    self.switchView(VIEW_PROTOCOLS)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/project/base.py", line 144, in switchView
    self.viewWidget = self.viewFuncs[newView](self.footer, self)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/project/viewprotocols.py", line 632, in __init__
    c = self.createContent()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/project/viewprotocols.py", line 698, in createContent
    self.runsTree = self.createRunsTree(runsFrame)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/project/viewprotocols.py", line 1121, in createRunsTree
    t = pwgui.tree.BoundTree(parent, self.provider, style='List.Treeview')
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/tree.py", line 271, in __init__
    self.setProvider(provider)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/tree.py", line 284, in setProvider
    self.update()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/tree.py", line 348, in update
    self._objects = self.provider.getObjects()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/gui/tree.py", line 548, in getObjects
    checkPids=self._checkPids)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/project/project.py", line 1318, in getRuns
    self._setProtocolMapper(r)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/project/project.py", line 1280, in _setProtocolMapper
    self._setHostConfig(protocol)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/project/project.py", line 1261, in _setHostConfig
    protocol.setHostConfig(hostConfig)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/protocol/protocol.py", line 1570, in setHostConfig
    self.hostConfig = config
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 575, in __setattr__
    not self.__attrPointed(name, value) and value._objDoStore):
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 568, in __attrPointed
    if attr.get() is value:
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 852, in get
    value = value[int(p)]  # item case
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 1119, in __getitem__
    return self._getMapper().selectById(itemId)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 1104, in _getMapper
    self.load()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/object.py", line 1196, in load
    self._mapper = self._MapperClass(fn, self._loadClassesDict(), prefix, self._indexes)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/mapper/sqlite.py", line 750, in __init__
    self.__loadObjDict()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/mapper/sqlite.py", line 816, in __loadObjDict
    rows = self.db.getClassRows()
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/mapper/sqlite.py", line 1167, in getClassRows
    self.executeCommand(self.SELECT_CLASS)
  File "/home/pablo/desarrollo/scipion-pyworkflow/pyworkflow/mapper/sqlite_db.py", line 85, in _debugExecute
    traceback.print_stack()

Interestingly all this is triggered by setting the hostconfig to the protocol, which calls the settAttr with the tricky __attrPointed method being called.

More to come.

pconesa commented 3 years ago

Ok, the key is there, when setting attributes to protocols the load of sqlites could happen triggered by object.__attrPointed

I think this is being there for a long time and nothing new in this release. On the other hand, fixing this might improve loading time.

azazellochg commented 3 years ago

Nice! Could you tell where/how did you put print_stack in sqlite_db.py?

azazellochg commented 3 years ago

nevermind .figured it out :)