kedro-org / kedro-viz

Visualise your Kedro data and machine-learning pipelines and track your experiments.
https://demo.kedro.org
Apache License 2.0
681 stars 113 forks source link

CPU pinned at 100% #2135

Closed pascalwhoop closed 2 weeks ago

pascalwhoop commented 1 month ago

Description

Regularly observing 1 python process being pinned at 100% CPU utilization. Turns out it's kedro viz

$ ps ax | grep 71551                                                                                                                                                                                          4994ms
71551 s023  R+   319:19.17 /opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_handle=9) --multiprocessing-fork
85432 s028  S+     0:00.00 grep --color=auto 71551
$ pstree -p 71551                          394ms
-+= 00001 root /sbin/launchd
 \-+= 00836 pascalwhoop /Applications/iTerm.app/Contents/MacOS/iTerm2
   \-+- 00901 pascalwhoop /Users/pascalwhoop/Library/Application Support/iTerm2/iTermServer-3.5.5 /Users/pascalwhoop/Library/Application Support/iTerm2/iterm2-daemon-1.socket
     \-+= 70530 root /usr/bin/login -fpl pascalwhoop /Applications/iTerm.app/Contents/MacOS/ShellLauncher --launch_shell SHELL=/opt/homebrew/bin/fish
       \-+= 70533 pascalwhoop -fish
         \-+= 71535 pascalwhoop /opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python /Users/pascalwhoop/Code/everycure/matrix/pipelines/matrix/.venv/bin/kedro viz --aut
           \-+- 71551 pascalwhoop /opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_ha
             \--- 71563 pascalwhoop /opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=7, pipe_

Steps to Reproduce

it may be related to our kedro pipeline, it's dynamic and has 180+ nodes. Happy to share some telemetry if you guide me how. I can't share the code (yet) unfortunately.

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

Checklist

rashidakanchwala commented 1 month ago

Thanks, please do share telemetry for it.

rashidakanchwala commented 1 month ago

@ravi-kumar-pilla

pascalwhoop commented 1 month ago

@rashidakanchwala can you point me at a doc how to do that?

ravi-kumar-pilla commented 1 month ago

Hi @pascalwhoop , thanks for raising this. Could you please let us know -

  1. Which command did you use kedro viz, kedro viz run or any other ?
  2. Did you use any options while running kedro viz like kedro viz -a ?
  3. How did you stop the running kedro viz instance ?
  4. Were you using viz line magic %run_viz in jupyter notebook ?
  5. kedro viz version you are using.

Thank you

astrojuanlu commented 1 month ago

From @pascalwhoop top comment, I see the command seems to be kedro viz --autoreload:

         \-+= 71535 pascalwhoop /opt/homebrew/Cellar/python@3.11/3.11.9_1/Frameworks/Python.framework/Versions/3.11/Resources/Python.app/Contents/MacOS/Python /Users/pascalwhoop/Code/everycure/matrix/pipelines/matrix/.venv/bin/kedro viz --aut
rashidakanchwala commented 1 month ago

@rashidakanchwala can you point me at a doc how to do that?

Thanks, I just realised telemetry doesn't share that information. For Kedro-viz; it only shares on the way you interact with the UI. We have done some refactoring around autoreload recently. Once we release that maybe we might observe some changes to this - https://github.com/kedro-org/kedro-viz/pull/2134

astrojuanlu commented 1 month ago

Is it normal that multiprocessing.spawn.spawn_main is called twice @rashidakanchwala ?

pascalwhoop commented 1 month ago

I re-set up my mailing notifications and will hopefully be more responsive for OSS contributions going forward. If not, ping me on slack :)

kedro viz --autoreload indeed was used to kick off.

rashidakanchwala commented 1 month ago

Is it normal that multiprocessing.spawn.spawn_main is called twice @rashidakanchwala ?

this is not kedro-viz code, but watchgod library that we call when we do --autoreload @jitu5 and @ravi-kumar-pilla , can we test if replacing watchgod helps this ?

astrojuanlu commented 1 month ago

Then @pascalwhoop any chance you can try if #2134 fixes the issue?

ravi-kumar-pilla commented 1 month ago

For context on what we do for --autoreload : We use multiprocessing to run the uvicorn server in its own process (recommended) . For --autoreload we use watchgod which starts a subprocess for file watching. So in a way we have 2 processes running. We might need to change this approach.

pascalwhoop commented 1 month ago

we do have a lot of files in the folder, are you sensibly excluding files? E.g. only watch those that are managed by git (vs. all of .venv and data?

rashidakanchwala commented 1 month ago

Good question, you are right we watch all files where file_path.endswith((".yml", ".yaml", ".py", ".json")) -- maybe we need to add better logic to this.

ravi-kumar-pilla commented 1 month ago

Good question, you are right we watch all files where file_path.endswith((".yml", ".yaml", ".py", ".json")) -- maybe we need to add better logic to this.

Agree. We should also ignore hidden files and files not tracked by version control. @jitu5

jitu5 commented 1 month ago

Good question, you are right we watch all files where file_path.endswith((".yml", ".yaml", ".py", ".json")) -- maybe we need to add better logic to this.

Agree. We should also ignore hidden files and files not tracked by version control. @jitu5

Sure we can add this changes in https://github.com/kedro-org/kedro-viz/pull/2134

jitu5 commented 1 month ago

Right now we are using multiprocessing.Process but in watchfiles documentation suggest to use multiprocessing.get_context('spawn').Process to avoid forking and improve code reload/import. I will add this changes as well in https://github.com/kedro-org/kedro-viz/pull/2134

ravi-kumar-pilla commented 1 month ago

Right now we are using multiprocessing.Process but in watchfiles documentation suggest to use multiprocessing.get_context('spawn').Process to avoid forking and improve code reload/import. I will add this changes as well in #2134

Cool... we did that for notebook launcher while resolving an issue - https://github.com/kedro-org/kedro-viz/blob/main/package/kedro_viz/launchers/jupyter.py#L149

pascalwhoop commented 1 month ago

appreciate the active work folks that should do the trick

$ find ./ | egrep "(yml|yaml|py|json)\$" | wc -l                                                                                                                                                               
119393

(node we use joblib heavily to cache API calls which leads to such large number of json files)

rashidakanchwala commented 2 weeks ago

The fix for this issue has been merged. As a follow-up, we’ll remind @pascalwhoop to check if it’s resolved in the next release of Kedro-Viz. Closing this for now.