google / gcp_scanner

A comprehensive scanner for Google Cloud
Apache License 2.0
305 stars 95 forks

feat✨: add resource parallelization #269

Closed peb-peb closed 1 year ago

peb-peb commented 1 year ago

Description

@mshudrak @ZetaTwo

mshudrak commented 1 year ago

I tested the change today and see quite strange behavior when I run the scanner: it hangs forever and never starts the enumeration.

python3 -m gcp_scanner -g - -o res -l INFO -wc 1
<skip>
Inspecting project <project_name_here>
^CException ignored in atexit callback: <function _exit_function at 0x7f20baaa39a0>
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/util.py", line 360, in _exit_function
    _run_finalizers()
  File "/usr/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/usr/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 199, in _finalize_join
    thread.join()
  File "/usr/lib/python3.10/threading.py", line 1096, in join
    self._wait_for_tstate_lock()
  File "/usr/lib/python3.10/threading.py", line 1116, in _wait_for_tstate_lock
    if lock.acquire(block, timeout):
KeyboardInterrupt: 

You can reproduce it by removing the res output folder and running the scanner without it. It seems to me that some error happens in the child process and the parent then waits forever for it to exit.

mshudrak commented 1 year ago

Likely related: https://stackoverflow.com/questions/61492362/multiprocessing-pool-hangs-if-child-process-killed

peb-peb commented 1 year ago

Looking into it.

mshudrak commented 1 year ago

Important note: this is likely related to the previous commit with process-based parallelization, not this one.

peb-peb commented 1 year ago

I was not able to reproduce the hanging behavior, but I got a similar result when the output path doesn't exist. I get the following output (the program terminates after the "Inspecting <<project_name>>" lines):

python3 scanner.py -l INFO -wc 8 -o res -k ../output/sakeys 
2023-08-25 18:33:47 - INFO - Retrieving credentials from ../output/sakeys/gcp-scanner-test-project-01-ba40a00eb1fe.json
2023-08-25 18:33:47 - INFO - >> current service account: test-sa-01@gcp-scanner-test-project-01.iam.gserviceaccount.com
2023-08-25 18:33:47 - INFO - Retrieving projects list
Inspecting project gcp-scanner-393515 for Impersonation
Inspecting project gcp-scanner-test-project-02 for Impersonation
Inspecting project gcp-scanner-test-project-01 for Impersonation
Inspecting project gcp-scanner-393515
Inspecting project gcp-scanner-test-project-02
Inspecting project gcp-scanner-test-project-01

A possible solution would be to exit the program if the folder doesn't exist. That way, we no longer assume that the output path is valid.
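A minimal sketch of that check, assuming it runs once in the parent before any worker is started. The helper name `ensure_output_dir` and its `create` flag are hypothetical and not part of gcp_scanner:

```python
import os


def ensure_output_dir(path: str, create: bool = False) -> str:
    """Return the absolute output path, or raise if it cannot be used.

    Hypothetical helper: meant to be called before workers are spawned,
    so that a bad path fails fast in the parent process.
    """
    abs_path = os.path.abspath(path)
    if create:
        # Optionally create the folder instead of failing.
        os.makedirs(abs_path, exist_ok=True)
    if not os.path.isdir(abs_path):
        raise FileNotFoundError(f"Output folder does not exist: {abs_path}")
    if not os.access(abs_path, os.W_OK):
        raise PermissionError(f"Output folder is not writable: {abs_path}")
    return abs_path
```

The caller could catch these exceptions, log the message, and exit with a non-zero status. Whether to fail or to create the folder is a design choice; the sketch supports both.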

mshudrak commented 1 year ago

I think this issue persists for any other error that happens in the child process. You could try to reproduce it by injecting an error into one of the child processes. I ran my test in a fresh venv and hit this issue consistently.
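One way to experiment with an injected failure is sketched below. It is not the scanner's code: `failing_worker`, `healthy_worker`, and `run_child` are hypothetical names, and the `fork` start method assumes a POSIX host like the one in the traceback. The parent joins with a timeout and inspects the exit code instead of waiting indefinitely:

```python
import multiprocessing as mp
import os


def failing_worker() -> None:
    # Hypothetical stand-in for a crawler hitting an unexpected error:
    # the child dies abruptly and nothing is reported back to the parent.
    os._exit(3)


def healthy_worker() -> None:
    # Hypothetical stand-in for a crawler that finishes normally.
    pass


def run_child(target, timeout: float = 10.0) -> int:
    """Run target in a child process and return its exit code.

    Raises TimeoutError rather than blocking forever if the child hangs.
    """
    ctx = mp.get_context("fork")  # assumption: POSIX host
    proc = ctx.Process(target=target)
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # The child is stuck: stop it so the parent can report the failure.
        proc.terminate()
        proc.join()
        raise TimeoutError(f"child did not exit within {timeout}s")
    return proc.exitcode
```

A non-zero return value from `run_child(failing_worker)` tells the parent that the child failed, so it can log the error and stop the scan instead of waiting.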

mshudrak commented 1 year ago

I think we need to dig deeper into this problem to understand why it is happening, given that it is a critical issue that blocks the scan.

mshudrak commented 1 year ago

There could be differences in how we execute it. I can't reproduce it with a GCP SA key either, but it does happen when I acquire creds from gcloud (the -g - parameter).

mshudrak commented 1 year ago

Pylint is complaining about the recent change. PTAL.

peb-peb commented 1 year ago

@mshudrak done with the changes :)