Closed peb-peb closed 1 year ago
I've been running into the following issue for quite some time. When trying to run and test the following code, I get the following error :point_down: :(
Hmm, I was unable to reproduce this problem running it on Python 3.9.2. However, let's take a step back and think about what we actually need to make asynchronous. I don't think we can do it for the project_list and project_info queries, since they are required by the functions later in the scanner loop to fetch specific resources. I made the change in my code, and so far I see the following issue:
2023-07-05 21:24:26 - INFO - Retrieving Compute Snapshots
2023-07-05 21:24:26 - INFO - Failed to get compute snapshots in the test-gcp-scanner-2
2023-07-05 21:24:26 - INFO - (<class 'TypeError'>, TypeError("object dict can't be used in 'await' expression"), <traceback object at 0x7f1302bcf0c0>)
2023-07-05 21:24:26 - INFO - Retrieving Subnets
2023-07-05 21:24:27 - INFO - Failed to get subnets in the test-gcp-scanner-2
2023-07-05 21:24:27 - INFO - (<class 'TypeError'>, TypeError("object dict can't be used in 'await' expression"), <traceback object at 0x7f1302bb0180>)
2023-07-05 21:24:27 - INFO - Retrieving Firewall Rules
2023-07-05 21:24:27 - INFO - Failed to get firewall rules in the test-gcp-scanner-2
2023-07-05 21:24:27 - INFO - (<class 'TypeError'>, TypeError("object dict can't be used in 'await' expression"), <traceback object at 0x7f13031fbb00>)
2023-07-05 21:24:27 - INFO - Retrieving app services
2023-07-05 21:24:27 - INFO - Failed to retrieve App services for project test-gcp-scanner-2
2023-07-05 21:24:27 - INFO - (<class 'TypeError'>, TypeError("object dict can't be used in 'await' expression"), <traceback object at 0x7f1302614a40>)
2023-07-05 21:24:27 - INFO - Retrieving GCS Buckets
Traceback (most recent call last):
File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/mshudrak/.local/lib/python3.9/site-packages/gcp_scanner/__main__.py", line 22, in <module>
scanner.main()
File "/home/mshudrak/.local/lib/python3.9/site-packages/gcp_scanner/scanner.py", line 590, in main
asyncio.run(crawl_loop(sa_tuples, args.output, scan_config, args.light_scan,
File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/mshudrak/.local/lib/python3.9/site-packages/gcp_scanner/scanner.py", line 293, in crawl_loop
project_result['storage_buckets'] = await CrawlerFactory.create_crawler(
File "/home/mshudrak/.local/lib/python3.9/site-packages/gcp_scanner/crawler/storage_buckets_crawler.py", line 49, in crawl
response = await request.execute()
TypeError: object dict can't be used in 'await' expression
It is basically happening for all the calls to specific resources.
> I don't think we can do it for the project_list and project_info queries, since they are required by the functions later in the scanner loop to fetch specific resources.
I'll try it and update on the progress.
I am far from an expert in asyncio, but it seems that `request.execute()` is not a coroutine, and it complains if I start a second loop there. One solution could be to use nest_asyncio, but so far I'm having trouble making it work.
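That matches how google-api-python-client behaves: `request.execute()` is a plain blocking call that returns a dict, so it can't be awaited directly. One common workaround is to hand the blocking call to a worker thread and await the resulting future. A rough sketch (with `execute_request` as a stand-in for the real API call):

```python
import asyncio

def execute_request():
    # Stand-in for googleapiclient's synchronous request.execute(),
    # which blocks on an HTTP round trip and returns a plain dict.
    return {"items": ["snapshot-1", "snapshot-2"]}

async def crawl():
    # A dict can't be awaited, so run the blocking call in the
    # default thread-pool executor and await the future instead.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, execute_request)

response = asyncio.run(crawl())
print(response["items"])
```

On Python 3.9+ the same thing can be written as `await asyncio.to_thread(execute_request)`.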
Are you sure you are familiar enough with asyncio to implement this? We are totally fine if you decide to go with classic Python multithreading or multiprocessing...
I'll give it a last shot and then switch over to it...
I have implemented asyncio. Things that had to be done: make the crawler calls awaitable and `await` them.

For example: according to this, the code below should return the output in 2 sec, which it does:
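The snippet referenced here didn't survive in the thread; a minimal stand-in showing the behavior described (two 2-second waits running concurrently and finishing in about 2 seconds total) might look like:

```python
import asyncio
import time

async def fetch(name):
    await asyncio.sleep(2)  # simulate a 2-second network call
    return name

async def main():
    # Both coroutines sleep concurrently, so the total wall-clock
    # time is ~2 seconds rather than 4.
    return await asyncio.gather(fetch("a"), fetch("b"))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 1))
```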
But when I try to apply the same to our tool, it still makes the requests the same way and consumes the same amount of time. So, the solutions would be: wrap the blocking call in an `asyncio.Future`, or create a decorator to await it. What should be the next steps? @mshudrak @ZetaTwo
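The decorator option could be sketched roughly like this (hypothetical names; `execute` stands in for the blocking `request.execute()`):

```python
import asyncio
import functools

def awaitable(func):
    # Hypothetical decorator: wraps a blocking function so it can be
    # awaited; the call runs in the default thread-pool executor.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)
    return wrapper

@awaitable
def execute(resource):
    # Stand-in for googleapiclient's blocking request.execute()
    return {"resource": resource}

async def main():
    return await execute("buckets")

print(asyncio.run(main()))
```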
Some related discussion on similar issues in google-api-python-client:
I'd go for a multiprocessing ThreadPool for the GCP resource requests, and I'd use actual multiprocessing for per-project parallelism.
> I'd go for a multiprocessing ThreadPool for the GCP resource requests, and I'd use actual multiprocessing for per-project parallelism.
ok :+1:
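The ThreadPool approach suggested above could look roughly like this sketch (with `fetch_resource` as a hypothetical stand-in for a blocking GCP API call):

```python
from multiprocessing.pool import ThreadPool

def fetch_resource(name):
    # Stand-in for a blocking API call such as request.execute().
    return {"name": name, "status": "ok"}

resources = ["snapshots", "subnets", "firewall_rules", "buckets"]

# A thread pool suits these I/O-bound HTTP requests: the threads
# wait on the network in parallel, and the GIL is released during I/O.
with ThreadPool(processes=4) as pool:
    results = pool.map(fetch_resource, resources)

print([r["name"] for r in results])
```

`pool.map` preserves input order, so results line up with the resource list even though the calls run in parallel.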
I'll close this and send a new draft PR with the required changes.
Description
This issue proposes making the main scan loop asynchronous, allowing the scanner to continue scanning other resources while it waits for the results of a long-running operation. This will improve the scanner's performance and make it more responsive to user input.
The scan loop would call the crawlers asynchronously.
Changes
request.execute()
Checklist
Additional Notes
I believe that the changes made to the main scan loop will improve the performance of the scanner.
After #228, we can use `asyncio.gather()` to achieve a similar result.
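A rough sketch of what gathering the crawlers could look like (hypothetical names, not the scanner's actual code):

```python
import asyncio

async def crawl(resource):
    # Stand-in for an async crawler; a real crawler would await the
    # blocking API call via a thread (run_in_executor / to_thread).
    await asyncio.sleep(0)
    return resource, {"status": "ok"}

async def crawl_loop(resources):
    # Launch all crawlers concurrently and collect their results
    # in one pass, instead of awaiting each one sequentially.
    results = await asyncio.gather(*(crawl(r) for r in resources))
    return dict(results)

project_result = asyncio.run(
    crawl_loop(["compute_snapshots", "subnets", "storage_buckets"]))
print(sorted(project_result))
```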