google / gcp_scanner

A comprehensive scanner for Google Cloud
Apache License 2.0
305 stars 94 forks source link

Refactor the "scanner.py" file to improve maintainability and organization #113

Closed grumpyp closed 1 year ago

grumpyp commented 1 year ago

Is your feature request related to a problem? Please describe. The scanner.py file has a function called crawl_loop which creates a dict which could get a bit messy and not nice to work with in the future.

I propose to introduce dataclasses for each project. We could work with base-classes from here on and add the other functionality on each class independently. The crawlers should also be more generic like proposed in #108.

The concept would also integrate with the propsed #108

From there on it would be way easier and clear future implementations would be straight forward.

Describe the solution you'd like All projects will be dataclasses.

# keep in mind this is only an example and not optimized yet, I want to showcase a cleaner implementation generelly
import gcp_scanner

class ScannerProcess():
  def __init__(self):
    projects = []

  @staticmethod
  def load(config: Dict) -> List[Crawler]:
    crawlers = []
    for crawl in config.keys():
      getattr(gcp_scanner, crawl)()

scanner = GpcScanner.load(config)
for project in projects:
  p = Project(name="project['projectId']")
  for scan in scanner: 
    p.scans.append(scan.load_data(p.project_number, credientials))

the if is_set(scan_config, 'service_accounts'): would be replaced by something like that:

# config.json
{
"compute_instances": {
    "fetch": true,
    "comment": "Fetch metadata about GCP VMs"
  },
  "compute_disks": {
    "fetch": true,
    "comment": "Fetch metadata about GCE disks"
  }
}
/gcp_scanner
  - __init__.py
# init.py includes the imports of all the crawler classes like
from gcp_scanner.crawler import GkeCluster
class Basecrawl(Abstract):
  def load_data():
    pass

class GkeCluster(Basecrawl):
  def load_data():
    #do the specific thing for the GkeCluster Crawler

I hope my thoughts are not too abstract now, but my concept would basically create objects and classes for the Crawlers.

I add a brief description of a AI to summarize the concept:

The problem being addressed is that the crawl_loop function in scanner.py creates a messy dictionary that is not easy to work with. The proposed solution is to introduce data classes for each project, which would make the code more organized and easier to work with in the future.

The proposed solution would involve creating a base class for all the crawlers, which would have a method called load_data(). Then, for each specific crawler (e.g., GkeCluster), a subclass would be created that would inherit from the base class and implement the load_data() method with the specific functionality for that crawler.

The crawl_loop function would be replaced by a function that loads the configuration from a JSON file, creates instances of the appropriate crawler classes based on the configuration, and then calls the load_data() method on each crawler instance to fetch the data for the project. The result of this process would be a list of projects, each with its own data.

The proposed solution would make the code more modular and easier to maintain in the future. By using data classes and inheritance, the code would be more organized and easier to extend with new functionality. Additionally, by using a configuration file to specify which crawlers to use, the code would be more flexible and adaptable to different use cases.

Am0stafa commented 1 year ago

yeah that's a nice approach and i added a similar more detailed solution in my proposal

Bhardwaj-Himanshu commented 1 year ago

This issue from the description looks like, is being alloted for google summer of code. But if I could help in contributing to it- -can someone please brief me out on what particularly the problem is asking, rather than gauging on complete solution. Thanks

grumpyp commented 1 year ago

Hi, it's my proposal for GSC, once it's started I'd appreciate you to work on it too. @Bhardwaj-Himanshu

Bhardwaj-Himanshu commented 1 year ago

We could start working on it, starting now as well!😂 Anyways, I wish you all the luck for your proposal and if you need any help in the future, ping me up!-->not sure if I could help though.

grumpyp commented 1 year ago

We could yes, but it's unclear what to do.

Bhardwaj-Himanshu commented 1 year ago

Let's try tweaking out with things here and there then. As I have no knowledge on this topic and can't feel any-more un-knowledgable after reading out the brief of the problem and solution.

Could you try explain the problem to me? and then the solution concept you are trying to implement (remember, just answer the above questions in just 2 lines, so that we can narrow it down from the broad view) Thanks