Use orgtreepubsub to crawl org resources

iainelder commented 1 year ago

And delete its prototypes from this repo.

iainelder commented 2 months ago

Did some work on orgtreepubsub.

You can use it to implement aws-org-tree like this:

from orgtreepubsub import OrgCrawler
import topics
from boto3 import Session
from type_defs import Parent, Root, Org, Resource

from anytree import Node, RenderTree # pyright: ignore[reportMissingTypeStubs]
from anytree.search import find_by_attr # pyright: ignore[reportMissingTypeStubs]

tree = Node("Placeholder")

def add_root(crawler: OrgCrawler, resource: Root, org: Org) -> None:
    # Replace the placeholder with the real root once the crawler publishes it.
    global tree
    tree = Node(name=resource.id, resource=resource)

def add_child(crawler: OrgCrawler, resource: Resource, parent: Parent) -> None:
    # Node names are resource IDs, and find_by_attr matches on "name" by default.
    parent_node: Node = find_by_attr(tree, parent.id)
    Node(name=resource.id, parent=parent_node, resource=resource)

topics.root.connect(add_root)
topics.orgunit.connect(add_child)
topics.account.connect(add_child)

OrgCrawler(Session()).crawl()

for pre, _, node in RenderTree(tree):
    print(f"{pre}{node.resource.name} ({node.name})")

iainelder commented 2 months ago

orgtreepubsub needs some more work before I can use it for aws-org-tree.

orgtreepubsub also publishes all the tags for all the objects.

In an arbitrary test org of about 150 accounts, average runtimes:

aws-org-tree 0.4.1: 15s
orgtreepubsub 3b91a3f: 25s
orgtreepubsub 3b91a3f without tag publishing: 6s

I need to make orgtreepubsub configurable so that aws-org-tree can tell it that it doesn't need the tags.

Then aws-org-tree will be faster.

iainelder commented 2 months ago

I added a set of signals per crawler instance. Now the crawler behavior is programmable so I can avoid fetching tags.
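
To illustrate why per-instance signals make tag fetching optional, here is a minimal stdlib-only sketch. It is not the orgtreepubsub implementation: `Signal`, `ToyCrawler`, `publish_account`, `publish_tags`, and `fetch_tags` are hypothetical names, and the recorded "API calls" are just strings standing in for real AWS requests.

```python
from typing import Any, Callable, List

Handler = Callable[..., None]


class Signal:
    """Tiny stand-in for a pub/sub signal (hypothetical, not orgtreepubsub's class)."""

    def __init__(self) -> None:
        self._handlers: List[Handler] = []

    def connect(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def send(self, sender: Any, **kwargs: Any) -> None:
        # Call every connected handler with the sender as first argument.
        for handler in list(self._handlers):
            handler(sender, **kwargs)


class ToyCrawler:
    """Hypothetical crawler: each instance owns its own signals."""

    def __init__(self) -> None:
        self.on_account = Signal()
        self.on_tag = Signal()
        self.api_calls: List[str] = []  # stands in for real AWS requests

    def publish_account(self, account_id: str) -> None:
        self.api_calls.append(f"describe:{account_id}")
        self.on_account.send(self, account_id=account_id)

    def publish_tags(self, account_id: str) -> None:
        self.api_calls.append(f"tags:{account_id}")
        self.on_tag.send(self, account_id=account_id)


def fetch_tags(crawler: ToyCrawler, account_id: str) -> None:
    # Only runs if a caller connected it to on_account.
    crawler.publish_tags(account_id)


# One crawler opts in to tags, the other never connects the tag handler.
with_tags = ToyCrawler()
with_tags.on_account.connect(fetch_tags)
with_tags.publish_account("111111111111")

without_tags = ToyCrawler()
without_tags.publish_account("111111111111")

print(with_tags.api_calls)
print(without_tags.api_calls)
```

Because the handlers live on the instance rather than in a shared module, a consumer that never connects the tag handler never triggers the extra calls, which is the kind of saving the timings above point to.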

The code still has to run inside the orgtreepubsub repo because the packaging isn't set up properly, so I can't import the orgtreepubsub package. I probably just need to add an __init__.py.

More changes needed for orgtreepubsub:

Define a Child type (because a Root is a Resource but not a Child!)
Define a "containment" event distinct from a "parentage" event, because an organization isn't a real parent of the root: you don't have to reference the org to get the root, and the org isn't a real resource in the sense that it doesn't have a name.
OR make the rendering more dynamic, falling back to the ID if the name doesn't exist.

from orgtreepubsub import OrgCrawler
from type_defs import Parent, Child, Organization
from boto3 import Session

from anytree import Node, RenderTree  # pyright: ignore[reportMissingTypeStubs]
from anytree.search import find_by_attr  # pyright: ignore[reportMissingTypeStubs]

tree = Node("Placeholder")

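def add_edge(crawler: OrgCrawler, child: Child, parent: Parent) -> None:
    global tree
    parent_node: Node
    if tree.name == "Placeholder":
        # First edge seen: its parent becomes the top of the tree.
        tree = Node(name=parent.id, resource=parent)
        parent_node = tree
    else:
        # Node names are resource IDs, and find_by_attr matches on "name" by default.
        parent_node = find_by_attr(tree, parent.id)
    Node(name=child.id, parent=parent_node, resource=child)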

# Wire up only the publishers the tree needs; no tag publisher is connected.
crawler = OrgCrawler(Session())
crawler.init = crawler.publish_organization
crawler.on_organization.connect(OrgCrawler.publish_roots)
crawler.on_root.connect(OrgCrawler.publish_orgunits_under_resource)
crawler.on_root.connect(OrgCrawler.publish_accounts_under_resource)
crawler.on_orgunit.connect(OrgCrawler.publish_orgunits_under_resource)
crawler.on_orgunit.connect(OrgCrawler.publish_accounts_under_resource)

crawler.on_parentage.connect(add_edge)

crawler.crawl()

for pre, _, node in RenderTree(tree):
    if isinstance(node.resource, Organization):
        continue
    print(f"{pre}{node.resource.name} ({node.name})")
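
The alternative listed under "More changes needed", rendering with a fallback to the ID when a resource has no name, could look roughly like this. It is a sketch under assumptions: `FakeAccount`, `FakeOrganization`, and `label` are hypothetical stand-ins, not types or functions from orgtreepubsub.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeAccount:
    """Hypothetical resource that has both an ID and a name."""
    id: str
    name: str


@dataclass
class FakeOrganization:
    """Hypothetical resource that has an ID but no name."""
    id: str


def label(resource: Any) -> str:
    # Use "name (id)" when a name exists, otherwise fall back to the bare ID.
    name = getattr(resource, "name", None)
    if name:
        return f"{name} ({resource.id})"
    return resource.id


print(label(FakeAccount(id="111111111111", name="Sandbox")))
print(label(FakeOrganization(id="o-exampleorgid")))
```

With a helper like this, the render loop would not need the isinstance check to skip the organization; the organization node would simply be printed by its ID.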

iainelder commented 2 months ago

See https://github.com/iainelder/orgtreepubsub/issues/8 for the packaging fix.

iainelder / aws-org-tree

Use orgtreepubsub to crawl org resources #3