nornir-automation / nornir

Pluggable multi-threaded framework with inventory management to help operate collections of devices
https://nornir.readthedocs.io/
Apache License 2.0
1.35k stars 226 forks source link

Async architecture (inventory) #880

Open ktbyers opened 7 months ago

ktbyers commented 7 months ago

This issue is async inventory related only, see:

https://github.com/nornir-automation/nornir/issues/881

For additional async related topics and links to separate issues on these topics.

ktbyers commented 7 months ago

@dgarros @ogenstad

ktbyers commented 7 months ago

Some initial work/thoughts from dbarroso. Note, some of this was from the early days of asyncio so some patterns may or may not be relevant

https://github.com/nornir-automation/nornir/pull/390 https://github.com/nornir-automation/nornir/pull/660

ktbyers commented 7 months ago

I think we should start with a discussion regarding Inventory and async and what this means/how it works?

Currently, inventory is single-threaded sync (it is outside the context of "runners"/before runners).

So if we have, for example, a NetBox system with 1000 devices and wanted an async inventory plugin how would this operate? I picked NetBox as there is a NetBox Nornir-plugin and I have a working NetBox system to test against (but it could be any remote HTTP API inventory system).

Current inventory load behavior is this:

    inventory_plugin = InventoryPluginRegister.get_plugin(config.inventory.plugin)
    inv = inventory_plugin(**config.inventory.options).load()

    # then basically call a transform function (on the hosts if one has been defined)
dbarrosop commented 7 months ago

discussion regarding Inventory and async

I think the first question I'd raise here is; is it really necessary? do we actually need an async inventory for any particular reason or is it just to say we support async inventories? Asking because the inventory is a one-shot thing you run when you initialize it, afterwards you mostly work with already loaded data. I'd understand async tasks to incrementally add information (i.e. a sync inventory that loads the hosts/groups and metadata but interfaces' data, BGP information, etc. is loaded afterwards via async tasks) but not sure I understand the use case for an async inventory.

ktbyers commented 7 months ago

I would probably generalize the question and ask, "do we need concurrency in inventory loading" (whether async vs threaded).

In other words, for large inventories being loaded via API what are the load times and what kind of improvements can be made via using a concurrent solution (and how much complexity does that entail).

ubaumann commented 7 months ago

I could imagine that the API and the rate limit are the bottleneck. Or we are loading too much data.

I would appreciate it if an inventory plugin would have an option for a minimal import. (Only host name and some needed properties but not all interfaces, for example.) And if more information is required it could be added to a task. Of course, we have to ensure the threads don't write the same inventory properties, but if we do it by host, it should be fine.

I think it would be beneficial to investigate where the bottleneck is.

dbarrosop commented 7 months ago

if an inventory plugin would have an option for a minimal import ... And if more information is required it could be added to a task

100% that, that's why I was suggesting before that what makes more sense is leaving the inventory as it is and rely on "inventory tasks" (which could be async) to load workflow-specific data.

IMO, given there is no clear use-case in mind, I'd suggest tabling this particular topic (async inventory) until something more concrete arises.

P.S: To keep discussions a bit easier to track I'd also suggest to create different issues for each one of your bulletpoints, otherwise this might get messy real quick :P

ktbyers commented 7 months ago

@dbarrosop @ubaumann So just to make sure I am understanding what you are both saying--a better pattern would be to have very simplified inventory plugins (minimal set of information needed to connect to the device).

And then subsequent Nornir tasks which populate additional information in hosts/groups for the inventory information that you need. These subsequent Nornir tasks could then be threaded or async (assuming we build an async-runner).

Is that a correct summary?

dbarrosop commented 7 months ago

Yes, exactly. Use the inventory plugin to download your devices, groups and metadata (so you can group them and filter them) and then rely on tasks within your workflows to download the specific data you need for that specific workflow. In my experience it is the best way of managing a large network with lots of configuration data. Otherwise you end up with a humungous inventory that takes ages to be loaded and requires a generous amount of RAM.

ogenstad commented 7 months ago

Depending on which environment Nornir is running in I think it could be relevant to have a way to load the initial nodes in an asyncio loop. I.e., if Nornir is started from an environment that is already running within asyncio you wouldn't want to have a blocking call even if it's just to gather a couple of hostnames.

However given how Nornir works I don't see this as an issue at all. The only current "blocker" is that the InitNornir function does two synchrounous requests now:

    return Nornir(
        inventory=load_inventory(config),
        runner=load_runner(config),
        config=config,
        data=data,
    )

If someone wants the initial loading of the inventory to be async if you be very easy to just have these kind of helper functions in your own code, even if they aren't part of the core.

itdependsnetworks commented 7 months ago

@dbarrosop came to a similar conclusion with ansible and documented here https://blog.networktocode.com/post/nautobot-ansible-variable-management-at-scale/

ktbyers commented 7 months ago

A remaining question though is, if you are loading a large number of devices from inventory (say >1000) even with minimal API calls per device, is it worth adding concurrency for this in Nornir.

Or is this a case where people would always just solve it on their own (with some sort of a custom inventory plugin) or just not really a problem?

itdependsnetworks commented 7 months ago

I guess you could go as wide as the threads you have on your api, but seems just as likely to cause issues than resolve, for when you fill up all of the threads and potentially (perhaps not likely?) overload the sql server.

ktbyers commented 7 months ago

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

dgarros commented 7 months ago

I'm 100% for having a minimal inventory but I still thing there is a valid use case for having async support for the inventory. Assuming we want to pull 5000 devices from Netbox/Nautobot, it means you would most likely have to make 5 API calls because of the pagination. With Async it would be really easy to make these 5 calls concurrently and it will reduce the execution time of the inventory by as much without adding much complexity.

I think the argument about having too many concurrents calls that will overload the API isn't really the point here, it's easy to limit that in the code

Also as @ogenstad mentioned when you are running an async code, it's best if all I/O functions support async. so if the goal is to run Nornir Task with Async it make a lot of sense to have the inventory support Async too

One more thing to consider, I think it make sense to use a single library/client/plugin within Nornir to connect to a SOT system like Netbox/Nautobot. So if we agree that it make sense to use Async for the task, it wouldn't make sense to use a different library/client/plugin for the inventory.

dgarros commented 7 months ago

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

Yes this would need to support async as well

dgarros commented 7 months ago

I opened PR #882 with a proposal to support both Sync and Async inventory with very minimal changes. I hope it will help with this discussion.

ogenstad commented 7 months ago

@ogenstad Wouldn't the load() call inside of the load_inventory() would also be blocking i.e. for API calls this would be sync http API calls? And wouldn't this force you to build your own inventory plugin to workaround this?

For example, for the NetBox plugin:

https://github.com/wvandeun/nornir_netbox/blob/develop/nornir_netbox/plugins/inventory/netbox.py#L111

Sure, there would be a bit more boiler plate, though the Nornir object itself only cares about the Inventory data it gets from InitNornir and the loader. What I meant was that it could be loaded directly into the object.

ktbyers commented 7 months ago

@dgarros Can you re-state this? I didn't follow what you were saying here?

One more thing to consider, I think it make sense to use a single library/client/plugin within Nornir to connect to
a SOT system like Netbox/Nautobot. So if we agree that it make sense to use Async for the task, it wouldn't make 
sense to use a different library/client/plugin for the inventory.
dgarros commented 7 months ago

@ktbyers if I take our SDK and our nornir plugin as an example We have 2 versions of the client, a Sync and an Async one, let's call them Client and ClientAsync When we query the inventory, each client will return a list of Device objects with and without Async support,

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

Also the client itself is usually passed from the inventory to the tasks so it would have to be re-initialized later in the code.

itdependsnetworks commented 7 months ago

I think the argument about having too many concurrents calls that will overload the API

Apologies, it was not meant to be an argument, just a point of consideration.

ktbyers commented 7 months ago

@itdependsnetworks It is all good...we are just trying to see the pros and cons of various options.

dbarrosop commented 7 months ago

I am convinced now there is value in async plugins if anything due to pagination (even though IMO inventory plugins should come with some pre-filtering capabilities so you can download only devices you may be interested in). However, this raises the following question; why support both synch/asynch? For instance, the following comment from @dgarros:

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

This comment basically suggests that nornir all of a sudden is going to become a fragmented framework where you need to carefully combine your inventories and plugins to make sure they match and can work together. Or did I misunderstand the comment? Assuming I didn't, why are we trying to support both together, then? Isn't this going to be confusing and/or lead to a lot of complexity within nornir? Why not consider releasing nornir 4 as a fully async framework exclussively in that case? I am personally not a fan of having nornir support a disjoint set of plugins that aren't compatible with each other.

dgarros commented 7 months ago

Unfortunately I think the industry is not ready to completely migrate to Async and a lot of people are using Nornir without Async successfully so I think it makes sense to support both for the time being. Hopefully as the ecosystem of plugins supporting Async grows more users will migrate and hopefully someday it won't be necessary to support both.

dbarrosop commented 7 months ago

That's fine, but then plugins should be compatible with each other. I don't think nornir users should be put in a position where they need to pick and match plugins and try to understand if they are compatible with each other. This is just going to be bad UX and cause problems to maintainers as people try to figure out why things don't work.

To clarify, what I mean here is that a user should be able to use CoolDgarrosAsyncInventory and then call netmiko_send_command and, viceversa, use YamlInventory and then call amazing_async_task.

dbarrosop commented 7 months ago

Maybe we need to take a step back and decide goals and non-goals and how the ecosystem is going to work in this brave new world before we start discussing architecture/implementation details.

ktbyers commented 7 months ago

I will create a set of goals/non-goals in #881.

ktbyers commented 7 months ago

I did a very rough inventory proof-of-concept here:

https://github.com/ktbyers/nornir/pull/1/files https://github.com/ktbyers/nornir_netbox/pull/1/files

This is with a separate InitNornirAsync and a separate load_async method (in the plugin and in InitNornirAsync).

I didn't really worry about code duplication or making all of the parts async (i.e. was focussing on the NetBox http calls and converting them to async).

I used NetBox as I have a new version of Netbox in my environment (so it was easier for me to test).

I also wanted to get a better sense of some of the async code implications (on the inventory side).

dbarrosop commented 7 months ago

Looking at your example I am assuming my interpretation of the comment below was wrong:

So if I want to use Async in the Tasks, it wouldn't make sense to use the non async client for the inventory because it would create the wrong objects, without Async support.

I assumed the comment meant that the expectation was that an async inventory plugin wouldn't work with a sync task but your load_async returns a regular Inventory so current sync tasks should work just fine.

ktbyers commented 7 months ago

Yes, I would definitely want the Nornir inventory objects (Hosts, Groups, Defaults) to not care about their load mechanism (i.e. sync vs async).

I think we could accomplish both though (i.e. add a field to the Host objects indicating whether they were loaded via async or not) and then some form of a config option (or potentially logic from the Nornir end-user) disallowing any use of sync tasks (for async inventory objects).