landlock-lsm / linux

Linux kernel - See Landlock issues
https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git/
Other
33 stars 9 forks source link

Domain hash table #1

Open l0kod opened 5 months ago

l0kod commented 5 months ago

To make it simpler, a Landlock domains is currently a landlock_ruleset struct. The use of this data structure includes fields which are useless for a domain, and a red-black tree which is not useful nor optimized for a read-only data structure.

To both improve performance and test coverage (by removing some WARN_ON_ONCE checks), we should create a dedicated domain data type, mainly consisting of a hash table. We should be able to use hash_long().

We should first create a tool to benchmark domain use and measure the performance improvement related to this new data type, see #24.

gnoack commented 2 months ago

Bitwise lookup tables are potentially also an option for some more advanced features: https://lore.kernel.org/all/ZhRKOTmoAOuwkujB@google.com/

gnoack commented 2 months ago

I have been wondering about the design of this in the past as well; the existing approach seems focused on keeping the worst case behaviour in check, but most Landlock rulesets that I have seen so far are actually pretty simple -- it makes me wonder whether we should not also take average case performance into account separately from the worst case performance.

In my mind a more "obvious" implementation approach for these domains would be to create one kernel-internal ruleset structure with these RB-trees, and just link it to its parent ruleset instead of merging this.

As rulesets are always created in a nested fashion, this will form a tree of rulesets and we can use reference counting for memory management.

I suspect that this might work well in the average case, because as far as I have seen, in realistic scenarios, the stacked rulesets seldomly refer to the same inodes. So if during a merge operation, these inode entries don't actually collapse across the merged rulesets, we just end up with multiple RB-trees that have the same overall number of entries as the big one that we are merging together in the existing implementation.

If we move to hash tables instead of RB-trees, linear scanning can be a valid performance optimization for small hash table sizes.

There are too many variables here to reason about it with confidence. I have a hard time proving that approach we have fixed bug 24 :)

Has this or a similar approach been discussed before for Landlock, maybe in one of the earlier patch sets?

l0kod commented 2 months ago

Bitwise lookup tables are potentially also an option for some more advanced features: https://lore.kernel.org/all/ZhRKOTmoAOuwkujB@google.com/

Indeed

I have been wondering about the design of this in the past as well; the existing approach seems focused on keeping the worst case behaviour in check, but most Landlock rulesets that I have seen so far are actually pretty simple -- it makes me wonder whether we should not also take average case performance into account separately from the worst case performance.

Sure, any idea?

In my mind a more "obvious" implementation approach for these domains would be to create one kernel-internal ruleset structure with these RB-trees, and just link it to its parent ruleset instead of merging this.

As rulesets are always created in a nested fashion, this will form a tree of rulesets and we can use reference counting for memory management.

I'm not sure to follow.

My reasoning to merge rulesets was to get a good locality for rules of the same domain. It takes more memory (by duplicating data) but I think it is not a big issue because rulesets are usually not so big and quick lookup is more valuable.

The landlock_hierarchy tree is use to keep track of domain hierarchies.

Has this or a similar approach been discussed before for Landlock, maybe in one of the earlier patch sets?

Not much. It was just mentioned that a hash table might be a good idea.

sm1ling-knight commented 2 months ago

@l0kod, what do you think about removing all layer-related logic for rules with non-hierarchical keys? That is, layers are currently required only to check access for some path (when we do pathwalk through all parent inodes, collecting access bits for each layer). For example, there is no need to keep layers for net rules, we can keep only one actions bitmask for each port.

l0kod commented 2 months ago

@l0kod, what do you think about removing all layer-related logic for rules with non-hierarchical keys? That is, layers are currently required only to check access for some path (when we do pathwalk through all parent inodes, collecting access bits for each layer). For example, there is no need to keep layers for net rules, we can keep only one actions bitmask for each port.

This is indeed not required to check an network port access but we need to store access rights per layer to be identify the layer that denied an access request. This is required for audit support to include in the logs the layer level that denied a request.