volatilityfoundation / volatility3

Volatility 3.0 development
http://volatilityfoundation.org/
Other
2.61k stars 447 forks source link

Duplicate Enum value in `bpf_map_type` #1060

Closed vobst closed 10 months ago

vobst commented 10 months ago

Commit bpf: Implement cgroup storage available to non-cgroup-attached bpf progs merged in Linux 6.2-rc1 introduced a duplicate enum value in enum bpf_map_type.

Currently Volatility throws an exception in that case, could we maybe handle it more gracefully, e.g., by making the mapping int -> str | list[str] or int -> list[str] ?

The relevant code seems to be here and it looks like the problem was already anticipated ;)

    @classmethod
    def _generate_inverse_choices(cls, choices: Dict[str, int]) -> Dict[int, str]:
        """Generates the inverse choices for the object."""
        inverse_choices: Dict[int, str] = {}
        for k, v in choices.items():
            if v in inverse_choices:
                # Technically this shouldn't be a problem, but since we inverse cache
                # and can't map one value to two possibilities we throw an exception during build
                # We can remove/work around this if it proves a common issue
                raise ValueError(
                    f"Enumeration value {v} duplicated as {k} and {inverse_choices[v]}"
                )
            inverse_choices[v] = k
        return inverse_choices

https://github.com/volatilityfoundation/volatility3/blob/713b5f96dd218eea9c3fdb490a2436a474d22352/volatility3/framework/objects/__init__.py#L599C19-L599C19

vobst commented 10 months ago

As I see it, there are four possible solutions here:

  1. leave everything as it is
  2. leave the function signature the same, but either silently replace the preimage, i.e., remove the if, or stick with the first preimage, i.e., if and continue
  3. change the function signature to dict[int, str | list[str]]
  4. change the function signature to dict[int, list[str]]

My thoughts on those are:

  1. not rly an option since important enums in the kernel are evidently not injective
  2. also not optimal since it could potentially mean that the inverse mapping depends on the order in which items were inserted in the dict (in 3.7+ dictionary iteration order is guaranteed to be in order of insertion)
  3. would be a pain for callers since they have to check what they got back, and it will be a string for 99.9% of enums in the kernel
  4. a bit less of a pain but still makes caller code look awkward to handle 0.1% of cases

To me option 2 seems to be the lesser evil as those strings are equivalent from C's perspective so it shouldn't matter which one is returned. However, I would keep the if an output a debug log message so people can figure out what is going on.

However, I've got no clue about the context of the larger code base so I might be missing something here. Happy to hear some feedback :)

ikelos commented 10 months ago

So the point of the function is to generate a map, so that if we have a value, we can look-up what it means. That appears to be the only place it's used, and it is private, so the only public expose is through the lookup function. However, it feels problematic to change the API for what is a so-far, extremely uncommon situation, so I'd prefer to rule out lookup potentially returning a list (it would have to always return a list, handing back different types creates an unstable API and leads to way more problems than just a break on a once-in-a-blue-moon event).

So the real options open to us are:

I'm kind of ok with any of them. I tend to err on the side of barfing rather than deciding things for the user, but that may work out to be extremely inconvenient, in which case my second choice would just be to ignore duplicates and debug log for the user. I'll mock up a PR that would make the change...