ianlini / flatten-dict

A flexible utility for flattening and unflattening dict-like objects in Python.
MIT License
176 stars 36 forks source link

Adds Hook functionality #33

Open aneuway2 opened 4 years ago

aneuway2 commented 4 years ago
aneuway2 commented 4 years ago

@ianlini lmk if theres anything you think should be changed on this or would need to be considered...otherwise this should be good to merge :-)

aneuway2 commented 4 years ago

Thanks for your contribution. Your idea is cool but the word "hook" is too ambiguous here.

Correct me if I am wrong. We originally support customized key generating method (reducer and splitter), but the values cannot be changed. Therefore, you are proposing this hook for changing the values.

Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ?

def reducer(key, value):

I used hook as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.

I think this is not related to the flatten or unflatten process, but might be good to have for saving memory usage or computing time. We can do all the things we want in one dict creation process instead of having some intermediate dict and doing some post processing.

You've hit the nail on the head with saving memory usage. I'd prefer to do value processing as we do the packing and unpacking to save usage there.

If this is the purpose, and think there are some other ways that are more flexible and pythonic.

For example, we can make a generator Flatten.items():

{key: hook(key, value) for key, value in Flatten(d).items()}

I think this will make the code much more readable. I'm actually planning to make a Flatten class like this, so please tell me your suggestion.

I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)

Context on my current use case

My use case looks like this and is like vlookups (in excel) for api calls

  1. input flat dict
  2. if there are values that match some regex, look for an existing value in a separate "memory" dict and replace it with that value
  3. Run API calls with the unflattened dict
  4. Add memory values to the dict from API calls that come back
  5. Repeat steps 2-4 until there arent any values to look for any more

And I have a working class that looks something like this (simplified for clarity):

memory = {}
class JobProcessor:
    def __init__(self, flat_dict):
        self.completed_job = False
        self.has_lookup_value = False
        self.job = unflatten(flat_dict, splitter=self._splitter, hook=self.hook)
        self.info = self.job.get("dbg_info", {})

    def _splitter(self, flat_key):
        flat_key = re.sub(r"\[([0-9]+)\]", r".\g<1>", flat_key)  # dict[0] --> dict.0
        return flat_key.split(".")

    def _reducer(self, k1, k2):
        if k1 is None:
            return k2
        elif isinstance(k2, int) or k2.isdigit():
            return f"{k1}[{k2}]"
        else:
            return f"{k1}.{k2}"

    def hook(self, value, key):

        if not self.completed_job:
            matches_replaced = []
            rule = r"(some regex here)"
            search = re.findall(rule, value)
            for match in search:
                lookup_key = "(follows regex match groups/patterns which has the flat key in it)"
                value_in_memory = memory.get(lookup_key, {})
                if value_in_memory:
                    value = re.sub(rule, value_in_memory, value, count=1)
                    matches_replaced.append(True)
                else:
                    matches_replaced.append(False)
            if False in matches_replaced:
                self.has_lookup_value = True  # we skip this job until all values replaced (i.e not matching the regex)
        if self.completed_job:
            lookup_key = "(follows regex match groups/patterns that uses the key)"
            memory[lookup_key] = value
ianlini commented 4 years ago

I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)

In your code, you still need to create the key and value. Yielding items should be very fast, so I think it won't be the bottleneck in most cases.

ianlini commented 4 years ago

Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ? def reducer(key, value): I used hook as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.

Combining with splitter/reducer will be more complicated. Most of the time, people don't need this function, so they don't want to return the identical value. It's hard to make one of the return value optional.

It's also hard to explain how hook can be used because it is not abstract enough. People need to know a lot of implementation details to write the code correctly. For example, if I have a dict

d = {"a": {"b": {"c": "value}}}

People need to know that hook will be called like this:

hook(("c",), "value")
hook(("b", "c"), "value")
hook(("a", "b", "c"), "value")

If I want to hook something when the key is ("b", "c") with at least one preceding keys, I will write a hook function that will do something when the key is ("b", "c") with exactly one preceding key.

Other design issues:

  1. It's non-necessary to call hook so many times
  2. It's not intuitive to define the keys that we should do something at. I would rather use something like jsonpath to do this.
  3. If someday I change my implementation of flatten to reduce ("a", "b") first, and then ("a", "b", "c"), it will affect the definition of hook.