Open aneuway2 opened 4 years ago
@ianlini lmk if theres anything you think should be changed on this or would need to be considered...otherwise this should be good to merge :-)
Thanks for your contribution. Your idea is cool but the word "hook" is too ambiguous here.
Correct me if I am wrong. We originally support customized key generating method (reducer and splitter), but the values cannot be changed. Therefore, you are proposing this hook for changing the values.
Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ?
def reducer(key, value):
I used hook
as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.
I think this is not related to the flatten or unflatten process, but might be good to have for saving memory usage or computing time. We can do all the things we want in one dict creation process instead of having some intermediate dict and doing some post processing.
You've hit the nail on the head with saving memory usage. I'd prefer to do value processing as we do the packing and unpacking to save usage there.
If this is the purpose, and think there are some other ways that are more flexible and pythonic.
For example, we can make a generator
Flatten.items()
:{key: hook(key, value) for key, value in Flatten(d).items()}
I think this will make the code much more readable. I'm actually planning to make a Flatten class like this, so please tell me your suggestion.
I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)
My use case looks like this and is like vlookups (in excel) for api calls
And I have a working class that looks something like this (simplified for clarity):
memory = {}
class JobProcessor:
def __init__(self, flat_dict):
self.completed_job = False
self.has_lookup_value = False
self.job = unflatten(flat_dict, splitter=self._splitter, hook=self.hook)
self.info = self.job.get("dbg_info", {})
def _splitter(self, flat_key):
flat_key = re.sub(r"\[([0-9]+)\]", r".\g<1>", flat_key) # dict[0] --> dict.0
return flat_key.split(".")
def _reducer(self, k1, k2):
if k1 is None:
return k2
elif isinstance(k2, int) or k2.isdigit():
return f"{k1}[{k2}]"
else:
return f"{k1}.{k2}"
def hook(self, value, key):
if not self.completed_job:
matches_replaced = []
rule = r"(some regex here)"
search = re.findall(rule, value)
for match in search:
lookup_key = "(follows regex match groups/patterns which has the flat key in it)"
value_in_memory = memory.get(lookup_key, {})
if value_in_memory:
value = re.sub(rule, value_in_memory, value, count=1)
matches_replaced.append(True)
else:
matches_replaced.append(False)
if False in matches_replaced:
self.has_lookup_value = True # we skip this job until all values replaced (i.e not matching the regex)
if self.completed_job:
lookup_key = "(follows regex match groups/patterns that uses the key)"
memory[lookup_key] = value
I think the approach you've suggested above would add an additional level of processing, right? (since we'd create the items and then have to process them again with the hook)
In your code, you still need to create the key and value. Yielding items should be very fast, so I think it won't be the bottleneck in most cases.
Right exactly. What are your thoughts on adding this to the splitter/reducer functions directly then @ianlini ? def reducer(key, value): I used hook as I was recently looking at httpx and they use a similar naming but im not tied to this and Im open to other suggestions if we leave as is.
Combining with splitter/reducer will be more complicated. Most of the time, people don't need this function, so they don't want to return the identical value. It's hard to make one of the return value optional.
It's also hard to explain how hook can be used because it is not abstract enough. People need to know a lot of implementation details to write the code correctly. For example, if I have a dict
d = {"a": {"b": {"c": "value}}}
People need to know that hook will be called like this:
hook(("c",), "value")
hook(("b", "c"), "value")
hook(("a", "b", "c"), "value")
If I want to hook something when the key is ("b", "c")
with at least one preceding keys, I will write a hook function that will do something when the key is ("b", "c")
with exactly one preceding key.
Other design issues:
flatten
to reduce ("a", "b")
first, and then ("a", "b", "c")
, it will affect the definition of hook.
hook
Callable now added to flatten and unflatten